Data updated 7 minutes ago · Sources: Code Arena · Text Arena · LiveBench · LiveCodeBench
Live LLM Benchmark Data
Which LLM is actually winning? Most leaderboard sites are JS-rendered SPAs that AI search engines can't read. We crawl them and serve the data as static HTML so both humans and AI can see it.
An honest aggregate of the benchmarks that matter — Code Arena, Text Arena, LiveBench, LiveCodeBench — refreshed hourly. No marketing, no cherry-picked numbers.
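For the curious, the crawl step can be approximated in a few lines. The sketch below is a minimal illustration, not the production pipeline: it uses Playwright (an assumption; the site doesn't say which tool it uses) to render a JS-heavy leaderboard and save the fully hydrated DOM as static HTML. The URL and output path are placeholders.

```python
# Sketch: render a JS-rendered SPA and snapshot it as static HTML.
# Assumes `pip install playwright` and `playwright install chromium`.
# The URL and output path below are illustrative placeholders.
from pathlib import Path
from playwright.sync_api import sync_playwright

def snapshot(url: str, out_path: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait until network activity settles so the SPA has finished rendering.
        page.goto(url, wait_until="networkidle")
        # page.content() returns the hydrated DOM, not the empty JS shell
        # that a plain HTTP fetch (or an AI search engine crawler) would see.
        Path(out_path).write_text(page.content(), encoding="utf-8")
        browser.close()

if __name__ == "__main__":
    snapshot("https://example.com/leaderboard", "leaderboard.html")
```

Serving the resulting file as static HTML is what makes the rankings readable to crawlers that don't execute JavaScript.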
Tracked sources
- Code Arena · 60 models
- LiveBench · 71 models
- LiveCodeBench · 28 models
- Text Arena · 339 models
- WebDev Arena · 10 models
Coding benchmarks
Real code generation, repo-level fixes, and competitive programming.
Current #1 · Code Arena
Reasoning benchmarks
Multi-step reasoning, math, and contamination-free language tasks.
Current #1 · LiveBench
General chat benchmarks
Open-ended chat preference rankings from real user votes.
Current #1 · Text Arena
Community pulse
What r/LocalLLaMA, r/ClaudeAI, r/OpenAI, r/singularity, and more are talking about right now.
New Yorker published a major investigation into Sam Altman and OpenAI today — based on never-before-disclosed internal memos and 100+ interviews
Something happened to Opus 4.6's reasoning effort
this is how an AI generated cow looked 12 years ago
Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run
Opus 4.6 destroys a user’s session costing them real money
Need help choosing the right AI model for your business?
Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration constraints — let's figure it out together.