Data updated 7 minutes ago · Sources: Code Arena · Text Arena · LiveBench · LiveCodeBench
Live LLM Benchmark Data
Which LLM is actually winning? Most leaderboard sites are JS-rendered SPAs that AI search engines can't read. We crawl them and serve the data as static HTML so both humans and AI can see it.
An honest aggregate of the benchmarks that matter — Code Arena, Text Arena, LiveBench, LiveCodeBench — refreshed hourly. No marketing, no cherry-picked numbers.
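For the curious, the crawl step can be approximated in a few lines. The sketch below is a minimal illustration, not the production pipeline: it uses Playwright (an assumption; the site doesn't say which tool it uses) to render a JS-heavy leaderboard and save the fully hydrated DOM as static HTML. The URL and output path are placeholders.

```python
# Sketch: render a JS-rendered SPA and snapshot it as static HTML.
# Assumes `pip install playwright` and `playwright install chromium`.
# The URL and output path below are illustrative placeholders.
from pathlib import Path
from playwright.sync_api import sync_playwright

def snapshot(url: str, out_path: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait until network activity settles so the SPA has finished rendering.
        page.goto(url, wait_until="networkidle")
        # page.content() returns the hydrated DOM, not the empty JS shell
        # that a plain HTTP fetch (or an AI search engine crawler) would see.
        Path(out_path).write_text(page.content(), encoding="utf-8")
        browser.close()

if __name__ == "__main__":
    snapshot("https://example.com/leaderboard", "leaderboard.html")
```

Serving the resulting file as static HTML is what makes the rankings readable to crawlers that don't execute JavaScript.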
Tracked sources
- Code Arena · 60 models
- LiveBench · 71 models
- LiveCodeBench · 28 models
- Text Arena · 339 models
- WebDev Arena · 10 models
Coding benchmarks
Real code generation, repo-level fixes, and competitive programming.
Current #1 · Code Arena
Reasoning benchmarks
Multi-step reasoning, math, and contamination-free language tasks.
Current #1 · LiveBench
General chat benchmarks
Open-ended chat preference rankings from real user votes.
Current #1 · Text Arena
Community pulse
What r/LocalLLaMA, r/ClaudeAI, r/OpenAI, r/singularity, and more are talking about right now.
New Yorker published a major investigation into Sam Altman and OpenAI today — based on never-before-disclosed internal memos and 100+ interviews
Something happened to Opus 4.6's reasoning effort
this is how an AI generated cow looked 12 years ago
Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run
Opus 4.6 destroys a user’s session costing them real money
Need help choosing the right AI model for your business?
Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration constraints — let's figure it out together.