Updated 3 hours ago. Sources: LiveBench Global, LiveBench Reasoning
Reasoning benchmarks
Logic, deduction, and inference tasks from LiveBench.
LiveBench
| # | Model | Vendor | Score | Input $/M | Output $/M | Context | CI |
|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Thinking xHigh Effort | OpenAI | 80.7% | — | — | — | — |
| 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 80.3% | — | — | — | — |
| 3 | Gemini 3.1 Pro Preview High | Google | 79.9% | — | — | — | — |
| 4 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 76.9% | — | — | — | — |
| 5 | Claude 4.6 Opus Thinking High Effort | Anthropic | 76.3% | — | — | — | — |
| 6 | Claude 4.5 Opus Thinking High Effort | Anthropic | 76.0% | — | — | — | — |
| 7 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 75.5% | — | — | — | — |
| 8 | Gemini 3.5 Flash High | Google | 75.0% | — | — | — | — |
| 9 | GPT-5.2 High | OpenAI | 74.8% | — | — | — | — |
| 10 | GPT-5.2 Codex | OpenAI | 74.3% | — | — | — | — |
| 11 | Qwen 3.7 Max | Alibaba | 74.3% | — | — | — | — |
| 12 | GPT-5.1 Codex Max High | OpenAI | 74.0% | — | — | — | — |
| 13 | DeepSeek V4 Pro | DeepSeek | 73.6% | — | — | — | — |
| 14 | Gemini 3 Pro Preview High | Google | 73.4% | — | — | — | — |
| 15 | GPT-5.3 Codex High | OpenAI | 72.8% | — | — | — | — |
| 16 | Gemini 3 Flash Preview High | Google | 72.4% | — | — | — | — |
| 17 | Kimi K2.6 Thinking | Moonshot AI | 72.2% | — | — | — | — |
| 18 | GPT-5.1 High | OpenAI | 72.0% | — | — | — | — |
| 19 | Qwen 3.6 Plus | Alibaba | 70.8% | — | — | — | — |
| 20 | GPT-5 Pro | OpenAI | 70.5% | — | — | — | — |
| 21 | GLM 5.1 | Z.AI | 70.2% | — | — | — | — |
| 22 | GPT-5.4 Nano xHigh | OpenAI | 70.1% | — | — | — | — |
| 23 | Kimi K2.5 Thinking | Moonshot AI | 69.1% | — | — | — | — |
| 24 | GLM 5 | Z.AI | 68.8% | — | — | — | — |
| 25 | GPT-5.1 Codex | OpenAI | 68.6% | — | — | — | — |
| 26 | Claude Sonnet 4.5 Thinking | Anthropic | 68.2% | — | — | — | — |
| 27 | Grok 4.20 Beta | xAI | 68.0% | — | — | — | — |
| 28 | GPT-5.4 Mini xHigh | OpenAI | 67.5% | — | — | — | — |
| 29 | DeepSeek V4 Flash | DeepSeek | 67.3% | — | — | — | — |
| 30 | Grok 4.3 | xAI | 66.7% | — | — | — | — |
| 31 | GPT-5 Mini High | OpenAI | 65.9% | — | — | — | — |
| 32 | Qwen 3.6 27B | Alibaba | 65.6% | — | — | — | — |
| 33 | Minimax M2.7 | Minimax | 63.5% | — | — | — | — |
| 34 | DeepSeek V3.2 Thinking | DeepSeek | 62.2% | — | — | — | — |
| 35 | Grok 4 | xAI | 62.0% | — | — | — | — |
| 36 | Claude 4.1 Opus Thinking | Anthropic | 61.8% | — | — | — | — |
| 37 | Gemini 3.1 Flash Lite Preview High | Google | 61.7% | — | — | — | — |
| 38 | Gemma 4 31B | Google | 61.6% | — | — | — | — |
| 39 | Kimi K2 Thinking | Moonshot AI | 61.6% | — | — | — | — |
| 40 | Claude Haiku 4.5 Thinking | Anthropic | 61.3% | — | — | — | — |
| 41 | Claude 4 Sonnet Thinking | Anthropic | 61.3% | — | — | — | — |
| 42 | GPT-5.1 Codex Mini | OpenAI | 60.4% | — | — | — | — |
| 43 | Qwen 3.6 Flash | Alibaba | 60.4% | — | — | — | — |
| 44 | Minimax M2.5 | Minimax | 60.1% | — | — | — | — |
| 45 | GPT-5.3 Instant | OpenAI | 60.0% | — | — | — | — |
| 46 | Grok 4.1 Fast | xAI | 60.0% | — | — | — | — |
| 47 | Claude 4.5 Opus Medium Effort | Anthropic | 59.1% | — | — | — | — |
| 48 | DeepSeek V3.2 Exp Thinking | DeepSeek | 58.9% | — | — | — | — |
| 49 | Gemini 2.5 Pro (Max Thinking) | Google | 58.3% | — | — | — | — |
| 50 | MiMo V2 Pro | Xiaomi | 58.1% | — | — | — | — |
| 51 | GLM 4.7 | Z.AI | 58.1% | — | — | — | — |
| 52 | GLM 4.6 | Z.AI | 55.2% | — | — | — | — |
| 53 | Claude 4.1 Opus | Anthropic | 54.5% | — | — | — | — |
| 54 | Claude Sonnet 4.5 | Anthropic | 53.7% | — | — | — | — |
| 55 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 53.1% | — | — | — | — |
| 56 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 53.0% | — | — | — | — |
| 57 | DeepSeek V3.2 | DeepSeek | 51.8% | — | — | — | — |
| 58 | Claude 4 Sonnet | Anthropic | 51.0% | — | — | — | — |
| 59 | Qwen 3 Next 80B A3B Thinking | Alibaba | 50.4% | — | — | — | — |
| 60 | DeepSeek V3.2 Exp | DeepSeek | 49.9% | — | — | — | — |
| 61 | GLM 5V Turbo | Z.AI | 49.6% | — | — | — | — |
| 62 | GPT-5.2 No Thinking | OpenAI | 48.9% | — | — | — | — |
| 63 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.8% | — | — | — | — |
| 64 | GPT-5 Nano High | OpenAI | 48.6% | — | — | — | — |
| 65 | Qwen 3 Next 80B A3B Instruct | Alibaba | 48.4% | — | — | — | — |
| 66 | Kimi K2 Instruct | Moonshot AI | 48.1% | — | — | — | — |
| 67 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.7% | — | — | — | — |
| 68 | GPT OSS 120b | OpenAI | 46.1% | — | — | — | — |
| 69 | Claude Haiku 4.5 | Anthropic | 45.3% | — | — | — | — |
| 70 | Grok Code Fast | xAI | 45.1% | — | — | — | — |
| 71 | Qwen 3 32B | Alibaba | 43.6% | — | — | — | — |
| 72 | GPT-5.1 No Thinking | OpenAI | 42.6% | — | — | — | — |
| 73 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 42.6% | — | — | — | — |
| 74 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 42.4% | — | — | — | — |
| 75 | Devstral 2 | Mistral | 41.2% | — | — | — | — |
| 76 | GLM 4.6V | Z.AI | 40.1% | — | — | — | — |
| 77 | Grok 4.20 Beta (Non-Reasoning) | xAI | 39.7% | — | — | — | — |
| 78 | Qwen 3 30B A3B | Alibaba | 39.0% | — | — | — | — |
| 79 | Elephant Alpha | OpenRouter | 36.0% | — | — | — | — |
| 80 | Grok 4.1 Fast (Non-Reasoning) | xAI | 33.5% | — | — | — | — |
| 81 | Trinity Large Preview | Arcee | 32.7% | — | — | — | — |
| 82 | Nemotron 3 Super 120B A12B | NVIDIA | 32.5% | — | — | — | — |
LiveBench Reasoning
| # | Model | Vendor | Score | Input $/M | Output $/M | Context | CI |
|---|---|---|---|---|---|---|---|
| 1 | Claude 4.6 Opus Thinking High Effort | Anthropic | 88.7% | — | — | — | — |
| 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 88.1% | — | — | — | — |
| 3 | GPT-5.5 Thinking xHigh Effort | OpenAI | 87.7% | — | — | — | — |
| 4 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 87.7% | — | — | — | — |
| 5 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 84.8% | — | — | — | — |
| 6 | Gemini 3.1 Pro Preview High | Google | 84.0% | — | — | — | — |
| 7 | GPT-5.1 Codex Max High | OpenAI | 83.7% | — | — | — | — |
| 8 | Qwen 3.7 Max | Alibaba | 83.3% | — | — | — | — |
| 9 | GPT-5.2 High | OpenAI | 83.2% | — | — | — | — |
| 10 | DeepSeek V4 Pro | DeepSeek | 82.7% | — | — | — | — |
| 11 | Gemini 3.5 Flash High | Google | 82.0% | — | — | — | — |
| 12 | GPT-5.1 Codex | OpenAI | 82.0% | — | — | — | — |
| 13 | GPT-5 Pro | OpenAI | 81.7% | — | — | — | — |
| 14 | GPT-5.4 Nano xHigh | OpenAI | 81.0% | — | — | — | — |
| 15 | Grok 4.1 Fast | xAI | 80.2% | — | — | — | — |
| 16 | GPT-5.3 Codex High | OpenAI | 80.2% | — | — | — | — |
| 17 | Claude 4.5 Opus Thinking High Effort | Anthropic | 80.1% | — | — | — | — |
| 18 | Kimi K2.6 Thinking | Moonshot AI | 79.4% | — | — | — | — |
| 19 | Grok 4 | xAI | 79.1% | — | — | — | — |
| 20 | GPT-5.1 High | OpenAI | 78.8% | — | — | — | — |
| 21 | GPT-5.2 Codex | OpenAI | 77.7% | — | — | — | — |
| 22 | Claude Sonnet 4.5 Thinking | Anthropic | 77.6% | — | — | — | — |
| 23 | Gemini 3 Pro Preview High | Google | 77.4% | — | — | — | — |
| 24 | DeepSeek V3.2 Thinking | DeepSeek | 77.2% | — | — | — | — |
| 25 | Kimi K2.5 Thinking | Moonshot AI | 76.0% | — | — | — | — |
| 26 | Qwen 3.6 Plus | Alibaba | 75.8% | — | — | — | — |
| 27 | Grok 4.20 Beta | xAI | 75.3% | — | — | — | — |
| 28 | Minimax M2.7 | Minimax | 74.8% | — | — | — | — |
| 29 | Gemini 3 Flash Preview High | Google | 74.5% | — | — | — | — |
| 30 | GLM 5.1 | Z.AI | 72.5% | — | — | — | — |
| 31 | GPT-5.4 Mini xHigh | OpenAI | 72.5% | — | — | — | — |
| 32 | Claude 4.1 Opus Thinking | Anthropic | 72.3% | — | — | — | — |
| 33 | Grok 4.3 | xAI | 70.8% | — | — | — | — |
| 34 | Gemini 2.5 Pro (Max Thinking) | Google | 70.8% | — | — | — | — |
| 35 | DeepSeek V4 Flash | DeepSeek | 70.6% | — | — | — | — |
| 36 | Qwen 3.6 27B | Alibaba | 70.3% | — | — | — | — |
| 37 | MiMo V2 Pro | Xiaomi | 69.7% | — | — | — | — |
| 38 | GLM 5 | Z.AI | 69.1% | — | — | — | — |
| 39 | Claude 4 Sonnet Thinking | Anthropic | 69.0% | — | — | — | — |
| 40 | GPT-5 Mini High | OpenAI | 68.3% | — | — | — | — |
| 41 | GPT-5.1 Codex Mini | OpenAI | 64.7% | — | — | — | — |
| 42 | DeepSeek V3.2 Exp Thinking | DeepSeek | 64.4% | — | — | — | — |
| 43 | Kimi K2 Thinking | Moonshot AI | 63.5% | — | — | — | — |
| 44 | GPT-5.3 Instant | OpenAI | 63.1% | — | — | — | — |
| 45 | Qwen 3.6 Flash | Alibaba | 62.9% | — | — | — | — |
| 46 | GLM 4.6 | Z.AI | 62.1% | — | — | — | — |
| 47 | Claude Haiku 4.5 Thinking | Anthropic | 61.7% | — | — | — | — |
| 48 | GLM 4.7 | Z.AI | 59.7% | — | — | — | — |
| 49 | Gemini 3.1 Flash Lite Preview High | Google | 59.7% | — | — | — | — |
| 50 | Gemma 4 31B | Google | 59.4% | — | — | — | — |
| 51 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 59.4% | — | — | — | — |
| 52 | Minimax M2.5 | Minimax | 59.3% | — | — | — | — |
| 53 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 58.4% | — | — | — | — |
| 54 | Qwen 3 Next 80B A3B Thinking | Alibaba | 58.2% | — | — | — | — |
| 55 | GLM 5V Turbo | Z.AI | 56.1% | — | — | — | — |
| 56 | Qwen 3 Next 80B A3B Instruct | Alibaba | 54.8% | — | — | — | — |
| 57 | Claude 4.5 Opus Medium Effort | Anthropic | 53.2% | — | — | — | — |
| 58 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 51.5% | — | — | — | — |
| 59 | Qwen 3 32B | Alibaba | 48.3% | — | — | — | — |
| 60 | DeepSeek V3.2 Exp | DeepSeek | 45.5% | — | — | — | — |
| 61 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 44.6% | — | — | — | — |
| 62 | DeepSeek V3.2 | DeepSeek | 44.3% | — | — | — | — |
| 63 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 43.3% | — | — | — | — |
| 64 | GPT-5.2 No Thinking | OpenAI | 42.8% | — | — | — | — |
| 65 | Grok Code Fast | xAI | 42.3% | — | — | — | — |
| 66 | Claude Sonnet 4.5 | Anthropic | 42.3% | — | — | — | — |
| 67 | Kimi K2 Instruct | Moonshot AI | 42.2% | — | — | — | — |
| 68 | Claude 4.1 Opus | Anthropic | 40.9% | — | — | — | — |
| 69 | GPT-5 Nano High | OpenAI | 40.3% | — | — | — | — |
| 70 | Elephant Alpha | OpenRouter | 40.0% | — | — | — | — |
| 71 | Claude 4 Sonnet | Anthropic | 39.7% | — | — | — | — |
| 72 | GPT OSS 120b | OpenAI | 39.2% | — | — | — | — |
| 73 | GLM 4.6V | Z.AI | 37.2% | — | — | — | — |
| 74 | Qwen 3 30B A3B | Alibaba | 36.7% | — | — | — | — |
| 75 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 36.2% | — | — | — | — |
| 76 | Nemotron 3 Super 120B A12B | NVIDIA | 34.4% | — | — | — | — |
| 77 | Claude Haiku 4.5 | Anthropic | 33.9% | — | — | — | — |
| 78 | Devstral 2 | Mistral | 27.7% | — | — | — | — |
| 79 | GPT-5.1 No Thinking | OpenAI | 26.8% | — | — | — | — |
| 80 | Grok 4.20 Beta (Non-Reasoning) | xAI | 25.6% | — | — | — | — |
| 81 | Grok 4.1 Fast (Non-Reasoning) | xAI | 23.4% | — | — | — | — |
| 82 | Trinity Large Preview | Arcee | 20.6% | — | — | — | — |