Updated 5 hours ago. Source: LiveBench Data Analysis

Data analysis benchmarks

Structured data interpretation, querying, and analysis from LiveBench.

LiveBench Data Analysis

| # | Model | Provider | Score |
|---|-------|----------|-------|
| 1 | GPT-5.5 Thinking xHigh Effort | OpenAI | 81.1% |
| 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 79.3% |
| 3 | Gemini 3.1 Pro Preview High | Google | 78.5% |
| 4 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 78.3% |
| 5 | GPT-5.2 Codex | OpenAI | 78.2% |
| 6 | GPT-5.2 High | OpenAI | 78.2% |
| 7 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 78.0% |
| 8 | Gemini 3 Flash Preview High | Google | 74.8% |
| 9 | DeepSeek V4 Pro | DeepSeek | 74.5% |
| 10 | Claude 4.5 Opus Thinking High Effort | Anthropic | 74.4% |
| 11 | Gemini 3 Pro Preview High | Google | 74.4% |
| 12 | Qwen 3.7 Max | Alibaba | 71.8% |
| 13 | GPT-5.4 Mini xHigh | OpenAI | 71.0% |
| 14 | Qwen 3.6 27B | Alibaba | 70.4% |
| 15 | GPT-5.1 Codex Max High | OpenAI | 70.1% |
| 16 | Qwen 3.6 Plus | Alibaba | 69.9% |
| 17 | Claude 4.6 Opus Thinking High Effort | Anthropic | 69.9% |
| 18 | GPT-5.1 High | OpenAI | 69.6% |
| 19 | DeepSeek V4 Flash | DeepSeek | 68.0% |
| 20 | GLM 5 | Z.AI | 67.9% |
| 21 | GPT-5.4 Nano xHigh | OpenAI | 67.6% |
| 22 | Kimi K2.6 Thinking | Moonshot AI | 65.1% |
| 23 | Gemini 3.5 Flash High | Google | 64.9% |
| 24 | Grok 4 | xAI | 63.4% |
| 25 | GLM 5.1 | Z.AI | 63.2% |
| 26 | Grok 4.20 Beta | xAI | 62.9% |
| 27 | GPT-5.3 Codex High | OpenAI | 62.7% |
| 28 | Kimi K2.5 Thinking | Moonshot AI | 61.4% |
| 29 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 61.0% |
| 30 | GPT-5.1 Codex | OpenAI | 60.8% |
| 31 | Claude Haiku 4.5 Thinking | Anthropic | 59.3% |
| 32 | Qwen 3.6 Flash | Alibaba | 58.8% |
| 33 | Gemma 4 31B | Google | 58.8% |
| 34 | GPT-5 Pro | OpenAI | 57.0% |
| 35 | Claude Sonnet 4.5 Thinking | Anthropic | 57.0% |
| 36 | Minimax M2.7 | Minimax | 56.3% |
| 37 | Grok 4.3 | xAI | 55.8% |
| 38 | GPT-5 Mini High | OpenAI | 55.2% |
| 39 | GLM 4.7 | Z.AI | 55.2% |
| 40 | Gemini 3.1 Flash Lite Preview High | Google | 54.9% |
| 41 | Claude 4 Sonnet Thinking | Anthropic | 54.6% |
| 42 | GLM 5V Turbo | Z.AI | 54.1% |
| 43 | Qwen 3 Next 80B A3B Thinking | Alibaba | 53.6% |
| 44 | Kimi K2 Thinking | Moonshot AI | 52.3% |
| 45 | Grok 4.1 Fast | xAI | 52.2% |
| 46 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.2% |
| 47 | GLM 4.6 | Z.AI | 52.0% |
| 48 | Gemini 2.5 Pro (Max Thinking) | Google | 51.6% |
| 49 | DeepSeek V3.2 Exp Thinking | DeepSeek | 51.5% |
| 50 | DeepSeek V3.2 Thinking | DeepSeek | 50.0% |
| 51 | Qwen 3 Next 80B A3B Instruct | Alibaba | 49.8% |
| 52 | GPT-5.1 Codex Mini | OpenAI | 49.7% |
| 53 | Minimax M2.5 | Minimax | 49.6% |
| 54 | MiMo V2 Pro | Xiaomi | 49.2% |
| 55 | Grok Code Fast | xAI | 49.0% |
| 56 | Claude 4.1 Opus Thinking | Anthropic | 49.0% |
| 57 | GPT-5.3 Instant | OpenAI | 48.0% |
| 58 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 47.9% |
| 59 | GPT-5.2 No Thinking | OpenAI | 47.7% |
| 60 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.3% |
| 61 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 47.0% |
| 62 | Claude Sonnet 4.5 | Anthropic | 47.0% |
| 63 | Qwen 3 32B | Alibaba | 46.5% |
| 64 | GLM 4.6V | Z.AI | 46.4% |
| 65 | Claude 4.5 Opus Medium Effort | Anthropic | 45.5% |
| 66 | Claude 4.1 Opus | Anthropic | 45.4% |
| 67 | Claude Haiku 4.5 | Anthropic | 45.1% |
| 68 | DeepSeek V3.2 | DeepSeek | 45.0% |
| 69 | Qwen 3 30B A3B | Alibaba | 44.9% |
| 70 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 44.7% |
| 71 | DeepSeek V3.2 Exp | DeepSeek | 44.3% |
| 72 | Claude 4 Sonnet | Anthropic | 44.1% |
| 73 | GPT-5.1 No Thinking | OpenAI | 44.1% |
| 74 | Grok 4.20 Beta (Non-Reasoning) | xAI | 43.5% |
| 75 | GPT-5 Nano High | OpenAI | 43.4% |
| 76 | Kimi K2 Instruct | Moonshot AI | 43.3% |
| 77 | Grok 4.1 Fast (Non-Reasoning) | xAI | 40.6% |
| 78 | Trinity Large Preview | Arcee | 40.3% |
| 79 | Devstral 2 | Mistral | 39.1% |
| 80 | GPT OSS 120b | OpenAI | 38.8% |
| 81 | Elephant Alpha | OpenRouter | 38.5% |
| 82 | Nemotron 3 Super 120B A12B | NVIDIA | 21.2% |
