Data updated 3 hours ago. Sources: LiveBench Math

Math Benchmarks

Numerical reasoning and mathematical problem solving from LiveBench.

| Rank | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Thinking xHigh Effort | OpenAI | 94.2% |
| 2 | GPT-5.2 High | OpenAI | 93.2% |
| 3 | GPT-5.4 Nano xHigh | OpenAI | 91.3% |
| 4 | Gemini 3.1 Pro Preview High | Google | 91.0% |
| 5 | Claude 4.5 Opus Thinking High Effort | Anthropic | 90.4% |
| 6 | Claude 4.6 Opus Thinking High Effort | Anthropic | 89.3% |
| 7 | GPT-5.2 Codex | OpenAI | 88.8% |
| 8 | GPT-5.3 Codex High | OpenAI | 87.8% |
| 9 | Grok 4.20 Beta | xAI | 87.1% |
| 10 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 87.0% |
| 11 | GPT-5.1 High | OpenAI | 86.9% |
| 12 | GPT-5 Pro | OpenAI | 86.2% |
| 13 | DeepSeek V3.2 Thinking | DeepSeek | 85.0% |
| 14 | GLM 5.1 | Z.AI | 84.9% |
| 15 | Kimi K2.5 Thinking | Moonshot AI | 84.9% |
| 16 | Gemini 3 Flash Preview High | Google | 84.2% |
| 17 | Qwen 3.6 Plus | Alibaba | 83.7% |
| 18 | Grok 4.1 Fast | xAI | 83.7% |
| 19 | GLM 5 | Z.AI | 83.5% |
| 20 | GPT-5.1 Codex Max High | OpenAI | 83.2% |
| 21 | Grok 4 | xAI | 83.0% |
| 22 | DeepSeek V3.2 Exp Thinking | DeepSeek | 82.4% |
| 23 | GPT-5 Mini High | OpenAI | 82.2% |
| 24 | Gemini 3 Pro Preview High | Google | 81.8% |
| 25 | GLM 4.6 | Z.AI | 81.1% |
| 26 | Kimi K2 Thinking | Moonshot AI | 81.1% |
| 27 | Minimax M2.7 | Minimax | 80.5% |
| 28 | GPT-5.1 Codex | OpenAI | 79.6% |
| 29 | Claude Sonnet 4.5 Thinking | Anthropic | 79.3% |
| 30 | GPT-5.4 Mini xHigh | OpenAI | 78.6% |
| 31 | Claude Haiku 4.5 Thinking | Anthropic | 77.5% |
| 32 | Minimax M2.5 | Minimax | 77.4% |
| 33 | MiMo V2 Pro | Xiaomi | 77.0% |
| 34 | GPT-5.1 Codex Mini | OpenAI | 76.3% |
| 35 | GLM 4.7 | Z.AI | 76.0% |
| 36 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 75.3% |
| 37 | Qwen 3 Next 80B A3B Thinking | Alibaba | 74.3% |
| 38 | Gemma 4 31B | Google | 73.9% |
| 39 | Gemini 3.1 Flash Lite Preview High | Google | 73.6% |
| 40 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 73.4% |
| 41 | Claude 4.1 Opus Thinking | Anthropic | 73.2% |
| 42 | GPT-5.3 Instant | OpenAI | 72.4% |
| 43 | Claude 4 Sonnet Thinking | Anthropic | 70.5% |
| 44 | GLM 5V Turbo | Z.AI | 70.4% |
| 45 | Qwen 3 Next 80B A3B Instruct | Alibaba | 70.2% |
| 46 | GPT OSS 120b | OpenAI | 68.9% |
| 47 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 68.8% |
| 48 | GPT-5 Nano High | OpenAI | 68.4% |
| 49 | Gemini 2.5 Pro (Max Thinking) | Google | 68.3% |
| 50 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 68.0% |
| 51 | Qwen 3 32B | Alibaba | 67.4% |
| 52 | Claude 4.5 Opus Medium Effort | Anthropic | 66.3% |
| 53 | Qwen 3 30B A3B | Alibaba | 65.3% |
| 54 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 64.9% |
| 55 | DeepSeek V3.2 Exp | DeepSeek | 64.4% |
| 56 | DeepSeek V3.2 | DeepSeek | 64.0% |
| 57 | Claude 4.1 Opus | Anthropic | 62.8% |
| 58 | Claude Sonnet 4.5 | Anthropic | 62.6% |
| 59 | GLM 4.6V | Z.AI | 62.5% |
| 60 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 61.0% |
| 61 | Claude 4 Sonnet | Anthropic | 60.4% |
| 62 | GPT-5.2 No Thinking | OpenAI | 58.3% |
| 63 | Kimi K2 Instruct | Moonshot AI | 58.1% |
| 64 | Claude Haiku 4.5 | Anthropic | 58.0% |
| 65 | Grok Code Fast | xAI | 56.0% |
| 66 | Devstral 2 | Mistral | 52.5% |
| 67 | Grok 4.20 Beta (Non-Reasoning) | xAI | 45.5% |
| 68 | Trinity Large Preview | Arcee | 44.9% |
| 69 | GPT-5.1 No Thinking | OpenAI | 44.5% |
| 70 | Grok 4.1 Fast (Non-Reasoning) | xAI | 38.9% |
| 71 | Nemotron 3 Super 120B A12B | NVIDIA | 36.4% |

Need help choosing the right AI model?

Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration requirements, so let's work it out together.