
Data updated 3 hours ago. Sources: LiveBench Data Analysis


Data Analysis Benchmarks

Structured data interpretation, querying, and analysis from LiveBench.

LiveBench Data Analysis

| Rank | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Thinking xHigh Effort | OpenAI | 79.3% |
| 2 | Gemini 3.1 Pro Preview High | Google | 78.5% |
| 3 | GPT-5.2 Codex | OpenAI | 78.2% |
| 4 | GPT-5.2 High | OpenAI | 78.2% |
| 5 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 78.0% |
| 6 | Gemini 3 Flash Preview High | Google | 74.8% |
| 7 | Claude 4.5 Opus Thinking High Effort | Anthropic | 74.4% |
| 8 | Gemini 3 Pro Preview High | Google | 74.4% |
| 9 | GPT-5.4 Mini xHigh | OpenAI | 71.0% |
| 10 | GPT-5.1 Codex Max High | OpenAI | 70.1% |
| 11 | Qwen 3.6 Plus | Alibaba | 69.9% |
| 12 | Claude 4.6 Opus Thinking High Effort | Anthropic | 69.9% |
| 13 | GPT-5.1 High | OpenAI | 69.6% |
| 14 | GLM 5 | Z.AI | 67.9% |
| 15 | GPT-5.4 Nano xHigh | OpenAI | 67.6% |
| 16 | Grok 4 | xAI | 63.4% |
| 17 | GLM 5.1 | Z.AI | 63.2% |
| 18 | Grok 4.20 Beta | xAI | 62.9% |
| 19 | GPT-5.3 Codex High | OpenAI | 62.7% |
| 20 | Kimi K2.5 Thinking | Moonshot AI | 61.4% |
| 21 | Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 61.0% |
| 22 | GPT-5.1 Codex | OpenAI | 60.8% |
| 23 | Claude Haiku 4.5 Thinking | Anthropic | 59.3% |
| 24 | Gemma 4 31B | Google | 58.8% |
| 25 | GPT-5 Pro | OpenAI | 57.0% |
| 26 | Claude Sonnet 4.5 Thinking | Anthropic | 57.0% |
| 27 | Minimax M2.7 | Minimax | 56.3% |
| 28 | GPT-5 Mini High | OpenAI | 55.2% |
| 29 | GLM 4.7 | Z.AI | 55.2% |
| 30 | Gemini 3.1 Flash Lite Preview High | Google | 54.9% |
| 31 | Claude 4 Sonnet Thinking | Anthropic | 54.6% |
| 32 | GLM 5V Turbo | Z.AI | 54.1% |
| 33 | Qwen 3 Next 80B A3B Thinking | Alibaba | 53.6% |
| 34 | Kimi K2 Thinking | Moonshot AI | 52.3% |
| 35 | Grok 4.1 Fast | xAI | 52.2% |
| 36 | Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.2% |
| 37 | GLM 4.6 | Z.AI | 52.0% |
| 38 | Gemini 2.5 Pro (Max Thinking) | Google | 51.6% |
| 39 | DeepSeek V3.2 Exp Thinking | DeepSeek | 51.5% |
| 40 | DeepSeek V3.2 Thinking | DeepSeek | 50.0% |
| 41 | Qwen 3 Next 80B A3B Instruct | Alibaba | 49.8% |
| 42 | GPT-5.1 Codex Mini | OpenAI | 49.7% |
| 43 | Minimax M2.5 | Minimax | 49.6% |
| 44 | MiMo V2 Pro | Xiaomi | 49.2% |
| 45 | Grok Code Fast | xAI | 49.0% |
| 46 | Claude 4.1 Opus Thinking | Anthropic | 49.0% |
| 47 | GPT-5.3 Instant | OpenAI | 48.0% |
| 48 | Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 47.9% |
| 49 | GPT-5.2 No Thinking | OpenAI | 47.7% |
| 50 | Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.3% |
| 51 | Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 47.0% |
| 52 | Claude Sonnet 4.5 | Anthropic | 47.0% |
| 53 | Qwen 3 32B | Alibaba | 46.5% |
| 54 | GLM 4.6V | Z.AI | 46.4% |
| 55 | Claude 4.5 Opus Medium Effort | Anthropic | 45.5% |
| 56 | Claude 4.1 Opus | Anthropic | 45.4% |
| 57 | Claude Haiku 4.5 | Anthropic | 45.1% |
| 58 | DeepSeek V3.2 | DeepSeek | 45.0% |
| 59 | Qwen 3 30B A3B | Alibaba | 44.9% |
| 60 | Qwen 3 235B A22B Instruct 2507 | Alibaba | 44.7% |
| 61 | DeepSeek V3.2 Exp | DeepSeek | 44.3% |
| 62 | Claude 4 Sonnet | Anthropic | 44.1% |
| 63 | GPT-5.1 No Thinking | OpenAI | 44.1% |
| 64 | Grok 4.20 Beta (Non-Reasoning) | xAI | 43.5% |
| 65 | GPT-5 Nano High | OpenAI | 43.4% |
| 66 | Kimi K2 Instruct | Moonshot AI | 43.3% |
| 67 | Grok 4.1 Fast (Non-Reasoning) | xAI | 40.6% |
| 68 | Trinity Large Preview | Arcee | 40.3% |
| 69 | Devstral 2 | Mistral | 39.1% |
| 70 | GPT OSS 120b | OpenAI | 38.8% |
| 71 | Nemotron 3 Super 120B A12B | NVIDIA | 21.2% |


Need help choosing the right AI model?

Benchmarks are a starting point, not an answer. The right model depends on your workload, budget, and integration requirements. Let's work it out together.