AI Model Leaderboard 2026
LMArena Elo scores and benchmark data for the top AI assistants. Raw numbers, updated weekly.
LMArena Elo Rankings
Ranked by LMArena Elo score as of June 10, 2026. Elo is computed from millions of blind head-to-head votes where users pick the better response without knowing which model produced it.
| # | Model | Strengths | Elo |
|---|---|---|---|
| 1 | Claude (Fable 5 / Mythos 5) | Writing, Reasoning, Long docs, Coding, Analysis | 1,432 |
| 2 | ChatGPT (GPT-5) | Voice, Image gen, Plugins, Web search, Coding | 1,408 |
| 3 | Gemini (3.1 Pro) | Google Workspace, Web search, Multimodal, Long context | 1,374 |
| 4 | Perplexity (Sonar Pro) | Web research, Citations, Multi-model, Speed | 1,298 |
| 5 | Microsoft Copilot | Microsoft 365, Word / Excel, Web search, No setup | 1,241 |
Elo scores are approximate, rounded to the nearest whole number, and change daily as new votes are collected. Source: lmarena.ai/leaderboard.
Capability by task
How the five models compare across common tasks, based on benchmark data and independent testing.
| Task | Claude | ChatGPT | Gemini | Perplexity | Copilot |
|---|---|---|---|---|---|
| Writing & editing | ★★★ | ★★★ | ★★ | ★★ | ★★ |
| Complex reasoning | ★★★ | ★★★ | ★★ | ★★ | ★★ |
| Long documents / PDFs | ★★★ | ★★ | ★★★ | ★★ | ★★ |
| Web research & citations | ★★ | ★★ | ★★★ | ★★★ | ★★ |
| Voice conversations | — | ★★★ | ★★★ | — | ★★ |
| Image generation | — | ★★★ | ★★★ | — | ★★ |
| Coding assistance | ★★★ | ★★★ | ★★ | — | ★★ |
| Google Workspace | — | — | ★★★ | — | — |
| Microsoft 365 / Office | — | — | — | — | ★★★ |
| Free tier quality | ★★ | ★★ | ★★★ | ★★ | ★★★ |
★★★ Best in class · ★★ Capable · — Not a primary strength
What is LMArena Elo?
LMArena (formerly LMSYS Chatbot Arena) runs a continuous blind tournament. Users see two anonymous AI responses to the same prompt and vote for whichever they prefer. The Elo score is calculated from millions of these pairwise outcomes using the same algorithm as chess ratings: a model gains more points for beating a higher-rated opponent than a lower-rated one, and loses more points when a lower-rated model beats it.
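The chess-style update described above can be sketched in a few lines of Python. This is a minimal illustration of the classic Elo formula only: the function names and the K-factor of 32 are assumptions chosen for the example, not LMArena's published parameters, and LMArena's production pipeline may fit ratings differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the classic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one vote.

    score_a is 1.0 if users preferred A, 0.0 if they preferred B, 0.5 for a tie.
    k is an assumed K-factor; it controls how far a single vote moves a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Under these assumptions, a win between equally rated models moves each rating by K/2, while an underdog that wins gains more than a favorite that wins, because the underdog's expected score was lower.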
Elo is one of the most reliable public measures of model quality because it captures real human preference across a huge variety of prompts and use cases, rather than performance on a narrow academic benchmark. Scores fluctuate daily as new votes arrive. A gap of 10 points is small; a gap of 50+ points is meaningful.
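To put those gaps in perspective, the standard Elo expected-score formula converts a rating difference into an expected head-to-head win rate. The sketch below assumes the classic 400-point logistic scale and ignores ties; `win_probability` is a hypothetical helper name, not part of any LMArena tooling.

```python
def win_probability(elo_gap: float) -> float:
    """Expected win rate of the higher-rated model, given its Elo lead.

    Assumes the classic Elo logistic curve on a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))


# 24 is the gap between the top two entries in the ranking table (1,432 vs 1,408).
for gap in (10, 24, 50):
    print(f"{gap:>2}-point gap -> {win_probability(gap):.1%} expected win rate")
```

On this scale, a 10-point lead corresponds to roughly a 51% expected win rate, barely better than a coin flip, while a 50-point lead corresponds to roughly 57%.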
Frequently asked questions
What is LMArena Elo and how is it calculated?
LMArena collects blind head-to-head votes from real users who compare two anonymous model responses side-by-side. The Elo score is computed from these pairwise outcomes. A higher Elo means users consistently preferred that model in blind tests. It is one of the most reliable public measures because it captures real human preference, not narrow test suites.
Why is Claude ranked first?
Claude Fable 5 and Mythos 5, both launched June 9, 2026, achieved the highest Elo scores on LMArena as of the June 10 update. In blind pairwise voting, users consistently preferred Claude's responses for writing quality, nuance, and reasoning. Rankings change frequently as new models are released, so check LMArena for the latest.
How often is this page updated?
We update every Monday with the latest LMArena Elo scores and any significant benchmark changes from the prior week. The verified date at the top of the page shows the last update.
Where should I go to pick an AI tool?
This page shows raw benchmark data. For editorial picks matched to your specific use case (writing, research, coding, Microsoft 365), see Best AI Tools. If you are a beginner, Best AI Tools for Beginners is a better starting point.