Musk Quietly Releases Grok 4.1, Dominates All Leaderboards in the Large Model Arena

Honghao Wang

18 Nov 2025 — 4 min read

Grok 4.1: Elon Musk’s AI Leap to the Top of the Leaderboard

Elon Musk's xAI has just released Grok 4.1, which immediately took both first and second place in the Large Model Arena (LMArena) rankings.

---

🚀 How Did They Pull It Off?

Grok 4.1 Thinking Mode leads the chart with an Elo score of 1483, 31 points ahead of the highest-ranked non-xAI model.
Grok 4.1 Non-Thinking Mode takes second place with 1465 points, outperforming even the full reasoning modes of every other model on the public leaderboard.
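These margins are easier to interpret as head-to-head win probabilities. The minimal sketch below applies the standard Elo expected-score formula (base 10, 400-point scale). It assumes LMArena scores can be read on that conventional Elo scale, which is an assumption, since LMArena's own fitting procedure may differ. `expected_win_rate` is a name introduced here for illustration.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve."""
    # Assumes the conventional Elo scale: base 10, 400-point scale factor
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Grok 4.1 Thinking (1483) vs. the best non-xAI model, 31 points lower
    print(f"{expected_win_rate(1483, 1483 - 31):.3f}")
    # Grok 4.1 Thinking (1483) vs. Grok 4.1 Non-Thinking (1465)
    print(f"{expected_win_rate(1483, 1465):.3f}")
```

Read this way, the 31-point margin corresponds to roughly a 54% expected preference rate in head-to-head votes. The 18-point gap between the two Grok 4.1 modes corresponds to roughly 53%.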

Only months earlier, its predecessor Grok 4 ranked just 33rd. In less than half a year, xAI has made a massive leap forward.

---

Dominating New “Expert” & “Professional” Rankings

In the newly added Expert and Professional leaderboards, Grok 4.1 Thinking Mode also dominates.

Expert Leaderboard

Contains questions expected to be asked only by top specialists in their fields.

Professional Leaderboard

Split into eight subcategories:

Software & IT Services
Writing, Literature & Language
Life Sciences, Physical Sciences & Social Sciences
Entertainment, Sports & Media
Business, Management & Financial Operations
Mathematics
Legal & Government
Healthcare

---

Performance Summary:

Grok 4.1 ranks #1 in six of the eight categories.
It trails only Gemini 2.5 in Literature, and Claude 4.5 and o3 in Mathematics.

> Note: Scores are still labeled Preliminary due to low vote counts. Reliability will improve as more votes are collected.

---

Strong Emotional Intelligence: EQ-Bench Test

In EQ-Bench (an emotional intelligence benchmark for LLMs), Grok 4.1 outperforms Kimi K2 (non-Thinking version).

EQ-Bench Measures:

Proactive social awareness
Emotional comprehension
Insight & empathy
Interpersonal skills

---

Quiet Rollout & User Preference Testing

xAI began silently testing Grok 4.1 on November 1, rolling it out gradually through blind A/B trials.

64.78% of users preferred the new model.
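A figure like 64.78% is a share of blind pairwise comparisons. The minimal sketch below shows how such a preference rate and a rough 95% confidence interval could be computed. It assumes each trial yields a single win or loss for the new model, and it uses the normal approximation to the binomial, which is one simple choice among several. The post does not report vote counts, so the counts in the example are hypothetical. They were picked only to reproduce the 64.78% rate.

```python
import math
from typing import Tuple


def preference_with_ci(wins: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) using the normal approximation to the binomial."""
    if total <= 0 or not 0 <= wins <= total:
        raise ValueError("need total > 0 and 0 <= wins <= total")
    rate = wins / total
    # Standard error of a proportion, scaled by z (1.96 for ~95% coverage)
    half_width = z * math.sqrt(rate * (1 - rate) / total)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


if __name__ == "__main__":
    # Hypothetical counts: the real number of A/B trials is not published
    rate, lo, hi = preference_with_ci(6478, 10_000)
    print(f"preferred: {rate:.2%}  95% CI: [{lo:.2%}, {hi:.2%}]")
```

The interval narrows roughly with the square root of the number of trials. The same 64.78% rate is therefore far more convincing over thousands of comparisons than over a few dozen.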

xAI’s official site now offers side-by-side comparisons between Grok 4.1 and earlier versions.

---

Example Comparisons

[Figures: example comparisons of emotional responses and creative writing]

---

Technical Improvements

According to xAI’s technical report, Grok 4.1 delivers:

Better creativity
Enhanced emotional engagement
Improved collaborative interaction
Sharper detection of subtle intent
A more consistent personality
Intelligence and reliability retained from Grok 4

Reinforcement Learning at New Scale

Dustin Tran, head of post-training at xAI, explained:

> Our small team rebuilt reinforcement learning algorithms using real user conversation data, combined with scoring from strong reasoning reward models.

> We scaled RL by an order of magnitude, far beyond Grok 4’s pre-training scale.
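xAI has not published the details of its RL pipeline. As a generic illustration only, the sketch below shows one common way reward-model scores become a training signal. Several responses to the same prompt are sampled and each is scored by a reward model. The scores are then normalized within that group, as in GRPO-style methods. The `reward_model` callable and the toy length-based reward are hypothetical stand-ins, not xAI's setup.

```python
from statistics import mean, pstdev
from typing import Callable, List

# Hypothetical reward-model signature: (prompt, response) -> scalar score
RewardModel = Callable[[str, str], float]


def group_normalized_advantages(
    prompt: str, responses: List[str], reward_model: RewardModel, eps: float = 1e-6
) -> List[float]:
    """Score each sampled response, then center and scale scores within the group."""
    scores = [reward_model(prompt, r) for r in responses]
    mu, sigma = mean(scores), pstdev(scores)
    # Above-average responses get positive advantages (reinforced by the policy update);
    # below-average ones get negative advantages (discouraged).
    return [(s - mu) / (sigma + eps) for s in scores]


if __name__ == "__main__":
    def toy_reward(prompt: str, response: str) -> float:
        # Toy stand-in for a reward model: longer answers score higher (illustration only)
        return float(len(response))

    print(group_normalized_advantages("Say hi", ["hi", "hello there", "hey"], toy_reward))
```

Normalizing within the group means only the relative quality of responses to the same prompt matters. The reward model's absolute scale therefore does not need to be calibrated.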

---

Fast-Response Mode & Hallucination Reduction

Fast-response mode skips chain-of-thought reasoning, cutting the average token count from about 2,300 to about 850.
Post-training placed special emphasis on reducing factual hallucinations on information-seeking prompts.
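Both token counts are approximate, so the savings are best read as a rough figure. Going from about 2,300 to about 850 tokens is a reduction of roughly 63%, which means fast-response answers use a bit over a third of the tokens. The small helper below only makes that arithmetic explicit. `relative_reduction` is a name introduced here for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrinks when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Approximate average token counts quoted above (reasoning vs. fast-response mode)
    print(f"{relative_reduction(2300, 850):.0%}")  # prints 63%
```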

Measured Improvements:

On FActScore (500 biography questions), Grok 4.1's non-reasoning mode shows clear gains over Grok 4.
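FActScore works in three steps. It breaks each generated biography into atomic facts, checks each fact against a knowledge source, and reports the fraction of facts that are supported, averaged over responses. The simplified sketch below covers only that final aggregation step. It assumes atomic-fact extraction and verification have already produced one supported/unsupported label per fact, and it omits any further adjustments the full metric may apply. The `ScoredResponse` type and the example labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredResponse:
    """Hypothetical container: one support label per atomic fact in a single biography."""
    supported: List[bool]


def factscore(responses: List[ScoredResponse]) -> float:
    """Mean, over responses, of the fraction of atomic facts judged supported."""
    # Responses with no extracted facts (e.g. abstentions) are skipped in this sketch
    precisions = [sum(r.supported) / len(r.supported) for r in responses if r.supported]
    if not precisions:
        raise ValueError("no responses with atomic facts to score")
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    # Illustrative labels, not data from the Grok 4.1 evaluation
    demo = [
        ScoredResponse([True, True, False, True]),  # 3 of 4 facts supported
        ScoredResponse([True, False]),              # 1 of 2 facts supported
    ]
    print(factscore(demo))  # prints 0.625
```

Higher scores mean a larger share of stated facts is supported, so fewer hallucinated details.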

---

Why Grok 4.1 Matters

In the rapidly evolving AI landscape, Grok 4.1 shows how:

Advancements in RLHF
Emotional intelligence tuning
Personality alignment

can quickly boost real-world performance.

---

Multi-Modal Output Capability

Grok 4.1 can produce rich responses that combine images and text.

---

Availability

Grok 4.1 is available on:

grok.com
X
the iOS and Android apps

> Tip: It rolls out in Auto mode but can also be selected manually in the model picker.

---

References

https://x.ai/news/grok-4-1
https://x.com/arena/status/1990530984014676155
https://x.com/dustinvtran/status/1990532663258853720