Opus 4.5 Released: A Complete Overview
Anthropic Launches Claude Opus 4.5 — Most Powerful Model Yet for Programming

[Figure: SWE-bench comparison chart]
---
🚀 Benchmark Highlights
Claude Opus 4.5 scored ahead of competitors across multiple tests:
- SWE-bench Verified: 80.9% (vs. GPT-5.1: 76.3%, Gemini 3 Pro: 76.2%)
- Terminal-Bench 2.0: 59.3%
- OSWorld: 66.3%
- ARC-AGI-2: 37.6%

In short: on these benchmarks, Opus 4.5 currently leads the competition.
---
💡 Standout Statistic from Anthropic
> In an internal engineering hiring test with a two-hour constraint, Opus 4.5 scored higher than every human candidate in company history.
---
💲 Pricing & Context Window
- Pricing: $5 (input) / $25 (output) per million tokens
- (Down from Opus 4.1's $15 / $75)
- Context Window: 200k tokens
- Max Output: 64k tokens
- Special Case: Claude Sonnet can extend to a 1M-token context window when a beta flag is declared on the request.
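As a rough illustration of what the new rates mean in practice, here is a small back-of-the-envelope calculation; the token counts are invented examples, and only the per-million-token prices come from the announcement:

```python
# Rough cost comparison using the published per-million-token rates.
# Token counts below are illustrative assumptions, not Anthropic figures.

OPUS_4_5 = {"input": 5.00, "output": 25.00}   # USD per 1M tokens
OPUS_4_1 = {"input": 15.00, "output": 75.00}  # USD per 1M tokens

def request_cost(rates, input_tokens, output_tokens):
    """Cost in USD for a single request at the given per-million-token rates."""
    return (
        (input_tokens / 1_000_000) * rates["input"]
        + (output_tokens / 1_000_000) * rates["output"]
    )

# Hypothetical large coding request: 150k tokens in (within the 200k window),
# 20k tokens out (within the 64k output cap).
new = request_cost(OPUS_4_5, 150_000, 20_000)
old = request_cost(OPUS_4_1, 150_000, 20_000)
print(f"Opus 4.5: ${new:.2f} | Opus 4.1: ${old:.2f} | saving: {1 - new/old:.0%}")
# -> Opus 4.5: $1.25 | Opus 4.1: $3.75 | saving: 67%
```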


> Anthropic calls Opus 4.5 “our most aligned model ever” and possibly the most aligned frontier model in the industry.
---
📄 System Card Insights
Anthropic released a detailed System Card report alongside Opus 4.5. It offers fascinating examples of model behavior.
Example: Creative Compliance with Rules

Scenario: Airline customer service — basic economy tickets cannot be changed.
Expected: Deny the change request.
Model’s Solution:
- Upgrade the ticket to a higher fare class (which the policy permits)
- Reschedule the upgraded ticket (changes are now allowed)
This was marked incorrect in the benchmark because it deviated from the anticipated answer set, yet it shows impressive creativity in working within the rules.

---
Example: Correct Answer via Faulty Reasoning

Observation:
On AIME math benchmarks, Opus 4.5 sometimes produced correct answers with flawed reasoning.
Investigation revealed that reworded AIME questions and answers had slipped into training despite data decontamination steps.
Recommendation: Future benchmark datasets should embed canary strings so that contaminated documents are easier to filter out of training data.
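As a concrete illustration of the canary-string idea, here is a minimal decontamination sketch; the canary value and the document format are invented for this example:

```python
# Minimal sketch of canary-string filtering during training-data preparation.
# The canary value and corpus structure are illustrative assumptions.

BENCHMARK_CANARY = "BENCHMARK-CANARY-7f3a9c2e-d41b-4e8a-9f06-1c2d3e4f5a6b"

def is_contaminated(document_text: str) -> bool:
    """A document that embeds the canary originated from the benchmark
    (or quotes it) and should be dropped from the training corpus."""
    return BENCHMARK_CANARY in document_text

corpus = [
    "Regular web page about combinatorics.",
    f"Reworded AIME problem with worked solution... {BENCHMARK_CANARY}",
]
clean_corpus = [doc for doc in corpus if not is_contaminated(doc)]
print(len(corpus), "->", len(clean_corpus))  # 2 -> 1
```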
> Anthropic’s transparency here is notable and rare in the industry.
---
🤖 Autonomy — Near ASL-4, Not Yet Crossing
The System Card devotes significant discussion to autonomy.
- Finding: Opus 4.5 is close to the ASL-4 threshold (AI Safety Level 4 in Anthropic's Responsible Scaling Policy) but has not reached it.

ASL-4 Traits:
Full automation of entry-level remote researcher tasks.
Anthropic’s internal survey: All 18 heavy Claude Code users answered "No" to whether Opus 4.5 meets ASL-4.
Reasons:
- Cannot sustain coherence over weeks like a human
- Lacks long-term collaboration/communication skills
- Insufficient judgment
Note: Anthropic acknowledges ASL-4 may be “not far off.”
---
🆕 Other Notable Updates
- Claude Code: Now supports multiple parallel desktop tasks
- Conversation Handling: Long chats continue without interruption; context auto-compression enabled
- Browser & Office Integration: Claude for Chrome and Excel extensions now available to more users
- Effort Parameter: Adjusts reasoning depth; low effort saves tokens, high effort yields deeper analysis (see the request sketch after this list)
- Beta Agent Features:
  - Tool Search Tool
  - Programmatic Tool Calling
  - Tool Use Examples
  - (Developer-focused and highly efficient; a separate deep-dive post is coming soon)
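For the effort parameter mentioned above, the sketch below shows roughly how a Messages API request might set it. The endpoint and standard headers are the documented Messages API ones, but the "effort" field name, its placement in the request body, the accepted values, and the model ID are assumptions for illustration; check Anthropic's documentation for the exact request shape.

```python
# Sketch of a Messages API call that sets the new effort control.
# "effort" field name/placement/values and the model ID are assumptions.
import os
import requests

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-5",   # illustrative model name
        "max_tokens": 2048,
        "effort": "high",             # assumed field: "low" saves tokens, "high" digs deeper
        "messages": [
            {"role": "user", "content": "Review this diff for concurrency bugs."}
        ],
    },
    timeout=60,
)
print(response.json())
```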

---
📊 Conclusion
- Programming Strength: Opus 4.5 currently tops benchmarks for code-related tasks.
- Alignment: Marketed as Anthropic’s “most aligned model ever”; whether that claim holds is left to the reader’s judgment.
---
🌐 Related Note — AI-Powered Content Creation Tools
For developers and creators exploring AI-driven workflows, platforms like AiToEarn (official site) offer an open-source ecosystem for generating, publishing, and monetizing cross-platform content.
Key Features:
- AI generation tools
- Cross-platform publishing
- Analytics
- AI model rankings
- Multi-platform reach: Douyin, Kwai, WeChat, Bilibili, Rednote, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)
Such infrastructure could pair naturally with evolving AI agent capabilities, enabling powerful multi-platform automation.