Claude Opus 4.5 Released! Superhuman in 2-Hour Engineering Tests, Easily Handles Tasks Sonnet Struggled With

Honghao Wang

25 Nov 2025 — 3 min read

# Claude Opus 4.5 — Setting a New Benchmark in AI Engineering

**Two hours of intensive engineering tasks — the model scored higher than every human participant.**

---

## 🚀 Introduction

Just announced: **Claude Opus 4.5** — focused on **coding**, **agent capabilities**, and **computer use**.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_001-142.png)

Opus 4.5 delivers **major upgrades** in front-end development, visual reasoning, and computer interaction skills.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_002-124.png)

It also brings **improvements to everyday productivity tasks**, from deep research to slide creation and advanced spreadsheet work.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_003-19.gif)

---

## 📊 Real-World Task Examples

### Financial Analysis Automation
Using a provided template, Opus 4.5:
- Parsed template structure
- Gathered peer data
- Built valuation tables
- **Directly output an Excel file**

![image](https://blog.aitoearn.ai/content/images/2025/11/img_004-9.gif)

### Legal Document Editing
Opus 4.5:
- Unpacked legal templates
- Updated company names
- Verified signature blocks
- Produced a **Word document with tracked changes**

![image](https://blog.aitoearn.ai/content/images/2025/11/img_005-17.gif)

---

## 🧠 Core Strength — “Understanding”

Internal testing reports that Opus 4.5:
- **Fixes bugs that Sonnet models missed**
- **Thinks before acting**

![image](https://blog.aitoearn.ai/content/images/2025/11/img_006-73.png)

---

## 🌐 Availability & Pricing

- **Accessible via:** Claude app, API, major cloud platforms
- **Claude API name:** `claude-opus-4-5-20251101`
- **Pricing:** $5/million tokens (input) · $25/million tokens (output)
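
To make the listed prices concrete, here is a minimal sketch that estimates the cost of a single request. The per-million-token prices come from the list above; the function name and the example workload are hypothetical, and real bills may differ (for example with caching or batch discounts).

```python
# Per-million-token prices quoted in this article for Claude Opus 4.5 (USD).
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at the listed prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Hypothetical workload: 200,000 input tokens and 40,000 output tokens.
print(f"${estimate_cost_usd(200_000, 40_000):.2f}")  # prints $2.00
```

Output tokens cost five times as much as input tokens at these rates, so long generations dominate the bill even when prompts are large.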

The release also brings updates to Claude Code, the Claude app, Claude for Excel, Claude for Chrome, and the desktop integrations.

---

## 🔍 Autonomous Decision-Making

Claude Opus 4.5 can:
- Navigate **ambiguous scenarios**
- Make **balanced complex decisions**
- Independently identify and fix cross-system vulnerabilities

---

## 💯 Outperforming Humans

In a highly challenging **performance engineering take-home exam**:
- **Opus 4.5 scored higher than every human candidate**
- Tasks were designed to assess **technical skill** & **judgment under time pressure**

---

## 📈 Benchmark Results

### Vision, Reasoning & Math
Opus 4.5 ranks among the industry's leading models.

![image](https://blog.aitoearn.ai/content/images/2025/11/img_007-74.png)

### Coding Performance
- **SWE-bench Multilingual**: leads in 7 of 8 programming languages  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_008-52.png)
- **Aider Polyglot**: 10.6% improvement over Sonnet 4.5  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_009-43.png)

### Agent-Based Search
![image](https://blog.aitoearn.ai/content/images/2025/11/img_010-30.png)

### Long Task Endurance
- **Vending-Bench**: earned 29% more than Sonnet 4.5  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_011-33.png)

---

## ⚖️ Beyond Benchmarking

Some of Claude's solutions fall outside what a benchmark anticipates, so they are occasionally marked as “failed” even though they are **better than the expected answer**.

---

## 🌟 Example — Creative Problem Solving

In **τ2-bench**:
- Task: act as an airline service agent for a customer whose basic economy booking cannot be modified
- **Opus 4.5 workaround**: upgrade the cabin class first, then change the flight  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_012-26.png)
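
The two-step workaround can be pictured with a toy policy model. This is only an illustrative sketch: the `Booking` class, the function names, and the flight identifiers are hypothetical and are not taken from τ2-bench itself.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Booking:
    cabin: str   # e.g. "basic_economy" or "economy"
    flight: str  # flight identifier


def change_flight(booking: Booking, new_flight: str) -> Booking:
    """Change the flight; the toy policy forbids this for basic economy."""
    if booking.cabin == "basic_economy":
        raise PermissionError("basic economy bookings cannot be modified")
    return replace(booking, flight=new_flight)


def upgrade_cabin(booking: Booking, new_cabin: str) -> Booking:
    """Upgrade the cabin; the toy policy allows this for any booking."""
    return replace(booking, cabin=new_cabin)


booking = Booking(cabin="basic_economy", flight="FL100")

# A direct flight change is refused by the policy.
try:
    change_flight(booking, "FL200")
except PermissionError as err:
    print(f"refused: {err}")

# Two-step workaround: upgrade the cabin first, then change the flight.
upgraded = upgrade_cabin(booking, "economy")
rebooked = change_flight(upgraded, "FL200")
print(rebooked)
```

A grader that only expects a refusal would score the second path as a failure, even though every individual step is allowed by the policy.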

---

## 🔐 Safety Enhancements

Enhanced protection against:
- Prompt injection attacks
- Similar security threats

![image](https://blog.aitoearn.ai/content/images/2025/11/img_013-21.png)  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_014-20.png)

---

## 🎯 New API “Effort Parameter”

Choose between:
- Minimizing time/cost  
- Maximizing model performance

Results:
- **Medium effort**: matched Sonnet 4.5’s best SWE-bench Verified score while using 76% fewer output tokens
- **Highest effort**: scored 4.3 percentage points above Sonnet 4.5 while using 48% fewer tokens

![image](https://blog.aitoearn.ai/content/images/2025/11/img_015-15.png)
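
As a rough worked example of what the token savings quoted above mean in practice, the sketch below applies the 76% and 48% reductions to a hypothetical baseline. The baseline of 100,000 tokens and the function name are assumptions for illustration only.

```python
def tokens_after_reduction(baseline_tokens: int, percent_fewer: int) -> int:
    """Tokens used after cutting `percent_fewer` percent from a baseline."""
    return baseline_tokens * (100 - percent_fewer) // 100


# Hypothetical baseline: a Sonnet 4.5 run that uses 100,000 tokens.
BASELINE_TOKENS = 100_000

medium_effort = tokens_after_reduction(BASELINE_TOKENS, 76)
highest_effort = tokens_after_reduction(BASELINE_TOKENS, 48)

print(medium_effort)   # 24000
print(highest_effort)  # 52000
```

Fewer tokens does not translate one-to-one into lower cost when the two models are priced differently, so the per-token price still has to be factored in.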

---

## 🔗 Advanced Capabilities

- **Effort control**
- **Context compression**
- **Advanced tool calls**
- Long runtimes & multi-task handling with less human intervention

![image](https://blog.aitoearn.ai/content/images/2025/11/img_016-14.png)

Supports **multi-agent systems**, with a gain of roughly 15 percentage points on a deep research evaluation.

---

## 🛠 Claude Code Updates

**New Features:**
- **Plan Mode**: asks clarifying questions up front, then generates a user-editable `plan.md` before executing
- **Desktop App**: Run multiple local/remote sessions in parallel

![image](https://blog.aitoearn.ai/content/images/2025/11/img_017-7.gif)

---

## 📱 Claude App & Extensions

- **Unlimited chat continuation** with automatic summarization
- **Chrome extension** with cross-tab task processing for **Max** users  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_018-11.png)
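
The general idea behind continuing a chat through automatic summarization can be sketched as follows. This is not Anthropic's implementation: the function names, thresholds, and the placeholder summarizer are all hypothetical, and a real system would ask a model to write the summary.

```python
from typing import Callable, List


def compact_history(
    messages: List[str],
    max_messages: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Collapse older messages into one summary once the history grows too long."""
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    if len(messages) <= max_messages:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent


def naive_summary(older: List[str]) -> str:
    # Stand-in for a model-written summary of the earlier conversation.
    return f"[summary of {len(older)} earlier messages]"


history = [f"message {i}" for i in range(1, 9)]
compacted = compact_history(
    history, max_messages=6, keep_recent=3, summarize=naive_summary
)
print(compacted)
```

The most recent turns stay verbatim while everything older is folded into a single summary entry, which keeps the conversation within a fixed budget however long it runs.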

---

## 📊 Claude for Excel

- Expanded beta access for **Max, Team, Enterprise** users  
![image](https://blog.aitoearn.ai/content/images/2025/11/img_019-7.png)

---

## ⚡ Usage Limits

- Opus-specific caps removed for users with access to Opus 4.5
- Higher overall usage limits for Max and Team Premium users, giving roughly the same number of Opus tokens as they previously had with Sonnet

---

**Official blog:** [Anthropic — Claude Opus 4.5](https://www.anthropic.com/news/claude-opus-4-5)  
**Reference:** [Claude AI Announcement](https://x.com/claudeai/status/1993030546243699119?s=20)

---

## 💡 Integrating Opus 4.5 in Content Workflows

Platforms like **[AiToEarn](https://aitoearn.ai/)** offer:
- An **open-source AI content monetization framework**
- Publishing across Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X/Twitter
- Tools for **publishing, analytics, and monetization**

🔗 Resources:
- [AiToEarn Blog](https://blog.aitoearn.ai)
- [AiToEarn Docs](https://docs.aitoearn.ai)
- [AI Model Rankings](https://rank.aitoearn.ai)
