Claude Sonnet

Claude Sonnet 4.5 Excels in SWE-Bench Verified for Software Bug Fixing, Supports 30+ Hour Coding Tasks

Honghao Wang

12 Oct 2025 — 3 min read

Anthropic Unveils Claude Sonnet 4.5 — Its Most Advanced Coding AI Yet

Anthropic has released Claude Sonnet 4.5, its most capable, coding-focused AI model to date.

The update delivers major strides in:

Agentic task handling
Long-horizon performance
Real-world computer-use proficiency

Enhanced training techniques and safety protocols have cut down on sycophancy, deceptive responses, power-seeking behavior, and delusional outputs.

Claude Sonnet 4.5 is now available via the:

Pricing remains unchanged from Sonnet 4.

---

Performance Advancements

Claude Sonnet 4.5 builds on Anthropic’s iterative approach to increasing capability while keeping alignment and safety controls intact.

Key Improvements

Sustained reasoning and execution
Maintains complex, multi-step logic and code execution for over 30 continuous hours
SWE-bench Verified (details)
Score: 77.2% — up from 72.7% in Sonnet 4
OSWorld benchmark (details)
Score: 61.4%, up from 42.2% just 4 months prior
Stronger results in real-world computer-use tasks

---

Ecosystem Context

With its enhanced endurance and complex-task handling, Claude Sonnet 4.5 opens doors for:

Automated coding agents
Virtual desktop assistants
Advanced productivity automation

Platforms such as AiToEarn官网 amplify these capabilities by providing an open-source, global AI content monetization framework.

AiToEarn integrates:

AI generation tools
Cross-platform publishing workflows
Performance analytics
Model rankings (AI模型排名)

It enables creators to publish — and monetize — across channels like Douyin, Bilibili, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter).

---

Source: Anthropic Claude Sonnet 4.5

---

Safety and Alignment Upgrades

Anthropic calls Sonnet 4.5 its “most aligned frontier model”, balancing capability gains with stricter safeguards.

ASL-3 Protection System

Improved automated classifiers detect and block harmful instructions, including CBRN risks
False positives reduced:
10× lower than initial rollout
50% lower than Claude Opus 4

---

Agentic Safety Testing

In evaluating autonomous, tool-enabled performance:

150 malicious coding requests tested — only 2 failures
Achieved 98.7% safety score vs. 89.3% for Sonnet 4
Stronger refusal and resistance against prompt-injection attacks

Recommendation: Anthropic advises upgrading to Claude Sonnet 4.5 as a drop-in replacement for improved performance at no extra cost.

---

Early Adopter Feedback

> Scott Wu, Co-Founder & CEO, Cognition:

> "For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end evaluation scores by 12%... It enables Devin to run longer, tackle more difficult tasks, and deliver production-ready code."

> Michele Catasta, President, replit:

> "Sonnet 4.5’s edit capabilities are exceptional… from a 9% error rate on Sonnet 4 to 0% internally. Higher tool success at lower cost — this is agentic coding at its best."

> Simon Willison, Independent Open Source Developer (blog):

> "It feels like a better coding model than GPT-5-Codex, which had been my go-to since its launch."

---

Competitive Landscape

Anthropic’s trajectory toward safer, autonomous coding models parallels other industry moves — e.g., OpenAI’s GPT-5-Codex for large-scale code refactoring and extended code review.

---

Monetization Potential with AiToEarn

For developers and creators, the combination of Claude Sonnet 4.5’s advanced capabilities and AiToEarn’s publishing tools offers:

Seamless AI content generation
Multi-platform publishing
Analytics-driven optimization
Monetization across major social/video channels: Douyin, Kwai, WeChat, Bilibili, Rednote (Xiaohongshu), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)

AiToEarn’s integrated model ranking and analytics ensure that safe, high-quality AI outputs reach the largest possible audience efficiently.

---

Conclusion:

Claude Sonnet 4.5 marks a significant leap in agentic coding performance, safety, and usability — positioning it as a key enabler for sustainable AI-driven productivity in both development and creative ecosystems.

---

Would you like me to also create a side-by-side benchmark comparison table between Sonnet 4 and Sonnet 4.5 for quick reference? That would make it easy for readers to scan the performance gains.