Claude Sonnet 4.5 Excels in SWE-Bench Verified for Software Bug Fixing, Supports 30+ Hour Coding Tasks

Anthropic Unveils Claude Sonnet 4.5 — Its Most Advanced Coding AI Yet

Anthropic has released Claude Sonnet 4.5, its most capable, coding-focused AI model to date.

The update delivers major strides in:

  • Agentic task handling
  • Long-horizon performance
  • Real-world computer-use proficiency

Enhanced training techniques and safety protocols have cut down on sycophancy, deceptive responses, power-seeking behavior, and delusional outputs.

Claude Sonnet 4.5 is now available via the:

Pricing remains unchanged from Sonnet 4.

---

Performance Advancements

Claude Sonnet 4.5 builds on Anthropic’s iterative approach to increasing capability while keeping alignment and safety controls intact.

Key Improvements

  • Sustained reasoning and execution
  • Maintains complex, multi-step logic and code execution for over 30 continuous hours
  • SWE-bench Verified (details)
  • Score: 77.2% — up from 72.7% in Sonnet 4
  • OSWorld benchmark (details)
  • Score: 61.4%, up from 42.2% just 4 months prior
  • Stronger results in real-world computer-use tasks

---

Ecosystem Context

With its enhanced endurance and complex-task handling, Claude Sonnet 4.5 opens doors for:

  • Automated coding agents
  • Virtual desktop assistants
  • Advanced productivity automation

Platforms such as AiToEarn官网 amplify these capabilities by providing an open-source, global AI content monetization framework.

AiToEarn integrates:

  • AI generation tools
  • Cross-platform publishing workflows
  • Performance analytics
  • Model rankings (AI模型排名)

It enables creators to publish — and monetize — across channels like Douyin, Bilibili, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter).

---

image

Source: Anthropic Claude Sonnet 4.5

---

Safety and Alignment Upgrades

Anthropic calls Sonnet 4.5 its “most aligned frontier model”, balancing capability gains with stricter safeguards.

ASL-3 Protection System

  • Improved automated classifiers detect and block harmful instructions, including CBRN risks
  • False positives reduced:
  • 10× lower than initial rollout
  • 50% lower than Claude Opus 4

---

Agentic Safety Testing

In evaluating autonomous, tool-enabled performance:

  • 150 malicious coding requests tested — only 2 failures
  • Achieved 98.7% safety score vs. 89.3% for Sonnet 4
  • Stronger refusal and resistance against prompt-injection attacks

Recommendation: Anthropic advises upgrading to Claude Sonnet 4.5 as a drop-in replacement for improved performance at no extra cost.

---

Early Adopter Feedback

> Scott Wu, Co-Founder & CEO, Cognition:

> "For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end evaluation scores by 12%... It enables Devin to run longer, tackle more difficult tasks, and deliver production-ready code."

> Michele Catasta, President, replit:

> "Sonnet 4.5’s edit capabilities are exceptional… from a 9% error rate on Sonnet 4 to 0% internally. Higher tool success at lower cost — this is agentic coding at its best."

> Simon Willison, Independent Open Source Developer (blog):

> "It feels like a better coding model than GPT-5-Codex, which had been my go-to since its launch."

---

Competitive Landscape

Anthropic’s trajectory toward safer, autonomous coding models parallels other industry moves — e.g., OpenAI’s GPT-5-Codex for large-scale code refactoring and extended code review.

---

Monetization Potential with AiToEarn

For developers and creators, the combination of Claude Sonnet 4.5’s advanced capabilities and AiToEarn’s publishing tools offers:

  • Seamless AI content generation
  • Multi-platform publishing
  • Analytics-driven optimization
  • Monetization across major social/video channels: Douyin, Kwai, WeChat, Bilibili, Rednote (Xiaohongshu), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)

AiToEarn’s integrated model ranking and analytics ensure that safe, high-quality AI outputs reach the largest possible audience efficiently.

---

Conclusion:

Claude Sonnet 4.5 marks a significant leap in agentic coding performance, safety, and usability — positioning it as a key enabler for sustainable AI-driven productivity in both development and creative ecosystems.

---

Would you like me to also create a side-by-side benchmark comparison table between Sonnet 4 and Sonnet 4.5 for quick reference? That would make it easy for readers to scan the performance gains.

Read more