LLM security

Have LLM Jailbreak Threats Been Systematically Overestimated? A New “Decomposition-Based Scoring” Paradigm for Jailbreak Evaluation Released

LLM security

Have LLM Jailbreak Threats Been Systematically Overestimated? A New “Decomposition-Based Scoring” Paradigm for Jailbreak Evaluation Released

# 2025-10-12 12:02 Beijing --- ## JADES Framework: A Transparent, Reliable, and Auditable Standard for Jailbreak Evaluation ![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-84.jpg) ![image](https://blog.aitoearn.ai/content/images/2025/10/img_002-78.jpg) Developed collaboratively by researchers from the **Helmholtz Center for Information

By Honghao Wang
No Matter the Model Size, 250 Toxic Docs Can Take It Down — Anthropic: LLMs Are More Fragile Than You Think

LLM security

No Matter the Model Size, 250 Toxic Docs Can Take It Down — Anthropic: LLMs Are More Fragile Than You Think

From 600M to 13B LLM — Just 250 Documents Can Implant a Backdoor Date: 2025-10-10 11:45 Beijing --- Key Insight Hacking large language models (LLMs) may be far easier than previously believed. Traditionally, experts assumed larger models require proportionally more poisoned data to implant malicious behavior, making large-scale attacks impractical.

By Honghao Wang