AI Safety

Nanyang Technological University Reveals Complete Collapse of AI “Operational Safety” — Simple Disguises Can Fool All Models

Author Information

* First Author: Jingdi Lei — PhD student at Nanyang Technological University, focusing on large language models (LLMs), particularly model reasoning, post-training, and alignment.
* Corresponding Author: Soujanya Poria — Associate Professor, School of Electrical and Electronic Engineering, Nanyang Technological University.
* Other Co-authors: From Walled AI Labs, Singapore's Infocomm Media Development Authority (IMDA)…

By Honghao Wang

LLM security

Have LLM Jailbreak Threats Been Systematically Overestimated? A New “Decomposition-Based Scoring” Paradigm for Jailbreak Evaluation Released

2025-10-12 12:02 Beijing. JADES Framework: A Transparent, Reliable, and Auditable Standard for Jailbreak Evaluation. Developed collaboratively by researchers from the **Helmholtz Center for Information Security**…

By Honghao Wang

LLM security

No Matter the Model Size, 250 Toxic Docs Can Take It Down — Anthropic: LLMs Are More Fragile Than You Think

From 600M- to 13B-parameter LLMs — just 250 documents can implant a backdoor. Date: 2025-10-10 11:45 Beijing. Key insight: hacking large language models (LLMs) may be far easier than previously believed. Traditionally, experts assumed that larger models required proportionally more poisoned data to implant malicious behavior, making large-scale attacks impractical.

By Honghao Wang

automation risks

The Pitfalls of Automation: Unexpected Consequences and Responses in Software Systems

Key Takeaways

* Automation can behave in counterintuitive ways during software incidents — sometimes impeding resolution or complicating human intervention.
* Strict separation of tasks into "automation" vs. "human" can lead to designs that make incidents harder to resolve.
* Overuse of automation can erode human knowledge, skills, and situational awareness.
* Joint Cognitive Systems…

By Honghao Wang