LLM security - aitoearn

prompt injection

Quotes from Bruce Schneier and Barath Raghavan

Prompt Injection: An Intractable Challenge in Persistent-Memory LLMs > Prompt injection may be fundamentally unsolvable in today’s LLMs. > LLMs process sequences of tokens, but there is no mechanism to mark tokens with privilege levels. Every proposed solution opens up new injection vectors: > - Delimiter? Attackers simply include

LLM security

OpenAI, Anthropic, and DeepMind Joint Statement: Current LLM Safety Defenses Are Fragile

2025-10-19 · Jilin Rethinking AI Safety Evaluation We may have been using the wrong approach to evaluate LLM safety. > Key Insight: This study tested 12 defense methods — almost all failed. It’s truly rare to see OpenAI, Anthropic, and Google DeepMind — three major competitors — jointly publish a paper on security

LLM security

OpenAI, Anthropic, and DeepMind Joint Statement: Current LLM Safety Defenses are Inadequate

Machine Heart Report > This study tested 12 LLM defense methods — most failed against adaptive attacks. It is rare to see OpenAI, Anthropic, and Google DeepMind — three fierce competitors — co-author a paper on evaluating security defenses for large language models (LLMs). Apparently, when LLM safety is at stake, rivalry can

LLM security

Have LLM Jailbreak Threats Been Systematically Overestimated? A New “Decomposition-Based Scoring” Paradigm for Jailbreak Evaluation Released

# 2025-10-12 12:02 Beijing --- ## JADES Framework: A Transparent, Reliable, and Auditable Standard for Jailbreak Evaluation ![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-84.jpg) ![image](https://blog.aitoearn.ai/content/images/2025/10/img_002-78.jpg) Developed collaboratively by researchers from the **Helmholtz Center for Information

LLM security

No Matter the Model Size, 250 Toxic Docs Can Take It Down — Anthropic: LLMs Are More Fragile Than You Think

From 600M to 13B LLM — Just 250 Documents Can Implant a Backdoor Date: 2025-10-10 11:45 Beijing --- Key Insight Hacking large language models (LLMs) may be far easier than previously believed. Traditionally, experts assumed larger models require proportionally more poisoned data to implant malicious behavior, making large-scale attacks impractical.