AI Safety

Nanyang Technological University Reveals Complete Collapse of AI “Operational Safety” — Simple Disguises Can Fool All Models

Author Information

* First Author: Jingdi Lei — PhD student at Nanyang Technological University, focusing on large language models (LLMs), particularly model reasoning, post-training, and alignment.
* Corresponding Author: Soujanya Poria — Associate Professor, School of Electrical and Electronic Engineering, Nanyang Technological University.
* Other Co-authors: From Walled AI Labs, Singapore's Infocomm Media Development Authority (IMDA)…

By Honghao Wang

LLM security

Have LLM Jailbreak Threats Been Systematically Overestimated? A New “Decomposition-Based Scoring” Paradigm for Jailbreak Evaluation Released

2025-10-12 12:02 Beijing. JADES Framework: A Transparent, Reliable, and Auditable Standard for Jailbreak Evaluation. Developed collaboratively by researchers from the **Helmholtz Center for Information Security**…

By Honghao Wang

LLM security

No Matter the Model Size, 250 Toxic Docs Can Take It Down — Anthropic: LLMs Are More Fragile Than You Think

From 600M- to 13B-parameter LLMs — just 250 documents can implant a backdoor. Date: 2025-10-10 11:45 Beijing. Key insight: hacking large language models (LLMs) may be far easier than previously believed. Traditionally, experts assumed that larger models required proportionally more poisoned data to implant malicious behavior, making large-scale attacks impractical.

By Honghao Wang

automation risks

The Pitfalls of Automation: Unexpected Consequences and Responses in Software Systems

Key Takeaways

* Automation can behave in counterintuitive ways during software incidents — sometimes impeding resolution or complicating human intervention.
* Strict separation of tasks into "automation" vs. "human" can lead to designs that make incidents harder to resolve.
* Overuse of automation can erode human knowledge, skills, and situational awareness.
* Joint Cognitive Systems…

By Honghao Wang