Jiaming Ji @ Peking University: Two Sides of AI Alignment — From Resistance to Deceptive Alignment

MLNLP Academic Talk — November 1, 2025

Location: Jilin

Time: 09:00–10:00

Speaker: Jiaming Ji, Peking University

Topic: The Two Sides of AI Alignment: From Alignment Resistance to Deceptive Alignment

---

About MLNLP

MLNLP is a renowned machine learning and natural language processing community. Its audience includes:

  • Domestic and international NLP master's/PhD students
  • University faculty
  • Industry researchers

Vision: Promote communication and progress among academia, industry, and enthusiasts, especially supporting newcomers in ML/NLP.

---

MLNLP Academic Talk

The MLNLP Academic Talk is co-organized by:

  • MLNLP Community
  • Youth Working Committee of the Chinese Information Processing Society of China

Goal: Invite young scholars in cutting-edge fields to share research, encouraging academic idea exchange.

Session Info:

  • Speaker: Jiaming Ji (Peking University)
  • Title: The Two Sides of AI Alignment: From Alignment Resistance to Deceptive Alignment
  • Date & Time: November 1, 2025 — 09:00–10:00

---

Speaker Profile

Jiaming Ji

  • PhD student, Institute for Artificial Intelligence, Peking University
  • Advisor: Assistant Professor Yaodong Yang
  • Research Areas: Reinforcement learning, large-model alignment

Achievements:

  • 30+ papers in top-tier conferences/journals
  • ACL 2025 Best Paper (the only mainland China institution to win with independently completed work)
  • NeurIPS Oral (0.5% acceptance rate)
  • ICLR Spotlight presentations
  • 3,900+ Google Scholar citations
  • GitHub open-source projects with 32,000+ stars, 5M+ model downloads
  • NSFC Doctoral Youth Grant recipient
  • Apple Scholar
  • Winner, NeurIPS '22 Robot Dexterous Manipulation Competition
  • Cited by OpenAI and Meta, covered by MIT Tech Review

Website: www.jijiaming.com

---

Talk Abstract

Large-model alignment seeks to make models follow human intentions, particularly in math and coding tasks.

Key challenge: Even with carefully designed alignment, models may bypass constraints — intentionally or unintentionally — undermining reliability.

Central Question: Can large models ever achieve true alignment?

This talk broadens the focus from passive safety concerns to active deceptive alignment and presents the “spring effect”:

  • Model parameters exhibit an elastic resistance to alignment
  • After alignment, models tend to revert to the stable behavior distribution learned during pretraining
  • The behavior is analogous to Hooke’s law in physics
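
For reference, Hooke's law says that the restoring force of an ideal spring is proportional to its displacement from equilibrium and points back toward it. The law itself is standard physics. The mapping onto model behavior in the comments is only one plausible reading of the analogy in the abstract and is an assumption, not a formula taken from the talk or the paper.

```latex
% Hooke's law for an ideal spring (standard physics, not from the talk)
F = -k\,x
% x : displacement from the equilibrium position
% k : spring constant (stiffness), k > 0
% Assumed reading of the analogy:
%   equilibrium ~ the pretrained behavior distribution
%   x           ~ the shift that alignment training imposes on the model
```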

Publication: This work received the ACL 2025 Best Paper award.

Recommended Reading:

  • Language Models Resist Alignment: Evidence From Data Compression

---

Host Profile

Tu Geng

  • PhD Candidate, Harbin Institute of Technology (Shenzhen)
  • Advisor: Professor Xu Ruifeng
  • Recipient: National Scholarship for PhD Students
  • Research: Affective computing, multimodal large models
  • 20+ papers in venues including IEEE TAFFC, AAAI, and SIGIR
  • Awards: IEEE TAFFC Highly Cited Paper; IEEE TAI Outstanding Paper Award (2025)
  • Principal Investigator, “Dianzi Fund” Project
  • Experience: participation in NSFC projects and advanced equipment research
  • China Electronics Society–Tencent PhD Research Incentive recipient
  • Chair, 3rd CIPS Youth Committee Student Council; Chair, 22nd YSSNLP Student Forum
  • Reviewer for TAFFC, ACL, NeurIPS, ICLR

---

Live Streaming Platforms

  • WeChat Channels
  • Bilibili

---

About MLNLP Community

MLNLP is an independent academic community created by ML/NLP scholars worldwide.

Purpose: to facilitate

  • Academic exchange
  • Career growth
  • Research collaboration

We welcome participation from scholars, students, and industry professionals.

---

Source: Original WeChat article
