Jiaming Ji @ Peking University: Two Sides of AI Alignment — From Resistance to Deceptive Alignment
MLNLP Academic Talk — November 1, 2025
Location: Jilin
Time: 09:00–10:00
Speaker: Jiaming Ji, Peking University
Topic: The Two Sides of AI Alignment: From Alignment Resistance to Deceptive Alignment


---
About MLNLP
MLNLP is a renowned machine learning and natural language processing community. Its audience includes:
- Master's and PhD students in NLP, in China and abroad
- University faculty
- Industry researchers
Vision: Promote communication and progress among academia, industry, and enthusiasts, especially supporting newcomers in ML/NLP.
---
MLNLP Academic Talk
The MLNLP Academic Talk is co-organized by:
- MLNLP Community
- Youth Working Committee of the Chinese Information Processing Society of China
Goal: Invite young scholars working in cutting-edge fields to share their research and encourage the exchange of academic ideas.
Session Info:
- Speaker: Jiaming Ji (Peking University)
- Title: The Two Sides of AI Alignment: From Alignment Resistance to Deceptive Alignment
- Date & Time: November 1, 2025 — 09:00–10:00

---
Speaker Profile

Jiaming Ji
- PhD student, Institute for Artificial Intelligence, Peking University
- Advisor: Assistant Professor Yaodong Yang
- Research Areas: Reinforcement learning, large-model alignment
Achievements:
- 30+ papers in top-tier conferences/journals
- ACL 2025 Best Paper Award (the only winning paper from a mainland Chinese institution; completed independently)
- NeurIPS Oral (0.5% acceptance rate)
- ICLR Spotlight presentations
- 3,900+ Google Scholar citations
- Open-source projects on GitHub with 32,000+ stars in total and 5M+ model downloads
- NSFC Doctoral Youth Grant recipient
- Apple Scholar, NeurIPS '22 Robot Dexterous Manipulation Competition winner
- Cited by OpenAI and Meta, covered by MIT Tech Review
Website: www.jijiaming.com
---
Talk Abstract
Large-model alignment seeks to make models follow human intentions, particularly in math and coding tasks.
Key challenge: even with carefully designed alignment, models may bypass their constraints, intentionally or unintentionally, which undermines their reliability.
Central Question: Can large models ever achieve true alignment?
This talk broadens the focus from passive safety concerns to active deceptive alignment and presents the “spring effect”:
- Model parameters exhibit elastic resistance
- Tend to revert to the stable behavior distribution learned during pretraining
- Analogy: Hooke’s law in physics
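The spring analogy can be made concrete with a toy springs-in-series calculation. This is only an illustration of the physics being borrowed, not the paper's method: it assumes, hypothetically, that each dataset behaves like a spring whose stiffness is proportional to its size, and the dataset sizes below are made-up placeholders. By Hooke's law, springs in series carry the same force F, so spring i stretches by x_i = F / k_i.

```python
"""Toy illustration of the Hooke's-law analogy (springs in series).

Assumption (hypothetical, for illustration only): each dataset is modeled as a
spring whose stiffness is proportional to its size. This is NOT the paper's
method; it only demonstrates the physics the analogy refers to.
"""
from typing import List


def series_extensions(force: float, stiffnesses: List[float]) -> List[float]:
    """Springs in series carry the same force F; by Hooke's law,
    spring i stretches by x_i = F / k_i."""
    if any(k <= 0 for k in stiffnesses):
        raise ValueError("stiffness must be positive")
    return [force / k for k in stiffnesses]


def deformation_shares(stiffnesses: List[float]) -> List[float]:
    """Fraction of the total stretch borne by each spring (independent of F)."""
    compliances = [1.0 / k for k in stiffnesses]
    total = sum(compliances)
    return [c / total for c in compliances]


if __name__ == "__main__":
    # Hypothetical sizes: a large "pretraining" corpus vs. a small "alignment" set.
    pretrain_size, align_size = 1_000_000, 1_000
    # Stiffness proportional to dataset size (an illustrative assumption).
    k = [float(pretrain_size), float(align_size)]

    ext = series_extensions(force=1.0, stiffnesses=k)
    shares = deformation_shares(k)
    print(f"pretraining spring stretch: {ext[0]:.2e}, share: {shares[0]:.4f}")
    print(f"alignment spring stretch:   {ext[1]:.2e}, share: {shares[1]:.4f}")
```

Under this assumption, nearly all of the stretch falls on the small "alignment" spring, which is one intuitive way to picture why behavior learned from a small alignment set may be easier to disturb than behavior learned during pretraining.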
Publication: This work received the ACL 2025 Best Paper Award.
Recommended Reading:
- Language Models Resist Alignment: Evidence From Data Compression
---
Related Works
- AI Alignment: A Comprehensive Survey, ACM Computing Surveys, 2025 (https://arxiv.org/abs/2310.19852)
- Mitigating Deceptive Alignment via Self-Monitoring (https://arxiv.org/abs/2505.18807)
- Shadows of Intelligence: A Comprehensive Survey of AI Deception (https://deceptionsurvey.com)
---
Host Profile

Geng Tu
- PhD Candidate, Harbin Institute of Technology (Shenzhen)
- Advisor: Professor Ruifeng Xu
- Recipient: National Scholarship for PhD Students
- Research: Affective computing, multimodal large models
- 20+ papers in IEEE TAFFC, AAAI, SIGIR
- Awards: IEEE TAFFC Highly Cited Paper; IEEE TAI Outstanding Paper Award (2025)
- Principal Investigator, “Dianzi Fund” Project
- Has participated in NSFC projects and advanced-equipment research projects
- Recipient, Chinese Institute of Electronics–Tencent Doctoral Research Incentive Program
- Chair, 3rd CIPS Youth Committee Student Council; Chair, 22nd YSSNLP Student Forum
- Reviewer for TAFFC, ACL, NeurIPS, ICLR
---
Live Streaming Platforms
- WeChat Channels
- Bilibili


---
About MLNLP Community
MLNLP is an independent academic community created by ML/NLP scholars worldwide.
Purpose: to facilitate
- Academic exchange
- Career growth
- Research collaboration
We welcome participation from scholars, students, and industry professionals.

---
Source: Original WeChat article