Yang Zhilin and the Kimi Team Respond Late at Night: All Controversies After K2 Thinking Went Viral

2025-11-11 — Beijing Update

Last week, Moonshot AI released and open-sourced Kimi K2 Thinking, an enhanced version of Kimi K2 positioned as a “model as Agent”. The release immediately generated strong buzz in the AI community.

Hugging Face co-founder Thomas Wolf wondered aloud:

> "Is this another DeepSeek moment?"


---

First AMA: Founders Respond

Early this morning, Zhilin Yang, Xinyu Zhou, and Yuxin Wu held a substantive, wide-ranging AMA on Reddit, fielding direct and sometimes sharp questions from the community.

Photo, from left to right: Zhilin Yang, Xinyu Zhou, Yuxin Wu

Key Takeaways

  • The KDA attention mechanism will carry forward into Kimi K3.
  • The “$4.6M training cost” is unofficial; actual cost is hard to quantify.
  • A Vision-Language Model (VL) is already under development.
  • Progress has been made in reducing output slop, but it’s a persistent challenge for LLMs.

---

Performance Recap: K2 Thinking’s Benchmarks

K2 Thinking scored exceptionally well in global benchmarks:

  • HLE: Humanity’s Last Exam — extremely challenging AI benchmark.
  • BrowseComp: Tests online reasoning with information retrieval and comprehension.
  • AIME25: Mathematics reasoning — matches GPT-5 and Claude 4.5, far ahead of DeepSeek V3.2.
Benchmark comparison chart (source: datacamp)

---

Inside the KDA Mechanism

Within Kimi K2, the KDA (Kimi Delta Attention) replaces traditional full-attention with incremental updates + gating.

Benefits:

  • Improves long-context consistency, a known weak spot for large MoE models.
  • Cuts KV cache and memory requirements.
  • Acts like a high-performance attention engine in Transformers.
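
To make “incremental updates + gating” concrete, here is a toy recurrent sketch of a gated delta-rule update, the general family KDA belongs to. The shapes, gating scheme, and update rule are simplifying assumptions for illustration, not Kimi's exact formulation.

```python
import numpy as np

def gated_delta_attention(q, k, v, beta, g):
    """Toy recurrent form of a gated delta-rule attention update.

    q, k: (T, d_k) queries/keys; v: (T, d_v) values.
    beta, g: (T,) write-strength and forget gates in (0, 1).
    The state S is a fixed (d_k, d_v) matrix, so memory stays constant
    in sequence length instead of growing like a full-attention KV cache.
    """
    d_k, d_v = k.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((len(q), d_v))
    for t in range(len(q)):
        S *= g[t]                            # gated decay of old memory
        err = v[t] - S.T @ k[t]              # delta rule: prediction error
        S += beta[t] * np.outer(k[t], err)   # rank-1 incremental update
        out[t] = S.T @ q[t]                  # read out with the query
    return out
```

Each step applies only a rank-1 correction to a fixed-size state, so per-token compute and memory stay constant regardless of context length.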

Zhilin Yang:

> “Related ideas will very likely be applied in K3.”

Xinyu Zhou:

> “We are exploring further improvements to KDA.”

---

K3 Timeline

Yang joked about K3’s release schedule:

> “Before Sam Altman’s trillion-dollar data center is completed.”

(That project would take at least 8 years—if it happens at all.)

---

Speed vs Accuracy

Some testers found K2 Thinking's reasoning smarter than GPT-5's, but much slower.

Yang explained:

  • K2 Thinking’s long-chain reasoning is deliberate.
  • Token efficiency improvements are in progress.

Trade-off: Prioritize depth of thought now, improve speed over time.

---

Creative Writing & "Slop" Problem

User feedback praised K2’s natural tone but noted:

  • Occasional verbosity, repetition, uneven rhythm.
  • Emotional sanitization reduces “human tension.”

Yang:

> “We have made progress in reducing slop. It's a long-standing LLM challenge.”

> “Reducing censorship and artificial positivity is possible; we will explore further.”

---

K2 Thinking: Moonshot AI’s Turnaround

Amid industry anticipation for DeepSeek R2, Moonshot AI launched the strongest open-source reasoning model to date.

Distinguishing Factors

  • Agent-first design.
  • Upgrades in reasoning, search, coding, and writing.
  • Native INT4, ultra-sparse MoE, Test-Time Scaling.
  • 200–300 autonomous tool calls without human intervention.

---

Test-Time Scaling Strategy

  • More thinking tokens → deeper long-form reasoning.
  • More tool invocation rounds → greater practical task coverage.

Results:

  • Qualitative leap in reasoning depth.
  • Coherent multi-step problem solving maintained over hundreds of reasoning iterations.
  • Top scores in HLE, BrowseComp, and SWE-Bench.
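
As a rough illustration of how the two knobs interact, here is a hypothetical control loop that budgets thinking tokens and tool rounds separately; `model`, its `step` interface, and the default budgets are placeholder assumptions, not Kimi's actual API.

```python
# Hypothetical test-time-scaling loop: the same model is given larger
# thinking and tool budgets, and stops when either budget runs out or it
# declares a final answer. All names here are placeholders.
def answer_with_budget(model, task, max_think_tokens=32_000, max_tool_rounds=256):
    state = {"task": task, "trace": []}
    tokens_used = tool_rounds = 0
    while tokens_used < max_think_tokens and tool_rounds < max_tool_rounds:
        step = model.step(state)           # one reasoning step or tool call
        tokens_used += step.tokens
        tool_rounds += int(step.used_tool)
        state["trace"].append(step)
        if step.is_final:                  # the model decides it is done
            return step.answer
    return model.finalize(state)           # budgets exhausted: best effort
```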

---

Coding Capability: Agent-Level Development

Benchmarks:

  • SWE-Multilingual: 61.1%
  • SWE-Bench Verified: 71.3%
  • Terminal-Bench: 47.1%

K2 Thinking moves beyond code completion to full end-to-end engineering tasks:

  • Understand requirements.
  • Debug and verify code.
  • Autonomously call tools and configure environments.
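
A minimal sketch of that edit-and-verify cycle appears below; `propose_patch`, `apply_patch`, `run_tests`, and `revert` are illustrative helpers standing in for whatever harness drives the model, not a real Kimi interface.

```python
# Hypothetical end-to-end loop: propose a patch, apply it, run the tests,
# and keep iterating on the failure log until the suite passes.
def fix_until_green(model, repo, issue, max_iters=20):
    log = ""
    for _ in range(max_iters):
        patch = model.propose_patch(repo, issue, log)  # understand + plan
        repo.apply_patch(patch)                        # edit the codebase
        ok, log = repo.run_tests()                     # verify the change
        if ok:
            return patch                               # tests pass: done
        repo.revert()                                  # rethink with the failure log
    return None                                        # iteration budget exhausted
```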

---

Intelligent Search & Browsing

K2 Thinking cycles through:

Thinking → Searching → Reading → Rethinking.

Benefits:

  • Handles ill-defined problems effectively.
  • Converges on accurate answers via iterative retrieval and reasoning.
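
In pseudocode, the cycle could look like the sketch below; `llm`, `search`, and `fetch` are assumed placeholder functions standing in for the model and its browsing tools.

```python
# Illustrative Thinking -> Searching -> Reading -> Rethinking loop.
def research(llm, search, fetch, question, max_rounds=10):
    notes = []
    for _ in range(max_rounds):
        plan = llm(f"Question: {question}\nNotes: {notes}\nNext query or ANSWER:")
        if plan.startswith("ANSWER:"):        # Rethinking has converged
            return plan[len("ANSWER:"):].strip()
        urls = search(plan)                   # Searching
        if urls:
            notes.append(fetch(urls[0]))      # Reading feeds the next round
    return llm(f"Best-effort answer for: {question}\nNotes: {notes}")
```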

---

Writing & General Capabilities

  • Creative writing: Structures complex ideas with natural pacing.
  • Academic assistance: Maintains logical rigor in research tasks.
  • Conversation: Balances factual insight with emotional nuance.

---

Engineering Innovations

Key decisions:

  • INT4 Quantization (instead of FP8) for extreme compression.
  • Quantization-Aware Training (QAT) ensures stability in low-bit environments.
  • Weight quantization applied only to MoE components for a speed/performance trade-off.

Results

  • ~2× inference speed.
  • Lower memory usage.
  • No accuracy loss — all benchmarks run at native INT4 precision.
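
For intuition about what “native INT4 with QAT” involves, here is a minimal sketch of group-wise symmetric INT4 fake quantization, the generic building block behind quantization-aware training; the group size and symmetric scheme are illustrative assumptions, not Moonshot's published recipe.

```python
import numpy as np

def int4_fake_quant(w, group_size=32):
    """Symmetric per-group INT4 "fake quantization" of a weight vector.

    Rounds each group of weights to 4-bit integers in [-8, 7] with one
    shared scale, then dequantizes back to float. QAT runs this in the
    forward pass (with a straight-through estimator for gradients) so
    the model learns weights that survive the rounding.
    Assumes len(w) is a multiple of group_size.
    """
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-12  # avoid /0
    q = np.clip(np.round(w / scale), -8, 7)                     # 4-bit codes
    return (q * scale).reshape(-1)                              # dequantized floats
```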

---

KDA: Kimi Delta Attention

Addresses the quadratic cost of full attention in long-context tasks:

  • Incremental computation updates only changed parts.
  • KV cache reduction of ~75%.
  • Gating ensures semantic continuity across MoE experts.
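
One way to sanity-check the ~75% figure: if the stack interleaves fixed-state KDA layers with full-attention layers at roughly 3:1 (an assumption consistent with the number, not a confirmed architecture detail), only a quarter of the layers still keep a growing KV cache.

```python
# Back-of-the-envelope KV-cache arithmetic under the assumed 3:1 layer mix.
kda_layers, full_layers = 3, 1                   # assumed hybrid ratio
frac = full_layers / (kda_layers + full_layers)  # layers that still cache KV
print(f"KV cache vs. all-full-attention: {frac:.0%}")  # -> 25%, i.e. ~75% saved
```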

MoE makes it bigger, KDA makes it steadier.

---

AMA & References

Portal: Reddit AMA


---

2025 Technology Annual Review — Vote Now!


Vote for three areas you most want reviewed. Share suggestions and trends you’re curious about in the comments!

---

AICon 2025 — Dec 19–20 Beijing

Topics: AI Agents, Context Engineering, AI Product Innovation, and more.

Special Offer: 10% discount for the final AI event of 2025.


---


📎 Read the original article

📎 Open in WeChat

---

> Tip for Creators: Tools like the AiToEarn official site offer open-source AI content generation, multi-platform publishing, and monetization, enabling creators to leverage models like K2 Thinking across Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter).

---


Read more


Harvard CS50: Introduction to Programming with R

Harvard University offers exceptional beginner-friendly computer science courses. We're excited to announce the release of Harvard CS50's Introduction to Programming in R, a powerful language widely used for statistical computing, data science, and graphics. This course was developed by Carter Zenke.