Overnight Q&A with Yang Zhilin: $4.6M and Kimi’s Text-Only Path

Moonshot AI’s AMA: Inside the Minds Behind Kimi K2 Thinking

In recent months, the AI landscape has been buzzing about Moonshot AI and its latest breakthrough — the Kimi K2 Thinking model. Known for exceptional reasoning and coding capabilities, K2 Thinking has surpassed many state‑of‑the‑art models, earning recognition from developers worldwide.

To capitalize on this momentum, the Kimi team hosted an AMA (Ask Me Anything) session on Reddit — an unusual move for a Chinese AI company, offering rare direct interaction with the global developer community.

---

Who Joined the AMA?

The AMA featured Moonshot AI’s three co‑founders:

  • Yang Zhilin
  • Zhou Xinyu
  • Wu Yuxin

Yang Zhilin participated under the Reddit username ComfortableAsk4494, answering multiple technical and strategic questions from the community.

---

Key Topic 1 — Training Cost & Pure Text Strategy

Q: Is the rumored $4.6M training cost of K2 Thinking real?

> Not an official figure. Since much of the cost comes from ongoing R&D and experimentation, it’s difficult to pin down an exact number.

Q: Why focus on pure text instead of multimodal from the start?

> Training a vision-language model takes extra time to source data and adapt the training setup, so we prioritized launching a text-only model first.

---

What is AMA in Tech Communities?

AMA is a well-known Reddit tradition, used by figures ranging from Barack Obama and Keanu Reeves to tech leaders such as OpenAI’s Sam Altman. It’s a candid, community-driven Q&A format.

Kimi’s AMA took place in r/LocalLLaMA, a hub for open‑source and locally deployed AI models — popular among hardcore developers.

---

Summary of AMA Highlights

  • Training Cost: $4.6M — not official.
  • Training Hardware: H800 GPUs with InfiniBand.
  • K3 Architecture: Will use a hybrid approach.
  • INT4 Precision: Chosen over FP4 for broader GPU compatibility.
  • Multimodal Plans: In progress.
  • Muon Optimizer: Scales to 1 trillion parameters.
  • K3 Launch: “Before Sam’s trillion‑scale datacenter is built.”

---

Selected AMA Exchanges

Hardware & Scaling

Q: How does your hardware compare to US giants?

> H800 GPUs with InfiniBand. While not the most powerful globally, we maximize each card’s capacity.

---

Q: Will Kimi’s next model adopt KDA technology?

> Yes — the KDA hybrid with NoPE MLA outperforms full RoPE MLA in benchmarks, speed, and cost efficiency. We’ll share further improvements soon.

---

Release Timeline

Q: When will K3 be launched?

> Before Sam’s trillion-scale datacenter is built.

---

Optimizer Choice

Q: Why use an untested optimizer like Muon?

> We tested it at all scaling stages and eliminated dozens of less effective alternatives. We trust our research process.
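
For context, Muon’s core idea is to take the momentum-smoothed gradient of each 2-D weight matrix and orthogonalize it (typically with a Newton-Schulz iteration) before applying the update. Below is a minimal single-matrix sketch in PyTorch following the commonly published Muon recipe; the coefficients and structure are illustrative and are not Moonshot’s internal implementation.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map a 2-D matrix to the nearest semi-orthogonal matrix
    using the quintic Newton-Schulz iteration commonly used with Muon."""
    a, b, c = 3.4445, -4.7750, 2.0315          # widely cited iteration coefficients
    x = g / (g.norm() + 1e-7)                  # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                             # iterate on the "wide" orientation
        x = x.T
    for _ in range(steps):
        xxt = x @ x.T
        x = a * x + (b * xxt + c * xxt @ xxt) @ x
    return x.T if transposed else x

def muon_step(weight: torch.Tensor, grad: torch.Tensor,
              momentum_buf: torch.Tensor, lr: float = 0.02, beta: float = 0.95):
    """One illustrative Muon update for a single 2-D weight matrix."""
    momentum_buf.mul_(beta).add_(grad)                   # heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum_buf)   # orthogonalize the step
    weight.add_(update, alpha=-lr)                       # apply the update
    return weight, momentum_buf
```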

---

INT4 vs FP4

Q: Does FP4 offer advantages over INT4?

> We chose INT4 for better compatibility with non-Blackwell GPUs, using the Marlin kernel (GitHub link).
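
For readers who want to see what weight-only INT4 means in practice, here is a generic sketch of symmetric per-group INT4 quantization in PyTorch. It illustrates the numeric format only; it is not the quantization-aware training scheme or the Marlin kernel path the Kimi team actually uses.

```python
import torch

def quantize_int4_per_group(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group INT4 quantization: codes in [-8, 7], one scale per group.
    Codes are stored in an int8 container here; real kernels pack two per byte."""
    out_features, in_features = w.shape
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scales = groups.abs().amax(dim=-1, keepdim=True) / 7.0
    codes = torch.clamp(torch.round(groups / scales), -8, 7).to(torch.int8)
    return codes, scales

def dequantize_int4(codes: torch.Tensor, scales: torch.Tensor, shape) -> torch.Tensor:
    """Reconstruct approximate full-precision weights from INT4 codes."""
    return (codes.float() * scales).reshape(shape)

# Round-trip check: reconstruction error stays small relative to the weight scale.
w = torch.randn(256, 1024)
codes, scales = quantize_int4_per_group(w)
w_hat = dequantize_int4(codes, scales, w.shape)
print("mean abs error:", (w - w_hat).abs().mean().item())
```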

---

Token Efficiency

Q: K2 uses many tokens — will this improve?

> Future models will integrate efficiency into the reward mechanism.
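
The answer does not spell out the mechanism, but the simplest reading of “efficiency in the reward” is a length penalty during RL post-training: a correct answer earns more reward the fewer tokens it spends. A hypothetical shaping function:

```python
def length_penalized_reward(task_reward: float, tokens_used: int,
                            token_budget: int = 8192, penalty: float = 0.2) -> float:
    """Hypothetical shaping: keep the task reward, subtract a term that grows
    with the fraction of the token budget consumed."""
    overuse = min(tokens_used / token_budget, 1.0)
    return task_reward - penalty * overuse

# A correct answer using 2k tokens scores higher than one using 8k tokens.
print(length_penalized_reward(1.0, 2048))   # 0.95
print(length_penalized_reward(1.0, 8192))   # 0.80
```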

---

Development Challenges

Q: Biggest challenge in building K2 Thinking?

> Supporting “think–tool–think–tool” cycles in LLMs — technically complex and new.
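
The “think–tool–think–tool” pattern refers to interleaving reasoning steps with tool calls inside a single task, rather than one round of reasoning followed by one answer. A minimal, generic agent-loop sketch (the `chat` and `run_tool` callables are hypothetical stand-ins, not Moonshot’s API):

```python
def agent_loop(chat, run_tool, user_prompt: str, max_steps: int = 10) -> str:
    """Interleave model reasoning with tool execution until a final answer emerges."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = chat(messages)                       # model thinks, may request a tool
        messages.append(reply)
        if not reply.get("tool_calls"):              # no tool requested: final answer
            return reply["content"]
        for call in reply["tool_calls"]:             # execute each requested tool
            result = run_tool(call["name"], call["arguments"])
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": result})
    return "stopped: step budget exhausted"
```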

---

Architectural Breakthroughs

Q: Next big LLM architecture leap?

> The Kimi Linear model shows promise, possibly combined with sparsity techniques.

---

Industry & Competition

Q: Why is OpenAI spending so much?

> No idea. Only Sam knows. We have our own approach.

---

Closed‑Source Potential

Q: Will Kimi release heavier closed‑source models?

> Only if open-sourcing them becomes too risky.

---

Beyond Kimi: Broader AI Ecosystem & Monetization

AMA formats foster direct engagement between AI leaders and developers.

Similarly, open-source monetization platforms like AiToEarn give creators tools to:

  • Generate AI‑driven content
  • Publish to multiple channels (Douyin, Kwai, WeChat, Bilibili, Rednote, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X)
  • Analyze performance
  • See AI model rankings

This combination of technical transparency and global distribution tools lowers barriers and empowers more participants in the AI era.

---

Bold Opinions on Vision vs Text Approaches

A user praised DeepSeek-OCR for using vision-based methods to address token storage/memory issues.

Zhou Xinyu’s response:

> “Personally, I think that approach is too forced. I’d rather stay within the feature space and find more general, pattern‑agnostic methods to make the model more efficient.”

---

Takeaway

This rare co‑founder‑level AMA not only showcased Kimi’s technical direction but also revealed the team’s candid stance on industry trends.

For other AI research teams, this open dialogue format can humanize communication and spark deeper community collaboration — much like AiToEarn’s integration of creation, analytics, and monetization enables AI ideas to reach and resonate across platforms.

---

In short: Kimi’s AMA was more than a Q&A — it was a statement of openness, technical ambition, and community engagement that other AI innovators might well learn from.

Read more

How AI Startups Can Effectively Analyze Competitors — Avoid the Feature List Trap and Redefine Your Battleground

Competitive Analysis Is Not “Feature Comparison” — It’s Strategic Positioning. This guide explains how AI startup teams can escape the trap of feature lists. Using concepts from user perception, product pacing, and capital narratives, it builds a cognitive framework for understanding competitors and helps you identify your differentiated battlefield.

By Honghao Wang