AI models

With Less Than 1% of OpenAI’s Budget, Kimi’s New Model Surpasses GPT-5

Honghao Wang

08 Nov 2025 — 5 min read

Open Source vs. Closed Source Models: The Real Showdown

Overview

After a four-month hiatus, Kimi has launched the inference-ready version of its flagship open-source model K2 — now renamed K2 Thinking.

It is the most powerful open-source thinking model Kimi has ever built, equipped with:

1 trillion parameters
Mixture of Experts (MoE) architecture
32B active parameters
Native INT4 quantization
256k context window
High compatibility with domestic GPUs

K2 Thinking represents a major leap forward in AI model development.

---

Benchmark Performance

According to official benchmarks, K2 Thinking achieved state-of-the-art (SOTA) results in Humanity’s Last Exam (HLE) — not only dominating the open-source sphere, but also surpassing similar closed-source models.

Compared to the original K2, this new model:

Requires no human intervention
Can autonomously execute up to 300 rounds of tool invocations
Supports multi-step reasoning
Solves significantly more complex problems

This shift from "Model as Agent" to "Model as Thinking Agent" reflects Kimi’s strategic focus on innovation over sheer resource scale — an effort to rival Western AI giants despite limited resources.

---

⬆️ Follow Founder Park for timely, high-value entrepreneurial insights.

---

Join the “AI Product Marketplace” Community

15,000+ members — industry pros, developers, entrepreneurs.

Scan the Feishu QR Code to join:

Membership benefits include:

Latest & most noteworthy AI product news
Exclusive giveaways (beta invites, membership codes)
Targeted channels for AI product exposure

---

1 — K2 Thinking:

SOTA Performance in Humanity’s Last Exam

Quoting Yao Shunyu from AI in the Second Half:

> “In AI’s second half — beginning now — focus will shift from solving problems to defining them. Evaluation will matter more than training.”

Why Evaluation Matters

With traditional benchmarks like NMLU and GPQA becoming insufficient, the Humanity’s Last Exam (HLE) emerged in 2025 to measure next-generation AI.

HLE facts:

2,500 advanced academic challenges
100+ disciplines
Contributions from nearly 1,000 experts in 50+ countries
Created by the Center for AI Safety and Scale AI
Officially released March 4, 2025

K2 Thinking’s HLE Score

Under uniform conditions (tools allowed: search engines, Python, web browsing):

Text-Only HLE dataset score: 44.9% — SOTA result

When HLE launched, scores <20% were common; today, top models exceed 40%. Despite scaling law limits, open-source models are now matching or outperforming closed-source competitors.

---

Impact on the AI Ecosystem

This convergence of performance fosters:

Greater competitiveness
More collaboration
A stronger open-source community

Platforms like AiToEarn官网 help creators deploy advanced models to generate and monetize content across major platforms — complete with analytics and AI model rankings (AI模型排名).

Independent reviewers (e.g., Guizang’s AI Toolbox, Cyber Zen Mind) report:

Significant coding capability boost
Noticeably improved problem-solving skills

---

2 — From “Brain in a Vat” to Agent Intelligence

Four months ago, at K2's original release, many asked: Why isn’t this a reasoning model?

Amid the world’s excitement over DeepSeek R1, Kimi’s priority was Agents — specifically models with more effective tool calling.

Two Model Paradigms (Yang Zhilin)

Yang explained:

Reasoning-focused paradigm
Example: o1 models
Relies on repeated attempt + reflection cycles
Still an isolated “brain in a vat”
Agentic paradigm / Multi-turn Agent Reinforcement Learning
Interacts extensively with external environment
Focused on multi-turn tool usage
Supports test-time scaling

Kimi chose interaction first, reflection later — evolving into K2 Thinking.

Key Advantages

Multi-turn: Multiple iterative steps during inference
Tool integration: Connects internal reasoning to the real world

Result:

K2 Thinking can autonomously complete 300 rounds of tool calls & reasoning.

In OpenAI’s L1–L5 tiers, L3 Agent intelligence may now be within reach.

---

Technical Notes

K2 Thinking’s API requires returning the `reasoning_content` field for continuity across multi-step tool usage — similar to Claude’s “extended thinking” method.

Currently, GPT and Gemini models do not support this feature.

---

3 — Resource Challenges and Market Context

After the K2 Thinking release, this chart appeared on X/Twitter:

Despite SOTA performance, Moonshot AI (K2’s creator) is valued at:

0.5% of OpenAI’s worth
~2% of Anthropic or Grok

---

Comparative Data

|----------------|------------------|------------------------------------|---------------|

Key takeaway:

Chinese AI startups face GPU scarcity, limited funding, higher costs, and fewer trial/error chances — yet K2 Thinking outperforms GPT‑5 & Grok 4 using <1% of resources.

---

As NVIDIA CEO Jensen Huang suggested to the Financial Times, China may be closing the AI gap with the U.S.

---

Case Study: Marketing Agent Startup

Seed Round — $25M USD
Boosts client sales by up to 40%
Combines tech + creative strategy + performance marketing
Heavy use of AI analytics for cross-platform optimization

---

How AiToEarn Fits In

AiToEarn官网 enables:

Cross-platform publishing (Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X/Twitter)
Revenue tracking
Model ranking (GitHub repo)

Benefits:

Streamlined content generation → distribution → analytics
Faster go-to-market cycles
Consistent cross-channel branding
Higher ROI

---

Conclusion:

In today’s AI race, precise execution and efficient resource use matter more than sheer scale. Open-source breakthroughs like K2 Thinking, combined with monetization platforms such as AiToEarn, could empower smaller players to challenge trillion-dollar incumbents — creating a more balanced and dynamic AI industry.