
Nature publishes technical details of Google's IMO silver-medal-level model: a core team of about 10, and 80 million math problems generated for AI training

Honghao Wang

13 Nov 2025 — 3 min read

Google DeepMind Unveils AlphaProof, Its IMO Silver-Medal-Level AI

Google DeepMind has now published the full technical details of AlphaProof, its mathematical reasoning system, including both the architecture and the training methods.

AlphaProof continues DeepMind’s “Alpha” naming tradition, builds on earlier systems such as AlphaZero, and is now described in a paper published in Nature.

---

Behind the Scenes: Development Insights

Lead author Tom Zahavy shared key moments from the project:

Small, focused team — around 10 core members, with more joining near the IMO competition.
Key breakthrough by Miklós Horváth (IMO gold medallist) — devised a method to generate multiple problem variations for training.

The team experimented for over a year, integrating only the most effective ideas into AlphaProof.

---

Core Concept: Turning Proof into a Game

AlphaProof transforms mathematical proving into a reinforcement learning environment using the Lean theorem prover:

Each proposition becomes a new “game level.”
The AI selects tactics to advance the proof.
A successful tactic either closes the current goal or replaces it with new sub-goals; the proof is complete once every goal is closed.
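
To make the "game" concrete, here is a minimal Lean 4 sketch (core Lean only, no Mathlib). The theorem name `swap_conj` and the example itself are illustrative choices, not taken from the paper. Each tactic line is one "move": `constructor` splits the goal into two sub-goals, and each `exact` closes one of them.

```lean
theorem swap_conj (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor   -- the single goal `q ∧ p` becomes two sub-goals: `q` and `p`
  · exact h.2   -- closes the sub-goal `q`
  · exact h.1   -- closes the sub-goal `p`; no goals remain, so the proof is done
```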

---

Architecture & Training

Model Design

3-billion-parameter encoder-decoder transformer as the “brain.”
It produces two outputs:
Candidate tactics to try next.
An estimate of the number of steps remaining to finish the proof.

Search Approach

Modified AlphaZero-style tree search.
AND-OR tree structure breaks proofs into independent subproblems.
Progressive sampling explores diverse strategies.
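
The AND-OR structure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AlphaProof's search code: the class names `Goal` and `Tactic` and the function `is_proved` are hypothetical, and the sketch only shows how "proved" status propagates through the tree, without any neural guidance or sampling.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """OR node: a goal is proved if ANY one of its candidate tactics works."""
    statement: str
    tactics: list["Tactic"] = field(default_factory=list)


@dataclass
class Tactic:
    """AND node: a tactic works only if ALL of its resulting sub-goals are proved."""
    name: str
    subgoals: list[Goal] = field(default_factory=list)


def is_proved(goal: Goal) -> bool:
    # any(...) over tactics is the OR; all(...) over sub-goals is the AND.
    # A tactic with no sub-goals closes its goal outright (all([]) is True).
    return any(all(is_proved(g) for g in t.subgoals) for t in goal.tactics)


# Hypothetical example: proving `p ∧ q` by splitting it into `p` and `q`.
root = Goal("p ∧ q", [Tactic("constructor", [
    Goal("p", [Tactic("exact hp")]),
    Goal("q"),  # no tactic found yet, so this sub-goal is still open
])])
print(is_proved(root))  # False: the `q` branch is still open

root.tactics[0].subgoals[1].tactics.append(Tactic("exact hq"))
print(is_proved(root))  # True: both sub-goals are now closed
```

Because sub-goals under one tactic are independent, a search procedure can work on each of them separately and only needs one successful tactic per goal.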

Data Acquisition

Pretraining — 300B tokens of code/math text for logic fundamentals.
Fine-tuning — 300K human-written proofs from Mathlib.
Automated formalization — Using Gemini 1.5 Pro to convert natural-language problems into Lean format.
This produced ~80M formalized problems from about 1M natural-language questions, or roughly 80 formal statements per question on average.

Main Reinforcement Loop

Continually attempts to prove/disprove generated propositions.
Each attempt adds experience data for learning — even imperfect formalizations are useful.
Compute used: ~80,000 TPU days.
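
The loop above can be illustrated with a toy sketch. Everything here is a stand-in chosen for illustration: propositions are simple arithmetic claims, `attempt` replaces real proof search, and doubling `budget` is a crude proxy for the model improving after training on collected experience. The point it shows is that both proofs and disproofs (for example, of a wrongly formalized statement) produce usable experience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    a: int
    b: int
    claimed: int  # the statement being tested is: a + b == claimed


def attempt(prop: Proposition, budget: int) -> str:
    """Toy stand-in for proof search with a limited budget."""
    if abs(prop.claimed) > budget:
        return "undecided"  # too hard for the current model
    return "proved" if prop.a + prop.b == prop.claimed else "disproved"


def run_loop(pool, rounds=3, budget=10):
    experience = []
    for _ in range(rounds):
        remaining = []
        for prop in pool:
            outcome = attempt(prop, budget)
            if outcome == "undecided":
                remaining.append(prop)  # retry in a later round
            else:
                experience.append((prop, outcome))  # proofs AND disproofs count
        pool = remaining
        budget *= 2  # assumption: stands in for the policy improving via training
    return experience, pool


pool = [Proposition(1, 2, 3), Proposition(2, 2, 5), Proposition(20, 20, 40)]
experience, unsolved = run_loop(pool)
for prop, outcome in experience:
    print(prop, outcome)
```

In this toy run the false claim `2 + 2 == 5` is disproved and still recorded, while the harder claim is only resolved in a later round once the budget has grown.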

---

Variant Generation for Hard Problems

For especially tough targets:

Generate ~400,000 problem variants.
Includes simplifications, generalizations, and related cases.
Train dedicated models in parallel, each with its own curated curriculum.

---

IMO 2024 Performance & TTRL

For problems at IMO difficulty, more search time alone is not enough. This is where Test-Time Reinforcement Learning (TTRL) comes in:

Create many variants of the target problem.
Train an “expert” model specifically on these variants.
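
The intuition behind the two steps can be shown with a deliberately simple toy model. All names (`can_solve`, `train_expert`, `ttrl`) and the integer "skill" and "difficulty" scales are assumptions made for illustration; real TTRL trains a neural network on formal variants. The sketch only shows why a ladder of variants can make an otherwise unreachable target solvable.

```python
def can_solve(skill: int, difficulty: int) -> bool:
    """Toy rule: a model can solve problems at most one level above its skill."""
    return difficulty <= skill + 1


def train_expert(skill: int, variants: list[int]) -> int:
    """Toy 'training': each variant the expert solves lifts its skill to that level."""
    for difficulty in sorted(variants):  # easiest first, an implicit curriculum
        if can_solve(skill, difficulty):
            skill = max(skill, difficulty)
    return skill


def ttrl(target: int, base_skill: int, variants: list[int]) -> bool:
    """Train an expert on variants of the target, then attempt the target itself."""
    expert_skill = train_expert(base_skill, variants)
    return can_solve(expert_skill, target)


print(can_solve(3, 10))                 # direct attempt on the target
print(ttrl(10, 3, list(range(1, 10))))  # dense ladder of variants
print(ttrl(10, 3, [1, 2, 8, 9]))        # variants with a gap the expert cannot bridge
```

In this toy setting the dense set of variants bridges the gap to the target, while the sparse set does not, which mirrors why generating many related variants matters.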

Example: IMO 2024 Problem 1

Variants: Only rational α, stronger conditions, α near integer values.
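
For context, IMO 2024 Problem 1 asks for all real numbers α such that, for every positive integer n, the sum ⌊α⌋ + ⌊2α⌋ + ... + ⌊nα⌋ is a multiple of n; the answer is the even integers. A "rational α only" variant might be stated in Lean 4 with Mathlib roughly as below. This is an illustrative sketch: the theorem name and exact statement form are assumptions, not the variants AlphaProof generated, and the proof is left as `sorry`.

```lean
import Mathlib

-- Illustrative variant: α is restricted to the rationals.
-- Statement form is an assumption; the proof is intentionally omitted.
theorem imo2024_p1_rational_variant (α : ℚ) :
    (∀ n : ℕ, 0 < n → (n : ℤ) ∣ ∑ i ∈ Finset.Icc 1 n, ⌊(i : ℚ) * α⌋) ↔
      ∃ k : ℤ, Even k ∧ α = k := by
  sorry
```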

Results:

Solved 3 problems (P1, P2, P6) — P6 was the hardest, solved by only 5 of 609 participants.
Each TTRL run: ~2–3 days compute.
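The team initially expected a bronze-level result; the remaining full solutions arrived days later. Combined with AlphaGeometry 2's solution to P4, the final score was 28 of 42 points: silver-medal level, one point below the gold threshold.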

---

Open Access for Researchers

After the competition, DeepMind opened AlphaProof to researchers by application. Early user feedback:

Alex Kontorovich: Effective at finding counterexamples — quickly reveals missing assumptions.
Talia Ringer: Proved one PhD lemma in under a minute; refuted another, exposing a definition flaw.

---

Known Limitations

Custom definitions bottleneck — Struggles outside well-established Mathlib concepts.
Lean dependency: Lean's mature tactics are a strength, but its fast-evolving environment is a weakness.
Data scarcity — Limited unique math problems; variant generation is promising but still limited.

---

Future Outlook

Generating novel problems will be key for general-purpose math AIs.
Geoffrey Hinton predicts AI will surpass humans in math knowledge-sharing and dataset generation.
AlphaProof is an early glimpse of this future.

---

Paper & References

Paper:

https://www.nature.com/articles/s41586-025-09833-y

References:

[1] https://www.tomzahavy.com/post/how-we-achieved-an-imo-medal-one-year-before-everyone-else

[2] https://www.nature.com/articles/d41586-025-03585-5
