Open-Source Model Wins First Physics Olympiad Gold: Shanghai AI Lab's 235B Model Beats GPT-5 and Grok-4

🏅 Open-source AI Model Wins Gold at International Physics Olympiad

Historic Achievement

The P1-235B-A22B model from Shanghai AI Lab scored 21.2/30 at the 2025 International Physics Olympiad (IPhO), surpassing the gold-medal threshold and making history as the first open-source model to win gold.

In the HiPhO benchmark (13 top-tier global physics competitions, 2024–2025), P1-235B-A22B earned:

  • 12 gold medals
  • 1 silver medal
  • Tied for first place on the leaderboard with Google Gemini-2.5-Pro

This surpasses:

  • GPT-5 – 11 golds
  • Grok-4 – 10 golds

It demonstrates that open-source models have now matched, and in some cases surpassed, closed-source models in physics reasoning.

---

🌍 Significance of AI in Physics Reasoning

Physics reasoning is critical for understanding and shaping the real world. Prestigious competitions such as IPhO require:

  • Complex reasoning
  • Deep physics understanding

Winning gold is a vital milestone toward general physics intelligence and showcases real-world problem-solving potential.

---

🧪 HiPhO: Benchmark for Physics Olympiads

HiPhO (High School Physics Olympiad) is the first benchmark dedicated to recent Olympiad-level physics contests with human-aligned evaluation.

Coverage (2024–2025):

  • IPhO
  • APhO
  • EuPhO
  • Other regional Olympiads (total: 13 contests)

Evaluation Approach:

  • Official competition scoring standards
  • Fine-grained, human-aligned evaluation of both answers and reasoning steps (a grading sketch follows below)
  • Scores directly comparable to human contestants’ medal boundaries

HiPhO benchmark overview (13 competitions worldwide).
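
As a rough illustration of what rubric-style, step-level grading looks like (the step names and point values below are invented for the example; real marking schemes come from the official competitions):

```python
# Each problem carries step-level point allocations, mirroring official
# Olympiad marking schemes; a solution earns partial credit per covered step.
RUBRIC = {
    "set up governing equations": 1.0,
    "apply energy conservation":  1.5,
    "correct final answer":       0.5,
}

def grade(covered_steps: set[str], rubric: dict[str, float]) -> float:
    """Sum the points for every rubric step the solution demonstrably covers."""
    return sum(pts for step, pts in rubric.items() if step in covered_steps)

# A partially correct solution still earns step credit, which keeps model
# scores directly comparable to human contestants' medal boundaries.
print(grade({"set up governing equations", "correct final answer"}, RUBRIC))  # 1.5
```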

Training Dataset:

  • Thousands of Olympiad-level problems
  • Full context + verifiable answers
  • Standard solution paths
  • Purpose-built for reinforcement learning training

---

📈 Multi-stage Reinforcement Learning in P1

P1-series models achieve sustained improvement via multi-stage reinforcement learning, built on two core strategies (a minimal code sketch follows the list):

  • Expanding Context Window
      • Gradually increases the permitted output length across stages
      • Enables longer chains of reasoning
      • Improves complex problem-solving and reduces truncation errors
  • Pass-rate Filtering
      • Computes pass-rate statistics before each training stage
      • Filters out tasks that are too easy or too hard
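
As a rough sketch of how these two strategies could fit together (the rollout stub, stage lengths, sampling count, and thresholds below are illustrative assumptions, not the paper's actual recipe):

```python
import random
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str
    answer: str  # verifiable reference answer, as in the training set

def rollout(problem: Problem, max_tokens: int) -> str:
    """Stub standing in for sampling one solution from the policy
    under a given output-length budget."""
    return problem.answer if random.random() < 0.5 else "wrong"

def pass_rate(problem: Problem, max_tokens: int, n: int = 8) -> float:
    """Fraction of n sampled solutions that match the reference answer."""
    hits = sum(rollout(problem, max_tokens) == problem.answer for _ in range(n))
    return hits / n

# Strategy 1: each RL stage permits longer outputs (longer reasoning chains).
STAGE_MAX_TOKENS = [8_192, 16_384, 32_768]  # assumed schedule

# Strategy 2: before each stage, keep only tasks whose pass rate sits in a
# band that still yields a learning signal (not too easy, not too hard).
def build_stage_data(problems: list[Problem], max_tokens: int,
                     low: float = 0.1, high: float = 0.9) -> list[Problem]:
    return [p for p in problems if low < pass_rate(p, max_tokens) < high]

problems = [Problem(f"p{i}", "42") for i in range(100)]
for max_tokens in STAGE_MAX_TOKENS:
    stage_data = build_stage_data(problems, max_tokens)
    print(f"stage with max_tokens={max_tokens}: {len(stage_data)} tasks kept")
```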

---

🤝 PhysicsMinions: Evolutionary Multi-agent Reasoning

To overcome single-model limitations, the team built PhysicsMinions, a collaborative evolutionary multi-agent system. It consists of three interconnected modules:

  • Visual Module (Visual Studio)
      • Observes and verifies multimodal problems
      • Extracts structured visual information
      • (Not used in the P1 model's experiments)
  • Logic Module (Logic Studio)
      • Generates initial solutions
      • Iteratively revises answers via self-reflection
  • Review Module (Review Studio)
      • Physics Validator: checks physical consistency (constants, units)
      • General Validator: checks logical and calculation soundness

❗ If a validation stage fails, an error report is sent back to the Logic Module for refinement, iterating toward higher accuracy.
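
A minimal sketch of that generate-review loop; the module names mirror the article, but every interface here is an illustrative assumption rather than the project's actual API:

```python
def physics_validator(solution: str) -> str | None:
    """Review Studio check 1: physical consistency (constants, units).
    Toy rule: flag solutions that state no units at all."""
    return None if " J" in solution or " m" in solution else "no units found"

def general_validator(solution: str) -> str | None:
    """Review Studio check 2: logical and calculation soundness (stubbed)."""
    return None

def logic_studio(problem: str, feedback: str | None) -> str:
    """Logic Module stub: draft a solution, revising against any
    reviewer feedback from the previous iteration."""
    draft = f"E = 10 J (draft for: {problem})"
    return draft if feedback is None else draft + f" [revised: {feedback}]"

def solve_with_review(problem: str, max_iters: int = 3) -> str:
    """Iterate generate -> review until both validators pass."""
    feedback = None
    for _ in range(max_iters):
        solution = logic_studio(problem, feedback)
        feedback = physics_validator(solution) or general_validator(solution)
        if feedback is None:
            return solution  # both validators passed
    return solution  # best effort after max_iters refinements

print(solve_with_review("block sliding down an incline"))
```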

PhysicsMinions framework and module interactions.

---

📊 Results & Benchmark Highlights

P1-235B-A22B:

  • 12 gold + 1 silver (HiPhO)
  • Surpassed: GPT-5 (11 gold), Grok-4 (10 gold)
  • Scored 21.2/30 at IPhO 2025, the only open-source gold medalist

P1-30B-A3B:

  • 8 gold, 4 silver, 1 bronze (HiPhO)
  • Ranked 3rd among open-source models
  • Beats several closed-source models (e.g., o4-mini, Claude-4-Sonnet)

PhysicsMinions Impact:

  • P1-235B-A22B avg score: 35.9 (HiPhO)
  • With PhysicsMinions: 38.4, taking overall first place and surpassing Gemini-2.5-Pro (37.7) and GPT-5 (37.4)

P1 models also improved on math, code, and broader STEM benchmarks, showing generalization beyond physics.

---

🔗 Resources

P1 Models:

  • Project Page: https://prime-rl.github.io/P1
  • GitHub: https://github.com/PRIME-RL/P1

HiPhO Benchmark:

  • Paper: https://arxiv.org/abs/2509.07894
  • Dataset: https://huggingface.co/datasets/SciYu/HiPhO
  • Leaderboard: https://phyarena.github.io/
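
If you want to browse the HiPhO problems yourself, the dataset above should load with the standard Hugging Face datasets API; the repo id comes from the link above, but the split and column names are assumptions to verify against the dataset card:

```python
from datasets import load_dataset

# Load HiPhO from the Hugging Face Hub (repo id from the link above).
hipho = load_dataset("SciYu/HiPhO")
print(hipho)               # shows the available splits and columns

split = next(iter(hipho))  # take the first split, whatever it is named
print(hipho[split][0])     # peek at one problem record
```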

PhysicsMinions:

  • Paper: https://arxiv.org/abs/2509.24855

---

🚀 Real-world Application for Creators

Platforms like AiToEarn allow creators to:

  • Generate AI-assisted content
  • Publish across multiple social channels (Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X/Twitter)
  • Track analytics & performance
  • Monetize creativity efficiently

By integrating outputs from advanced AI models such as P1 into content workflows, creators can achieve global reach and fast monetization, powered by open-source AI breakthroughs.
