On-policy distillation

AI training

The Secret to Boosting AI Learning Efficiency by 50×: On-Policy Distillation

Interpreting Thinking Machines Lab’s Latest Research: On-Policy Distillation

Introduction: Rethinking How Machines Learn

Imagine you’re teaching a student to write an essay.

* Traditional way: Give them ten sample essays and tell them to imitate.
  → This is imitation learning.
* Problem: Faced with a new topic, they struggle.

Lilian Weng’s “Elegant” Approach to On-Policy Distillation: How It Redefines Cost and Efficiency | New Paper Analysis

A Leap of Imagination AI Future Compass: a paper-interpretation column that breaks down highlights from top conferences and journals with frontline perspectives and accessible language.

Breaking the "Impossible Triangle"

For years, post-training of models has been trapped in an impossible triangle: researchers want models to have strong capabilities, low

Thinking Machines’ New Study Goes Viral: Combining RL and Fine-Tuning for More Cost-Effective Small-Model Training

Thinking Machines’ Breakthrough: On-Policy Distillation for Efficient LLM Training

Thinking Machines’ latest research is generating intense discussion in the AI community. After Mira Murati, the company’s founder and former OpenAI CTO, personally reposted it, many prominent figures praised its research value. According to Murati’s summary, the team has