AI news

Precisely Targeting "Tough Nuts": Hard Sample Filtering Breaks SFT Dependence, GRPO-Only Achieves Dual Optimality in Perception and Reasoning

A set of new experiments accepted at AAAI 2026 takes on one of the toughest challenges in post-training large multimodal models. On two major benchmark categories, visual reasoning and visual perception, a GRPO-only paradigm trained solely on medium and hard samples, without any SFT (supervised fine-tuning), achieves nearly all the