Google Officially Announces Gemini 3: Team Reveals Two “Aha Moments” in Model Training — Hassabis: “Another Step Toward AGI” | [Jingwei Low-Key Share]
Gemini 3: Another Step Toward AGI

On November 18 (Pacific Time), Google officially launched Gemini 3, which Google DeepMind CEO Demis Hassabis called another step toward Artificial General Intelligence (AGI).
Google calls it its most intelligent model yet, with major advances in reasoning, multimodal understanding, and coding.
---
Key “Aha Moments” in Training
- Natural-language creation of a 3D interactive game — the model could turn plain instructions into an actual playable environment.
- Deep literary interpretation in Gujarati — it translated and creatively reworked a poem, showing sensitivity, style, and cultural nuance.
These moments signal intelligence that approaches human-like intuition — blending creation and understanding seamlessly.
---
Performance Highlights
- Reasoning Excellence
  - Gemini 3 Pro scored 1501 Elo on LMArena, the top of the leaderboard.
  - 91.9% accuracy on GPQA Diamond (graduate-level science reasoning).
- Multimodal Mastery
  - 81% on the MMMU-Pro benchmark.
  - 87.6% on Video-MMMU.
- Advanced Coding
  - Vibe Coding: turn ideas directly into interactive UI.
  - Agentic Coding: autonomous task decomposition and tool calling.
---

Tulsee Doshi, Senior Director of Product Management for Gemini, described two personal moments of discovery:
- Building a playable 3D game from a simple text prompt.
- Crafting a creative rework of a Gujarati poem with impressive sensitivity.
One embodies creation, the other understanding; both reveal the "magic" Gemini 3 shows when it combines multimodal input, complex reasoning, and the desired output format.
---
01 — Deconstructing Gemini 3
Gemini 3 is anchored in three major pillars:
- Reasoning
- Multimodal Understanding
- Coding
Google DeepMind CTO Koray Kavukcuoglu emphasized the model's conversational style:
> “Gemini 3’s answers are intelligent, concise, and to the point.”
Reasoning Advancements
- LMArena: No. 1 with 1501 Elo, about 50 points above Gemini 2.5 Pro.
- GPQA Diamond: 91.9% accuracy.
- Humanity's Last Exam: 37.5% accuracy without tools.
Deep Think Mode
- Humanity's Last Exam: 41.0%.
- GPQA Diamond: 93.8%.
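An Elo gap is easier to read as a head-to-head preference rate. LMArena reports scores on the Elo scale; assuming the standard Elo logistic curve with a 400-point scale (an approximation, not LMArena's published methodology), a 50-point lead means the higher-rated model would be expected to win roughly 57% of pairwise comparisons. The function name below is illustrative only.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score (win rate, ties counted as half) for the higher-rated
    side under the standard Elo logistic curve: E = 1 / (1 + 10^(-gap/400))."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # The roughly 50-point gap cited between Gemini 3 Pro and Gemini 2.5 Pro.
    print(f"{elo_expected_score(50):.3f}")  # prints 0.571
```

Equal ratings give exactly 0.5, so each additional 50 points shifts the expected win rate by only a few percentage points near the middle of the curve.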
---
Multimodal Understanding
Google CEO Sundar Pichai likened the progress to AI learning to "read the room," integrating text, images, and audio with an awareness of context.
Google describes Gemini 3 Pro as world-leading in multimodal understanding:
- MMMU-Pro — 81%
- Video-MMMU — 87.6%
Real-world Applications
- Learning Aid — transform long lectures into interactive flashcards.
- Multilingual Recipes — digitize handwritten mixed-language content.
- Sports Analysis — study videos and give tailored training plans.
---
Coding Capabilities
If reasoning & multimodality are the “brain”, coding is the “hands”.
Two core innovations:
Vibe Coding — What You Feel is What You Get
Abstract ideas or moods get translated into functional, interactive digital products — ready-made front-ends generated entirely from natural language.
Agentic Coding — Autonomous Digital Agents
Enables planning, multi-step execution, and independent tool-calling.
Example: automating a concert-ticket purchase workflow, in which the user only confirms the final step.
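The agentic workflow described here (plan the task, call tools step by step, and pause for the user before the irreversible purchase) can be sketched as a small loop. This is a minimal illustration, not Gemini's actual agent interface: the tools `find_event`, `pick_seats`, and `checkout`, the planner `plan_ticket_purchase`, and the `"$prev"` chaining convention are all hypothetical stand-ins for what the model and real browser or API actions would do.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tools standing in for real browser or API actions.
def find_event(artist: str) -> str:
    return f"event:{artist}"

def pick_seats(event: str, count: int) -> str:
    return f"{event}/seats:{count}"

def checkout(selection: str) -> str:
    return f"order-confirmed:{selection}"

TOOLS: dict[str, Callable[..., str]] = {
    "find_event": find_event,
    "pick_seats": pick_seats,
    "checkout": checkout,
}

@dataclass
class Step:
    tool: str                         # which tool to call
    args: dict                        # its arguments; "$prev" means "previous result"
    needs_confirmation: bool = False  # irreversible steps wait for the user

def plan_ticket_purchase(artist: str, count: int) -> list[Step]:
    """Hypothetical stand-in for the model's planner: decompose the goal into tool calls."""
    return [
        Step("find_event", {"artist": artist}),
        Step("pick_seats", {"event": "$prev", "count": count}),
        Step("checkout", {"selection": "$prev"}, needs_confirmation=True),
    ]

def run_agent(plan: list[Step], confirm: Callable[[str, dict], bool]) -> list[str]:
    """Execute the plan step by step, chaining results between tool calls."""
    results: list[str] = []
    for step in plan:
        # Resolve "$prev" placeholders with the previous tool's output.
        args = {k: (results[-1] if v == "$prev" else v) for k, v in step.args.items()}
        if step.needs_confirmation and not confirm(step.tool, args):
            break  # the user declined, so the irreversible step never runs
        results.append(TOOLS[step.tool](**args))
    return results

if __name__ == "__main__":
    plan = plan_ticket_purchase("Some Band", 2)
    print(run_agent(plan, confirm=lambda tool, args: True)[-1])
    # order-confirmed:event:Some Band/seats:2
    print(run_agent(plan, confirm=lambda tool, args: False))
    # ['event:Some Band', 'event:Some Band/seats:2']
```

The design choice worth noting is that confirmation is attached to individual steps rather than to the whole plan, so the agent can do all the reversible work on its own and hand control back only at the point of purchase.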
---
02 — Reconstructing the AI Experience
1. In Google Search: From Answer Engine → Discovery & Creation Engine
The query fan-out technique breaks a complex query into many smaller sub-queries, retrieves results for all of them at scale, and synthesizes the findings into a tailored page that combines web results, Maps, the Knowledge Graph, and product data.
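The fan-out pattern (decompose, retrieve in parallel, synthesize) can be illustrated with a toy pipeline. This is a sketch under loud assumptions, not Google's implementation: `decompose` splits on the word "and" where Search would use the model, `retrieve` returns placeholder snippets from hypothetical backends instead of querying real indexes, and `synthesize` just assembles text rather than generating a page.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(query: str) -> list[str]:
    """Hypothetical stand-in for the model's fan-out step: split a compound
    query into sub-queries. Here we simply split on ' and '."""
    return [part.strip() for part in query.split(" and ") if part.strip()]

def retrieve(sub_query: str) -> dict[str, str]:
    """Hypothetical retrieval: each backend returns a placeholder snippet."""
    backends = ("web", "maps", "knowledge_graph", "products")
    return {b: f"[{b}] results for '{sub_query}'" for b in backends}

def synthesize(query: str, findings: dict[str, dict[str, str]]) -> str:
    """Merge per-sub-query findings into one answer page (plain text here)."""
    lines = [f"Answer page for: {query}"]
    for sub_query, by_backend in findings.items():
        lines.append(f"Sub-query: {sub_query}")
        lines.extend(f"- {snippet}" for snippet in by_backend.values())
    return "\n".join(lines)

def fan_out_search(query: str, max_workers: int = 8) -> str:
    sub_queries = decompose(query)
    # Issue all sub-queries concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(retrieve, sub_queries))
    return synthesize(query, dict(zip(sub_queries, results)))

if __name__ == "__main__":
    print(fan_out_search("flights to Tokyo and hotels in Shinjuku"))
```

Running the retrievals concurrently is what makes fanning out to many sub-queries affordable: total latency tracks the slowest sub-query rather than the sum of all of them.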
Generative UI builds custom interactive elements in real time:
- Physics simulations for science queries.
- Dynamic travel guides with maps & schedules.
2. In the Gemini App: Personal, Agent-like Interaction
- Dynamic Views: render a custom interactive layout for each request.
- Visual Layouts: magazine-style travel plans and image-rich itineraries.
- Gemini Agent: an experimental tool for Google AI Ultra subscribers that carries out real-world tasks across the user's Google apps.
---
04 — Google Antigravity: Developer Platform
A new agentic development platform in which AI agents work autonomously on a dedicated surface, with direct access to the editor, terminal, and browser.
Capabilities include:
- Task-based development (e.g., “Create a flight tracker app”).
- Self-verification & reporting.
- Adaptive learning of developer preferences.
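The self-verification and reporting items above amount to a check-and-retry loop: the agent drafts a solution, tests it, retries on failure, and reports the outcome. The sketch below assumes agent drafts can be modeled as candidate Python functions and verification as running expected input/output cases; `verify`, `run_task`, and `Report` are hypothetical names and do not reflect Antigravity's actual interface.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable

@dataclass
class Report:
    task: str
    attempts: int         # how many drafts were checked
    passed: bool          # whether the last checked draft passed every case
    failures: list[str]   # failure notes from the last checked draft

def verify(candidate: Callable[[int], int], cases: dict[int, int]) -> list[str]:
    """Run a draft against expected input/output pairs; return failure notes."""
    failures = []
    for arg, expected in cases.items():
        got = candidate(arg)
        if got != expected:
            failures.append(f"f({arg}) = {got}, expected {expected}")
    return failures

def run_task(task: str, drafts: Iterable[Callable[[int], int]],
             cases: dict[int, int], max_attempts: int = 3) -> Report:
    """Check hypothetical agent drafts in order until one passes or attempts run out."""
    failures: list[str] = []
    used = 0
    for candidate in islice(drafts, max_attempts):
        used += 1
        failures = verify(candidate, cases)
        if not failures:
            break
    return Report(task, used, not failures and used > 0, failures)

if __name__ == "__main__":
    # First draft doubles instead of squaring; the second draft is correct.
    print(run_task("square a number", [lambda x: x * 2, lambda x: x * x], {2: 4, 3: 9}))
```

Returning a structured report, rather than just the final code, mirrors the idea that the developer reviews what the agent checked instead of re-running everything by hand.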
Gemini 3 Pro is also available to developers in third-party tools such as Cursor, GitHub, JetBrains, and Replit.
---
05 — Full-Stack Strategy & Speed Advantage
Google’s full-stack integration spans:
- Hardware — TPU chips, training clusters.
- Research — Google DeepMind breakthroughs.
- Models & Tools — foundational Gemini series.
- Products & Platforms — immediate consumer integration.
Benefit: very fast deployment. Gemini 3 shipped in Google Search on launch day, the first time a Gemini model has reached Search on day one.
Enterprise offerings via Vertex AI and Gemini Enterprise already serve partners like Box, Thomson Reuters, and Rakuten, solving tasks from legal analysis to multimodal processing.
---
⚡ Takeaway:
Gemini 3 bridges cutting-edge AI with instant real-world deployment — from immersive search experiences to autonomous agents and developer platforms — showing Google’s push toward AGI through speed, synergy, and full-stack integration.