Google Officially Announces Gemini 3: Team Reveals Two “Aha Moments” in Model Training — Hassabis: “Another Step Toward AGI” | [Jingwei Low-Key Share]

Gemini 3: Another Step Toward AGI

On November 18 (Pacific Time), Google officially launched Gemini 3, marking a significant leap forward in the journey toward Artificial General Intelligence (AGI).

This new-generation model sets fresh boundaries for AI–human collaboration, delivering breakthroughs in reasoning, multimodal understanding, and coding.

---

Key “Aha Moments” in Training

  • Natural-language creation of a 3D interactive game — the model could turn plain instructions into an actual playable environment.
  • Deep literary interpretation in Gujarati — it translated and creatively reworked a poem, showing sensitivity, style, and cultural nuance.

These moments signal intelligence that approaches human-like intuition — blending creation and understanding seamlessly.

---

Performance Highlights

  • Reasoning Excellence
      • Gemini 3 Pro scored 1501 Elo on the LMArena leaderboard (formerly LMSYS Chatbot Arena), taking the top spot.
      • Achieved 91.9% accuracy on GPQA Diamond (graduate-level reasoning).
  • Multimodal Mastery
      • 81% on the MMMU-Pro benchmark.
      • 87.6% on Video-MMMU.
  • Advanced Coding
      • Vibe Coding: instantly transform ideas into interactive UI.
      • Agentic Coding: autonomous task decomposition and tool-calling for efficiency.

---

Tulsee Doshi, Senior Director of Product Management for Gemini, described two personal moments of discovery:

  • Building a playable 3D game from a simple text prompt.
  • Crafting a creative rework of a Gujarati poem with impressive sensitivity.

One embodied creation, the other understanding — both revealing Gemini 3’s “magic” when combining multimodal input, complex reasoning, and desired output formats.

---

01 — Deconstructing Gemini 3

Gemini 3 is anchored in three major pillars:

  • Reasoning
  • Multimodal Understanding
  • Coding

CTO Koray Kavukcuoglu emphasized the conversational style:

> “Gemini 3’s answers are intelligent, concise, and to the point.”

Reasoning Advancements

  • LMArena (formerly LMSYS Chatbot Arena): No. 1, about 50 Elo points ahead of Gemini 2.5 Pro.
  • GPQA Diamond: 91.9% accuracy.
  • Humanity’s Last Exam: 37.5% accuracy without tools on this multi-step reasoning benchmark.
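
To put the 50-point Elo gap in perspective, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the classic Elo logistic formula with a scale of 400; the leaderboard's actual rating method may differ, and the 1451 figure is inferred here from the reported gap rather than quoted from the source.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 1501 is the reported Gemini 3 Pro score.
# 1451 is an assumption: 1501 minus the reported gap of about 50 points.
gap_win_rate = expected_win_rate(1501, 1451)
print(f"{gap_win_rate:.3f}")  # prints 0.571
```

Under these assumptions, a 50-point lead corresponds to being preferred in roughly 57% of direct comparisons, which is a clear but not overwhelming margin.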

Deep Think Mode

  • Humanity’s Last Exam — 41.0%.
  • GPQA Diamond — 93.8%.

---

Multimodal Understanding

CEO Sundar Pichai likened the progress to AI learning to “read the room”: integrating text, images, and audio with awareness of context.

Gemini 3 Pro posts leading scores on multimodal comprehension benchmarks:

  • MMMU-Pro — 81%
  • Video-MMMU — 87.6%

Real-world Applications

  • Learning Aid — transform long lectures into interactive flashcards.
  • Multilingual Recipes — digitize handwritten mixed-language content.
  • Sports Analysis — study videos and give tailored training plans.

---

Coding Capabilities

If reasoning & multimodality are the “brain”, coding is the “hands”.

Two core innovations:

Vibe Coding — What You Feel is What You Get

Abstract ideas or moods get translated into functional, interactive digital products — ready-made front-ends generated entirely from natural language.

Agentic Coding — Autonomous Digital Agents

Enables planning, multi-step execution, and independent tool-calling.

Example: automating a concert ticket purchase, where the user only confirms the final step.
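
The plan, execute, and confirm pattern described above can be sketched as a small loop. This is a minimal illustration, not Gemini's implementation: the tool names, the fixed plan, and the `confirm` callback are all hypothetical, and a real agent would have the model generate the plan and call real services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str                         # name of the tool to call
    argument: str                     # single string argument, for simplicity
    needs_confirmation: bool = False  # True for steps the user must approve


# Hypothetical tools; each only returns a description of what it would do.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_events": lambda q: f"found 2 shows for '{q}'",
    "select_seats": lambda q: f"held seats: {q}",
    "purchase": lambda q: f"purchased: {q}",
}


def plan(goal: str) -> list[Step]:
    """Fixed plan for illustration; a real agent would ask the model for this."""
    return [
        Step("search_events", goal),
        Step("select_seats", "2 x balcony"),
        Step("purchase", "2 x balcony", needs_confirmation=True),
    ]


def run(goal: str, confirm: Callable[[Step], bool]) -> list[str]:
    """Execute each planned step, pausing for approval where required."""
    log = []
    for step in plan(goal):
        if step.needs_confirmation and not confirm(step):
            log.append(f"stopped before {step.tool}")
            break
        log.append(TOOLS[step.tool](step.argument))
    return log


print(run("jazz concert", confirm=lambda step: True))
```

Only the final purchase step is gated on the user, which mirrors the ticket example: the agent handles the search and seat selection on its own, and nothing irreversible happens without approval.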

---

02 — Reconstructing the AI Experience

1. In Google Search: From Answer Engine → Discovery & Creation Engine

The query fan-out method breaks a complex query into several smaller ones, retrieves results for them at scale, and synthesizes those results into a tailored page that combines web, maps, knowledge graph, and product data.
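
As a rough illustration of the fan-out idea, the sketch below splits a query into sub-queries, looks each one up in parallel, and merges the hits into one response. Everything here is a toy assumption: the in-memory `SOURCES` data, the naive split on " and ", and the function names are hypothetical stand-ins, since the article does not describe how Google Search implements any of these steps.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory sources; hypothetical stand-ins for web, maps, and product data.
SOURCES = {
    "web": {"kyoto temples": "Kinkaku-ji and Fushimi Inari are popular."},
    "maps": {"kyoto temples": "Both sites are reachable by city bus."},
    "products": {"kyoto rail pass": "Regional rail passes are sold online."},
}


def fan_out(query: str) -> list[str]:
    """Naive decomposition on ' and '; a real system would use a model here."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def retrieve(sub_query: str) -> list[str]:
    """Look up one sub-query in every source and tag each hit with its origin."""
    return [
        f"[{name}] {docs[sub_query]}"
        for name, docs in SOURCES.items()
        if sub_query in docs
    ]


def answer(query: str) -> str:
    """Fan out, retrieve in parallel, then merge results into one page."""
    sub_queries = fan_out(query)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as sub_queries.
        results = list(pool.map(retrieve, sub_queries))
    sections = []
    for sub_query, hits in zip(sub_queries, results):
        body = "\n".join(hits) if hits else "(no results)"
        sections.append(f"{sub_query}:\n{body}")
    return "\n\n".join(sections)


print(answer("kyoto temples and kyoto rail pass"))
```

Running the retrievals concurrently is what makes the fan-out cheap in wall-clock time: the total latency is close to that of the slowest sub-query rather than the sum of all of them.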

Generative UI builds custom interactive elements in real-time:

  • Physics simulations for science queries.
  • Dynamic travel guides with maps & schedules.

2. In the Gemini App: Personal, Agent-like Interaction

  • Dynamic Views — render custom interactive layouts per request.
  • Visual Layouts — magazine-style travel plans, image-rich itineraries.
  • Gemini Agent: an experimental tool that lets Ultra subscribers execute real-world tasks across the user’s Google ecosystem.

---

04 — Google Antigravity: Developer Platform

Google Antigravity is a new agentic development environment in which AI agents plan and carry out tasks autonomously within a dedicated interface.

Capabilities include:

  • Task-based development (e.g., “Create a flight tracker app”).
  • Self-verification & reporting.
  • Adaptive learning of developer preferences.

Gemini 3 is also available in third-party developer tools such as Cursor, GitHub, JetBrains, and Replit.

---

05 — Full-Stack Strategy & Speed Advantage

Google’s full-stack integration spans:

  • Hardware — TPU chips, training clusters.
  • Research — Google DeepMind breakthroughs.
  • Models & Tools — foundational Gemini series.
  • Products & Platforms — immediate consumer integration.

Benefit: Ultra-fast model deployment — Gemini 3 hit Google Search on launch day.

Enterprise offerings via Vertex AI and Gemini Enterprise already serve partners like Box, Thomson Reuters, and Rakuten, solving tasks from legal analysis to multimodal processing.

---

⚡ Takeaway:

Gemini 3 bridges cutting-edge AI with instant real-world deployment — from immersive search experiences to autonomous agents and developer platforms — showing Google’s push toward AGI through speed, synergy, and full-stack integration.
