How Powerful Is Gemini 3? In Just a Few Chats, It Produced a 14-Page PhD-Level Paper — Wharton Expert Says It’s Evolved into a Digital Colleague

Three Years of AI Progress: From Chatting to Building Interactive Games

Three years ago, ChatGPT arrived, showing us that AI could not only answer questions—it could converse. Quickly, its skills evolved from writing emails and reports to producing code, with seemingly limitless possibilities.

By 2025, nearly every major tech company had launched multimodal AI models capable of processing and reasoning over text, images, audio, and video.

Recently, Gemini 3 took multimodal reasoning even further—overtaking many competitors.

---

How Far Have We Really Come?

Wharton professor and AI researcher Ethan Mollick posed a striking challenge: he asked Gemini 3 to demonstrate how far AI has progressed over the past three years.

The outcome was impressive. Across his tests, Gemini 3:

  • Coordinated four agents working in parallel
  • Planned tasks automatically
  • Wrote code, built websites, conducted research
  • Produced a 14‑page academic paper approaching PhD-level sophistication

What follows is Mollick’s first-hand account of that experiment.

---

From Otter Poems to a Candy-Powered Starship Game

When Mollick tested Google’s Gemini 3, instead of rattling off benchmarks, he gave it one prompt:

> “Demonstrate—by doing something—how much AI has progressed since November 2022.”

Back then, ChatGPT’s wow factor was writing coherent paragraphs or funny poems—like one about a candy-powered faster-than-light engine escaping an otter’s pursuit.

This time, Gemini 3 replied:

> “I won’t just write text. I’ll build you a fully interactive Candy-Powered Starship Simulator game. In 2022 AI could describe such a game. In 2025, AI codes it, designs the interface, and lets you pilot the ship.”

The game worked, complete with live narration, humor, and whimsical poetry. Mollick noted that when you stop treating Gemini 3 like a chatbot and view it as an autonomous creative partner, far more possibilities emerge.

---

Antigravity: Beyond “Code Writing”

Google’s Antigravity, launched alongside Gemini 3, resembles developer tools like Claude Code or OpenAI Codex: it connects directly to your computer and autonomously writes and runs programs under your guidance.

Why It Matters, Even if You’re Not a Programmer

  • Anything you do on a computer boils down to code execution.
  • Powerful AI Agents can perform any task automatable via code—from building dashboards to controlling browsers.

Mollick experimented by giving Antigravity:

  • Permission to read all press releases on his computer
  • A task: “Build a beautiful site that organizes my AI predictions and shows which came true.”

Antigravity:

  • Read and processed files
  • Proposed an executable plan
  • Built the site
  • Tested it in his browser
  • Delivered a ready-to-publish result

The execution was not flawless, but it felt like working with a capable human colleague, always under his supervision.

---

Testing “PhD-Level Intelligence”

Benchmark chatter often labels top-tier models as PhD-level. Mollick tried to verify that claim.

He gave Gemini 3:

  • Messy data from decade-old crowdfunding research
  • Instructions: “Clean these Stata files for analysis.”

It repaired corrupted data and mapped complex structures.

Next, he asked for:

> “An original research paper using crowdfunding data—address a serious theory in entrepreneurship or business strategy, run analyses, format for journal submission.”

Gemini 3:

  • Generated hypotheses
  • Ran statistical tests
  • Created a new NLP-based metric to measure idea uniqueness
  • Executed the code for this metric
  • Produced a formatted 14-page paper

It wasn’t perfect, and the methods needed refinement, but the quality was comparable to the work of an ambitious graduate student.
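
The article does not say how Gemini 3 computed its idea-uniqueness metric. Purely as an illustration, one common way to approximate such a score is to compare each project description with all the others and treat low similarity as high uniqueness. The sketch below assumes TF-IDF weighting with cosine similarity; the function names and the sample descriptions are hypothetical and are not taken from Mollick’s experiment.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps every weight positive, even for common terms.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def uniqueness_scores(docs):
    """Score each description as 1 minus its highest similarity to any other."""
    vectors = tfidf_vectors(docs)
    scores = []
    for i, vec in enumerate(vectors):
        similarities = [cosine(vec, other)
                        for j, other in enumerate(vectors) if j != i]
        scores.append(1.0 - max(similarities, default=0.0))
    return scores


if __name__ == "__main__":
    # Hypothetical crowdfunding pitches, invented for this example.
    pitches = [
        "A smart watch for runners with GPS tracking",
        "A smart watch for cyclists with GPS tracking",
        "Handmade ceramic teapots inspired by deep sea creatures",
    ]
    for pitch, score in zip(pitches, uniqueness_scores(pitches)):
        print(f"{score:.2f}  {pitch}")
```

A score near 0 means a pitch closely resembles at least one other pitch, while a score near 1 means it shares almost no vocabulary with the rest. A real study would more likely use sentence embeddings and validate the metric against human judgments, which is the kind of methodological refinement the paragraph above alludes to.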

---

Shifting from Chatbots to Digital Colleagues

Key takeaway: Gemini 3 and Antigravity show AI evolving from a text responder into an autonomous collaborator.

  • The human-in-the-loop role is changing, from fixing AI mistakes to guiding AI efforts.
  • AI is now a “think + execute” partner, not just a conversational tool.

---

Public Reaction: Skepticism & Support

Hacker News commenters raised valid questions and caveats:

  • Is the AI-generated paper actually good or merely long?
  • Complex code often looks correct but fails in production.
  • AI needs human oversight, corrections, and clear task definition.

Examples:

  • Claude’s 20-page story still had plot inconsistencies.
  • Slide generation quality remains unpredictable.
  • Current LLMs shine when details aren’t mission-critical.

Some academics argue models are nearing graduate-student competence, but still require expert review.

---

Remaining Challenges

  • Reliability: Avoiding subtle errors
  • Judgment: Aligning output with nuanced human intent
  • Interfaces: AI work still feels trapped in a text box—UX innovation needed

---

Platforms Bridging AI Creation & Distribution

Tools like AiToEarn exemplify this integration:

  • Open-source framework
  • AI content generation
  • Cross-platform publishing (Douyin, Kwai, WeChat, Bilibili, Rednote, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X)
  • Analytics and AI model rankings

These ecosystems may define the next leap: intelligent multi-agent work + seamless global distribution.

---

Final Thought

In roughly three years, AI jumped from whimsical otter poems to multi-agent research orchestration.

Potential is huge—but trust, guidance, and usability remain critical.

---

Your Turn

Has AI tangibly impacted your work or creativity?

Do you agree with Mollick’s optimism?

Share your experiences in the comments and join the discussion.

Reference:

https://www.oneusefulthing.org/p/three-years-from-gpt-3-to-gemini
