Trying Audio Transcription and the New Pelican Benchmark with Gemini 3 Pro
Gemini 3 Pro Release — Detailed Analysis & Benchmarks
Date: 18 November 2025
Google released Gemini 3 Pro today, a significant upgrade that competes directly with GPT-5.1 and Claude Sonnet 4.5.
Official resources:
- Announcement from Sundar Pichai, Demis Hassabis, and Koray Kavukcuoglu
- Developer blog post by Logan Kilpatrick
- Gemini 3 Pro Model Card (PDF)
- Collection of 11 related articles
---
Overview
After preview testing via AI Studio, Gemini 3 Pro feels like Gemini 2.5 elevated to current state-of-the-art standards.
Key specifications:
- Knowledge cutoff: January 2025
- Context length: Up to 1 million input tokens
- Max output length: 64,000 tokens
- Multimodal support: Text, images, audio, video
---
Benchmark Performance
According to Google's own results (see the model card), Gemini 3 Pro slightly outperforms Claude Sonnet 4.5 and GPT-5.1 across most standard tests.

---
Pricing Comparison
Gemini 3 Pro is priced higher than Gemini 2.5 but remains cheaper than Claude Sonnet 4.5.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.1 | $1.25 | $10.00 |
| Gemini 2.5 Pro | ≤ 200k: $1.25; > 200k: $2.50 | ≤ 200k: $10.00; > 200k: $15.00 |
| Gemini 3 Pro | ≤ 200k: $2.00; > 200k: $4.00 | ≤ 200k: $12.00; > 200k: $18.00 |
| Claude Sonnet 4.5 | ≤ 200k: $3.00; > 200k: $6.00 | ≤ 200k: $15.00; > 200k: $22.50 |
| Claude Opus 4.1 | $15.00 | $75.00 |
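
To make the tiers easier to compare, here is a minimal Python sketch that turns the pricing table into a per-request cost estimate. It assumes the tier is chosen by the request's input token count and then applies to both the input and output rates. The `Pricing` class, the `request_cost` helper, and the 50,000 / 2,000 token workload are illustrative names and numbers, not part of any vendor SDK.

```python
from dataclasses import dataclass

TIER_THRESHOLD = 200_000  # input tokens; larger prompts use the long-context rate


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens, stored as (standard, long-context) pairs."""
    input: tuple[float, float]
    output: tuple[float, float]


# Figures copied from the pricing table. Flat-priced models repeat a single rate.
PRICES = {
    "GPT-5.1": Pricing((1.25, 1.25), (10.00, 10.00)),
    "Gemini 2.5 Pro": Pricing((1.25, 2.50), (10.00, 15.00)),
    "Gemini 3 Pro": Pricing((2.00, 4.00), (12.00, 18.00)),
    "Claude Sonnet 4.5": Pricing((3.00, 6.00), (15.00, 22.50)),
    "Claude Opus 4.1": Pricing((15.00, 15.00), (75.00, 75.00)),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request (assumption: input size picks the tier)."""
    p = PRICES[model]
    tier = 1 if input_tokens > TIER_THRESHOLD else 0
    return (input_tokens * p.input[tier] + output_tokens * p.output[tier]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name:<18} ${request_cost(name, 50_000, 2_000):.4f}")
```

For that hypothetical workload, Gemini 3 Pro lands between GPT-5.1 and Claude Sonnet 4.5, which matches the ordering the table suggests.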
---
Workflow Highlight — Alt Text Generation from Image
Test goal: Evaluate Gemini 3 Pro’s multimodal image interpretation.
Execution:

    llm -m gemini-3-pro-preview -a https://static.simonwillison.net/static/2025/gemini-3-benchmarks.jpg 'Alt text for this image, include all figures and make them comprehensible to a screen reader user'

This demonstrates how visual inputs can be transformed into structured, accessible text.
---
Comprehensive Benchmark Comparison
Google's own reporting covers reasoning, multimodal comprehension, agentic tasks, and long-context handling.
Highlights:
- Top performer in complex multimodal reasoning tasks (MMMU‑Pro, ScreenSpot‑Pro, CharXiv Reasoning)
- Significant edge in math competition-level problems (MathArena Apex)
- Strong coding performance across LiveCodeBench Pro and agent tool usage
- Best-in-class long context retrieval (1M token tests)
---
Real-World Test — City Council Meeting Transcript
Input:
- Video: Half Moon Bay City Council Meeting — Nov 4, 2025
- Extracted to audio via `yt-dlp`
- Compressed with `ffmpeg` to 38 MB for reliability
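
The preparation steps above can be sketched in Python as follows. This is an illustration rather than the exact commands used: the URL, the intermediate path `/tmp/HMB.m4a`, the three-hour duration, and the choice of mono audio are all assumptions. The bitrate estimate also ignores container overhead, so the real file will be slightly larger than the target.

```python
import shlex


def target_bitrate_kbps(duration_seconds: float, target_megabytes: float) -> int:
    """Audio bitrate (kbit/s) that keeps a recording near the target file size."""
    target_kilobits = target_megabytes * 1000 * 8  # 1 MB = 1000 kB = 8000 kbit
    return int(target_kilobits // duration_seconds)


def build_commands(video_url: str, duration_seconds: float, target_megabytes: float):
    """Return the yt-dlp and ffmpeg argument lists without running them."""
    bitrate = target_bitrate_kbps(duration_seconds, target_megabytes)
    # -x extracts audio only; the output template path is a hypothetical choice.
    download = ["yt-dlp", "-x", "--audio-format", "m4a",
                "-o", "/tmp/HMB.%(ext)s", video_url]
    compress = [
        "ffmpeg", "-i", "/tmp/HMB.m4a",
        "-vn",                   # drop any video stream
        "-ac", "1",              # mono (assumed to be sufficient for speech)
        "-b:a", f"{bitrate}k",   # audio bitrate derived from the size target
        "/tmp/HMB_compressed.m4a",
    ]
    return download, compress


if __name__ == "__main__":
    # Placeholder URL and an assumed 3-hour recording compressed to about 38 MB.
    for cmd in build_commands("https://example.com/meeting-video", 3 * 3600, 38):
        print(shlex.join(cmd))
```

Pass each list to `subprocess.run(cmd, check=True)` to execute it once both tools are installed.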
Processing command:

    llm -m gemini-3-pro-preview --attachment-type /tmp/HMB_compressed.m4a 'audio/aac' 'Output a Markdown transcript of this meeting...'

Result: The model generated a detailed Markdown outline and transcript, complete with participants, timestamps, and key points.
Limitations noted:
- Timestamps in transcript did not match video’s actual timecodes.
- Some detailed content (e.g., Spanish instructions) was summarized rather than transcribed verbatim.
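
A cheap first check for the timestamp problem is to flag entries that are internally inconsistent. The sketch below assumes the transcript marks times as `[HH:MM:SS]` or `[MM:SS]`, which is a guess at the output format, and that the recording length is known. It cannot detect timestamps that are plausible but simply offset from the video, so a manual spot check is still needed.

```python
import re

# Assumed timestamp format: [HH:MM:SS] or [MM:SS]
TIMESTAMP = re.compile(r"\[(?:(\d+):)?(\d{1,2}):(\d{2})\]")


def to_seconds(match: re.Match) -> int:
    """Convert a matched timestamp to a number of seconds."""
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def suspicious_timestamps(transcript: str, duration_seconds: int) -> list[str]:
    """Return timestamps that go backwards or fall beyond the recording's length."""
    problems = []
    previous = 0
    for match in TIMESTAMP.finditer(transcript):
        current = to_seconds(match)
        if current > duration_seconds or current < previous:
            problems.append(match.group(0))
        else:
            previous = current
    return problems


if __name__ == "__main__":
    # Made-up transcript lines, for illustration only.
    sample = (
        "[00:00:05] Call to order\n"
        "[00:12:30] Public comment\n"
        "[00:10:00] Item 1\n"
        "[04:00:00] Adjourn\n"
    )
    print(suspicious_timestamps(sample, duration_seconds=3 * 3600))
```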
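Token usage & cost: 320,087 input tokens + 7,870 output tokens, for a total of $1.42. Because the input exceeds 200k tokens, the higher pricing tier applies ($4.00 per 1M input tokens, $18.00 per 1M output tokens).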
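
The $1.42 figure can be reproduced from the pricing table. This short check assumes the over-200k Gemini 3 Pro rates apply to both input and output, since the prompt exceeded 200k tokens:

```python
# Gemini 3 Pro rates for prompts over 200k tokens (USD per 1M tokens), from the pricing table.
INPUT_RATE_LONG = 4.00
OUTPUT_RATE_LONG = 18.00

input_tokens = 320_087
output_tokens = 7_870

input_cost = input_tokens * INPUT_RATE_LONG / 1_000_000
output_cost = output_tokens * OUTPUT_RATE_LONG / 1_000_000
total = input_cost + output_cost

print(f"input  ${input_cost:.4f}")   # input  $1.2803
print(f"output ${output_cost:.4f}")  # output $0.1417
print(f"total  ${total:.2f}")        # total  $1.42
```

Almost all of the cost comes from the audio input. The transcript itself accounts for about 14 cents.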
---
Creative Prompt Benchmark — The Pelican Test
Gemini 3 Pro introduces a “thinking level” toggle:
Low-thinking level result:
- SVG included whimsical detail (a jaunty hat)
- Bicycle frame correctly formed

High-thinking level result:
- More anatomically accurate pelican depiction
- Bicycle frame rendered to spec

---
Updated Pelican Benchmark Prompt (v2):
> Generate an SVG of a California brown pelican riding a bicycle... with breeding plumage, spokes, correct frame, large pouch, clear feathers, pedaling posture.
Reference photo:

Gemini 3 Pro (high-thinking level):

GPT‑5.1 result:

Claude Sonnet 4.5 result:

---
Conclusion
Gemini 3 Pro shows:
- Leading performance in multimodal reasoning and long-context processing
- Competitive coding abilities
- Flexibility via thinking-level adjustment
---
Spot Check: Results appear consistent and plausible, though timestamp accuracy for transcripts needs improvement.
Cost tracking: Example uses ranged from $0.0568 for alt-text tasks to $1.42 for multi-hour audio transcription.