Apple Releases Pico-Banana-400K Dataset to Advance Text-Guided Image Editing
Pico-Banana-400K is a curated dataset of roughly 400,000 image-editing examples, developed by Apple researchers to advance text-guided image editing models.
The dataset was generated through the following pipeline:
- Source Images: Real photographs from the Open Images collection.
- Image Editing: Google's Nano-Banana (Gemini‑2.5‑Flash Image) applied edits based on text prompts.
- Quality Filtering: Gemini‑2.5‑Pro evaluated outputs for quality and prompt fidelity.
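To make the flow concrete, here is a minimal Python sketch of how one filtered example might be represented; the class, field names, and threshold are illustrative assumptions, not part of the released pipeline.

```python
from dataclasses import dataclass

@dataclass
class EditExample:
    """One candidate example; field names are illustrative, not the released schema."""
    source_image: str      # path to the Open Images photograph
    instruction: str       # text prompt describing the requested edit
    edited_image: str      # path to the Nano-Banana output
    quality_score: float   # judge-model score in [0, 1]

def keep(example: EditExample, threshold: float = 0.7) -> bool:
    """Hypothetical filter: retain only examples the judge model rated highly."""
    return example.quality_score >= threshold
```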
---
Why This Dataset Matters
Apple researchers describe Pico-Banana-400K as bridging the gap in large-scale, high-quality, shareable datasets for image editing. Existing datasets are either:
- Human-curated → high quality but small in scale
- Synthetic → large in scale but dependent on proprietary models such as GPT‑4o
Key distinguishing factors:
> “We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction fidelity through MLLM-based quality scoring and careful curation.”
This balance of quality, diversity, and scalability makes it valuable for:
- Researchers working on advanced editing models
- Creative professionals exploring AI-enhanced workflows
- Developers building AI-powered image manipulation tools
---
Dataset Creation Workflow
Step-by-step process:
1. Pick base images
   - Diverse subjects were selected from Open Images: people, objects, and text-based scenes.
2. Generate editing prompts
   - Gemini‑2.5‑Flash created the initial instructions.
   - Prompts were refined and shortened with Qwen2.5‑7B‑Instruct for more natural phrasing.
3. Apply edits
   - Nano-Banana executed the edits described by the prompts.
4. Evaluate output quality
   - Gemini‑2.5‑Pro scored each image against four weighted criteria (see the scoring sketch after this list):
     - Instruction compliance (40%)
     - Editing realism (25%)
     - Preservation balance (20%)
     - Technical quality (15%)
5. Curate the final dataset
   - Successful images were added to the main dataset.
   - ~56K failed outputs were preserved for robustness and preference learning.
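The rubric in step 4 amounts to a weighted average over the four criteria. The sketch below expresses that scoring in Python; the weights come from the rubric above, while the field names and the 0.7 acceptance threshold are assumptions for illustration, not values published by the authors.

```python
# Weights taken from the rubric above; the acceptance threshold is an assumed
# value for illustration, not one published with the dataset.
WEIGHTS = {
    "instruction_compliance": 0.40,
    "editing_realism": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}

def composite_score(scores: dict) -> float:
    """Weighted average of the four per-criterion scores (each in [0, 1])."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())

def is_successful(scores: dict, threshold: float = 0.7) -> bool:
    """Examples scoring below the threshold would land in the failed subset."""
    return composite_score(scores) >= threshold

# Example: a strong edit with a minor realism penalty
print(is_successful({
    "instruction_compliance": 0.9,
    "editing_realism": 0.7,
    "preservation_balance": 0.8,
    "technical_quality": 0.9,
}))  # True (composite = 0.83)
```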

---
Editing Taxonomy
Researchers designed 35 types of edits, grouped into eight categories, such as:
- Pixel & Photometric Adjustments
  - Example: changing the overall color tone
- Object-Level Semantics
  - Example: relocating an object or changing its color
- Scene Composition
  - Example: adding a new background
- Stylistic Transformation
  - Example: converting a photo into a sketch
- Plus other transformation types to ensure coverage across editing tasks.
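For experimentation, a taxonomy like this can be represented as a simple mapping from categories to edit types. The sketch below uses the category names from the article; the individual edit types and the sampling helper are illustrative placeholders, not the dataset's actual 35-type list.

```python
import random

# Category names follow the article; the edit types listed under each are
# illustrative placeholders, not the dataset's exact 35-type taxonomy.
EDIT_TAXONOMY = {
    "pixel_photometric": ["adjust_color_tone", "change_brightness"],
    "object_level_semantics": ["relocate_object", "recolor_object"],
    "scene_composition": ["replace_background", "add_object"],
    "stylistic_transformation": ["photo_to_sketch", "apply_painting_style"],
    # ...the remaining four categories would cover the other edit types
}

def sample_edit_type(category: str) -> str:
    """Pick an edit type from a category, e.g. when generating balanced prompts."""
    return random.choice(EDIT_TAXONOMY[category])

print(sample_edit_type("scene_composition"))
```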
---
Dataset Structure
The dataset includes:
- Main Set (257K examples)
  - Single-turn text–image–edit examples.
- Multi-Turn Editing Subset (72K examples)
  - Sequential modifications supporting reasoning and planning research.
- Failed Image Subset (56K examples)
  - Negative examples for alignment research and reward-model training.
- Instruction Pairing Subset
  - Pairs long-form and concise instructions to improve instruction rewriting and summarization models.
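A common way to work with subsets like these is to stream records from JSON Lines files. The loader below is a hypothetical sketch; the file name and field names are assumptions, since the actual schema and download paths are documented in the dataset's GitHub repository.

```python
import json
from pathlib import Path

def load_subset(path: str):
    """Hypothetical loader: iterate records from a JSON Lines subset file."""
    with Path(path).open() as f:
        for line in f:
            yield json.loads(line)

# File name and field names below are assumptions for illustration only.
for record in load_subset("pico_banana_400k/main_single_turn.jsonl"):
    instruction = record["instruction"]     # assumed field name
    source_path = record["source_image"]    # assumed field name
    edited_path = record["edited_image"]    # assumed field name
    break
```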
---
Licensing & Access
- Hosted by: Apple’s CDN
- Access: GitHub Repository
- License:
  - Pico-Banana-400K → CC BY-NC-ND 4.0
  - Open Images source images → CC BY 2.0
---
Extending Dataset Use with AiToEarn
AiToEarn is an open-source, global AI content monetization platform that integrates:
- AI content generation
- Cross-platform publishing (Douyin, Bilibili, Instagram, YouTube, X)
- Analytics and AI model rankings
By combining Pico-Banana-400K with AiToEarn:
- Researchers move from model training → real-world deployment faster
- Creators monetize AI outputs efficiently across multiple platforms
- Organizations streamline publishing and analytics in AI-driven workflows
> Tip: Large-scale datasets like Pico-Banana-400K are ideal for experimentation. When paired with platforms like AiToEarn, projects can shift seamlessly from R&D to audience engagement and monetization.
---