Apple Releases Pico-Banana-400K Dataset to Advance Text-Guided Image Editing

Pico-Banana-400K: High-Quality Dataset for Text-Guided Image Editing

Pico-Banana-400K is a curated dataset of 400,000 images developed by Apple researchers to advance text-guided image editing models.

The dataset was generated through the following pipeline:

  • Source Images: Real photographs from the Open Images collection.
  • Image Editing: Google's Nano-Banana applied edits based on text prompts.
  • Quality Filtering: Gemini‑2.5‑Pro evaluated outputs for quality and prompt fidelity.
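The stages above, together with the prompt-generation step covered later, can be sketched as a chain of function calls. This is a minimal sketch, not Apple's code. Each model (the prompt generator, Nano-Banana, Gemini‑2.5‑Pro) is replaced by a plain Python callable. The names `EditRecord` and `run_pipeline`, and the record fields, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class EditRecord:
    source_id: str    # ID of the source photo (hypothetical field)
    instruction: str  # text prompt describing the edit
    edited_id: str    # ID of the edited output (hypothetical field)
    score: float      # quality score assigned by the judge


def run_pipeline(
    source_ids: Iterable[str],
    make_instruction: Callable[[str], str],   # stand-in for the prompt generator
    apply_edit: Callable[[str, str], str],    # stand-in for the image editor
    judge: Callable[[str, str, str], float],  # stand-in for the quality judge
) -> Iterator[EditRecord]:
    """Run prompt -> edit -> score for each source image, lazily."""
    for source_id in source_ids:
        instruction = make_instruction(source_id)
        edited_id = apply_edit(source_id, instruction)
        score = judge(source_id, instruction, edited_id)
        yield EditRecord(source_id, instruction, edited_id, score)


# Toy stubs in place of the real models, just to show the data flow.
records = list(
    run_pipeline(
        ["img-001", "img-002"],
        make_instruction=lambda src: f"make the sky in {src} orange",
        apply_edit=lambda src, instruction: f"{src}-edited",
        judge=lambda src, instruction, out: 0.9,
    )
)
for record in records:
    print(record)
```

Injecting the stages as callables keeps the orchestration independent of any particular model API. Swapping a stub for a real client does not change the loop.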

---

Why This Dataset Matters

Apple researchers position Pico-Banana-400K as filling a gap: large-scale, high-quality image editing datasets that can be openly shared are scarce. Existing datasets tend to be one of two kinds:

  • Human-curated: high quality, but small in scale
  • Synthetic: large in scale, but dependent on proprietary models such as GPT‑4o

What sets it apart, in the researchers' words:

> “We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction fidelity through MLLM-based quality scoring and careful curation.”

This balance of quality, diversity, and scalability makes it valuable for:

  • Researchers working on advanced editing models
  • Creative professionals exploring AI-enhanced workflows
  • Developers building AI-powered image manipulation tools

---

Dataset Creation Workflow

Step-by-step process:

  • Pick Base Images
      • Diverse subjects were selected from Open Images: people, objects, and scenes containing text.
  • Generate Editing Prompts
      • Gemini‑2.5‑Flash generated the initial instructions.
      • Qwen2.5‑7B‑Instruct then refined and shortened the prompts into more natural phrasing.
  • Apply Edits
      • Nano-Banana performed each edit as described by its prompt.
  • Evaluate Output Quality
      • Gemini‑2.5‑Pro scored each edited image against four weighted criteria:
          • Instruction compliance (40%)
          • Editing realism (25%)
          • Preservation balance (20%)
          • Technical quality (15%)
  • Curate Final Dataset
      • Edits that passed the quality check were added to the main dataset.
      • About 56K failed outputs were kept for robustness and preference-learning research.
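The four percentages can be read as weights in a single composite score. The sketch below assumes that reading: a linear combination of per-criterion scores in [0, 1]. It then routes each edit to the main set or the failed subset. The criterion keys and the 0.7 acceptance threshold are hypothetical; the article does not state the actual cutoff or how the judge's scores are combined.

```python
# Criterion weights as listed in the article; they sum to 1.0.
WEIGHTS = {
    "instruction_compliance": 0.40,
    "editing_realism": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}

# Hypothetical acceptance threshold; the real cutoff is not given here.
THRESHOLD = 0.7


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def route(scores: dict[str, float], threshold: float = THRESHOLD) -> str:
    """Send passing edits to the main set and the rest to the failed subset."""
    return "main" if composite_score(scores) >= threshold else "failed"


example = {
    "instruction_compliance": 0.5,
    "editing_realism": 0.8,
    "preservation_balance": 0.9,
    "technical_quality": 1.0,
}
print(round(composite_score(example), 2), route(example))  # 0.73 main
```

Because instruction compliance carries the largest weight, a low score there can sink an otherwise clean edit. For example, 0.2 on compliance with perfect scores elsewhere yields 0.68 and is routed to "failed".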

---

Editing Taxonomy

The researchers defined 35 edit types, grouped into eight categories, including:

  • Pixel & Photometric Adjustments
      • Example: changing the overall color tone
  • Object-Level Semantics
      • Example: relocating an object or changing its color
  • Scene Composition
      • Example: adding a new background
  • Stylistic Transformation
      • Example: converting a photo into a sketch
  • The remaining four categories cover further transformation types, ensuring coverage across editing tasks.
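One way a taxonomy supports coverage is by deciding which kind of edit each source image receives. The sketch below is a hypothetical strategy, not the authors' documented procedure. It includes only the four categories and example edits named above, and it visits categories round-robin so each gets an equal share of jobs. The names `TAXONOMY` and `balanced_plan` are illustrative.

```python
import random
from collections import Counter

# Partial taxonomy: only the four categories and example edits named in this
# article. The full taxonomy has eight categories and 35 edit types.
TAXONOMY = {
    "Pixel & Photometric Adjustments": ["change overall color tone"],
    "Object-Level Semantics": ["relocate an object", "change an object's color"],
    "Scene Composition": ["add a new background"],
    "Stylistic Transformation": ["convert a photo into a sketch"],
}


def balanced_plan(n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Assign n edit jobs to (category, edit type) pairs.

    Categories are visited round-robin so each gets an equal share (to within
    one job); the edit type inside a category is drawn at random.
    """
    rng = random.Random(seed)
    categories = list(TAXONOMY)
    plan = []
    for i in range(n):
        category = categories[i % len(categories)]
        plan.append((category, rng.choice(TAXONOMY[category])))
    return plan


plan = balanced_plan(8)
print(Counter(category for category, _ in plan))
```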

---

Dataset Structure

The dataset includes:

  • Main Set (257K examples)
      • Single-turn examples, each pairing a source image and a text instruction with the edited result.
  • Multi-Turn Editing Subset (72K examples)
      • Sequences of successive edits to the same image, supporting research on reasoning and planning.
  • Failed Image Subset (56K examples)
      • Negative examples for alignment research and reward-model training.
  • Instruction Pairing Subset
      • Pairs long-form instructions with concise versions, to help train instruction-rewriting and summarization models.
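The four subsets suggest four record shapes, modeled below as Python dataclasses. This is a sketch only. The class and field names are hypothetical and do not describe the dataset's actual file format. `to_preference_pair` assumes that a failed edit is paired with a successful edit of the same image and instruction.

```python
from dataclasses import dataclass, field


@dataclass
class SingleTurn:
    source_image: str  # path or URL of the original photo
    instruction: str   # text instruction for the edit
    edited_image: str  # path or URL of the edited result


@dataclass
class MultiTurn:
    source_image: str
    # One (instruction, edited image) pair per step, applied in order.
    turns: list[tuple[str, str]] = field(default_factory=list)

    def final_image(self) -> str:
        """Image after the last edit, or the source image if there are no turns."""
        return self.turns[-1][1] if self.turns else self.source_image


@dataclass
class PreferencePair:
    source_image: str
    instruction: str
    chosen: str    # edit that passed the quality filter
    rejected: str  # failed edit for the same source and instruction


@dataclass
class InstructionPair:
    long_instruction: str   # detailed, long-form wording
    short_instruction: str  # concise rewrite of the same request


def to_preference_pair(success: SingleTurn, failed_image: str) -> PreferencePair:
    """Pair a successful edit with a failed attempt at the same instruction."""
    return PreferencePair(
        source_image=success.source_image,
        instruction=success.instruction,
        chosen=success.edited_image,
        rejected=failed_image,
    )


session = MultiTurn("photo.jpg", [("add a hat", "step1.jpg"), ("make the hat red", "step2.jpg")])
print(session.final_image())
pair = to_preference_pair(SingleTurn("photo.jpg", "add a hat", "good.jpg"), "bad.jpg")
print(pair.chosen, pair.rejected)
```

A chosen/rejected layout of this kind is the usual input shape for reward-model and preference-learning training.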

---

Licensing & Access

  • Hosting: Apple’s CDN
  • Access: GitHub repository
  • Licenses:
      • Pico-Banana-400K: CC BY-NC-ND 4.0
      • Open Images source images: CC BY 2.0

---

Extending Dataset Use with AiToEarn

AiToEarn is an open-source, global platform for monetizing AI-generated content. It integrates:

  • AI content generation
  • Cross-platform publishing (Douyin, Bilibili, Instagram, YouTube, X)
  • Analytics and AI model rankings

By combining Pico-Banana-400K with AiToEarn:

  • Researchers can move from model training to real-world deployment faster
  • Creators can monetize AI outputs efficiently across multiple platforms
  • Organizations can streamline publishing and analytics in AI-driven workflows

> Tip: Large-scale datasets like Pico-Banana-400K are ideal for experimentation. When paired with platforms like AiToEarn, projects can shift seamlessly from R&D to audience engagement and monetization.
