Google’s Top AI Surpassed by HKUST Open Source? The Image Editing Powerhouse Making Overseas Creators Shout “King Bomb”

Google’s Top AI Surpassed by HKUST Open Source? The Image Editing Powerhouse Making Overseas Creators Shout “King Bomb”

Driving Multimodal AI Toward True Generalized Creation

image

Introduction — The "Photoshop is Dead" Moment

In recent months, "Photoshop is dead" has become a hot topic in the AI creator community.

The explosive growth of image editing and generation models is challenging professional creative software’s long-standing dominance. Multimodal image technology upgrades from Google’s Nano Banana, ByteDance’s Seedream4.0, and Alibaba’s Qwen-Image-Edit-2509 have introduced entirely new capabilities, including:

  • OOTD outfit styling
  • Text rendering
  • Movie storyboard generation

These tools allow creators to focus on controllability, creativity, and product readiness without mastering complex photo-retouching skills.

image
image

---

From Innovation to Limitations

While models like Nano Banana combine language understanding, visual recognition, and multimodal instruction control for a natural creative process, some limitations remain:

  • Editing prompts can be ambiguous — often needing reference images or extra description
  • Generation tasks excel with concrete objects but struggle with abstract concepts (e.g., hairstyles, textures, lighting, artistic styles)

---

DreamOmni2 — A New Benchmark Model

Two weeks ago, Professor Jia Jiaya’s team at HKUST open-sourced DreamOmni2, addressing these weaknesses. Built on FLUX-Kontext, it maintains existing instruction editing and text-to-image abilities, while adding multi-reference image generation editing for greater flexibility.

Key Advantages:

  • Superior handling of both concrete and abstract concepts
  • Outperforms popular models like Nano Banana in certain scenarios
image

Example — Instruction-Based Editing:

> Change the hat color of the woman in the first image to match the sweater in the second reference image.

image

Example — Instruction-Based Generation:

> Place image 1 on the bedroom wall, change the cup in image 3 to have the same material as the plate in image 2, and place it on the table.

DreamOmni2 quickly gained traction, amassing 1.6k GitHub Stars in just two weeks.

Repository: https://github.com/dvlab-research/DreamOmni2

---

Hands-On Testing — Where DreamOmni2 Excels

We tested DreamOmni2’s instruction-based multimodal editing and compared it with GPT-4o and Nano Banana.

---

Test 1 — Background Replacement

Prompt:

"Replace the background of the panda in picture 1 with picture 2 to generate an ID photo."

image

Result: High-quality, fur detail preserved, professional-level output

image

---

Test 2 — Style Transfer

Prompt:

"Make the first image have the same style as the second image."

image

Result: Accurate tone, mood, and stylistic integration into original image

image

---

Test 3 — Clothing Replacement

Prompt:

"Replace the jacket in the first image with the clothes in the second image."

DreamOmni2:

image

Preserves posture and face; minimal collar differences.

GPT-4o:

image

Face overly smoothed; awkward proportion changes.

Nano Banana:

image

Pose intact, but color/shape altered; logo removed.

---

Test 4 — Instruction-Based Multimodal Generation

Example Prompt:

"Print the logo from image 1 on the object from image 2 and place it on a desk."

image

Result: Accurate logo placement, realistic light/shadow, desk context recognition

image

---

Test 5 — Pose Transfer from Sketch

Prompt:

"Anime image 1 adopts the pose of image 2"

image

Result: Captured pose naturally, smooth transformation from lines to motion

image

---

Test 6 — Necklace on a Cat

Prompt:

"Put the necklace from image 2 on the cat’s neck from image 1"

DreamOmni2:

image

GPT-4o:

image

AI artifacts noticeable

Nano Banana:

image

---

Technical Innovations

1. Three-Stage Data Construction Paradigm

DreamOmni2 overcomes dataset limitations via:

  • Feature Mixing Scheme — Generates high-quality concrete + abstract pairs without resolution loss.
  • Instruction-Based Editing Data — Combines real + generated images to create diverse editing examples.
  • Instruction-Based Generation Data — Integrates multiple reference images into cohesive datasets.
image
image
image

---

2. Multi-Reference Framework Design

  • Uses index encoding with positional offsets to prevent artifacts and confusion between reference images.

---

3. Joint VLM + Generation Model Training

  • LoRA modules activate automatically when reference images are detected.
  • Improves understanding of non-standard, real-world prompts.

---

Conclusion

DreamOmni2 unifies data, framework, and training to leap forward in multimodal generation/editing.

Advancements vs Previous Models:

  • Style transfer and structural reorganization with abstract attribute editing
  • Flexible object + concept combination in generation tasks
  • Open-source — enabling wide adoption and integration

---

Industry Impact

Platforms like AiToEarn bridge AI creation and monetization.

They integrate:

  • AI content generation
  • Cross-platform publishing (Douyin, Kwai, Bilibili, Instagram, YouTube, X, etc.)
  • Analytics and AI model ranking (AI模型排名)

This fits perfectly with the multimodal creative future enabled by DreamOmni2.

---

References:

---

Final Takeaway:

DreamOmni2 isn’t just an upgrade — it’s a new standard for instruction-based multimodal editing & generation, transforming how creators interact with AI.

---

Do you want me to create a side-by-side visual comparison table for DreamOmni2 vs GPT-4o vs Nano Banana results? That would make the performance differences even clearer.

Read more

Leading Investment in Ilya’s New Company, 13-Year Net IRR of 33%: Greenoaks’ Tech Investment Philosophy | [Matrix Low-Key Share]

Leading Investment in Ilya’s New Company, 13-Year Net IRR of 33%: Greenoaks’ Tech Investment Philosophy | [Matrix Low-Key Share]

# Greenoaks Investment Philosophy — Reducing Noise, Not Adding Complexity > **Core belief:** Investing is not about adding complexity — it’s about focusing, cutting noise, and identifying exceptional opportunities early. ![image](https://blog.aitoearn.ai/content/images/2025/11/img_001-156.jpg) --- ## Introduction In 2025, when **Safe Superintelligence (SSI)** — founded by

By Honghao Wang