Meituan’s New Standalone App — You Can’t Order Food, Only AI

Meituan’s New Standalone App — You Can’t Order Food, Only AI

Meituan’s “Food Delivery Speed” Approach to AI Models

Meituan appears to be embracing a “fast and stable” AI development mantra — rapidly releasing new models while maintaining high performance. Over the past two months, multiple models have been launched, and the momentum continues.

---

Introducing LongCat‑Flash‑Omni

The newest release — LongCat‑Flash‑Omni — is multi‑modal (“Omni” meaning all‑capable) and supports text, image, audio, and video inputs.

image

Benchmark Achievements

  • Comprehensive benchmarks (Omni‑Bench, WorldSense): Surpasses Qwen3‑Omni & Gemini‑2.5‑Flash.
  • Matches closed‑source Gemini‑2.5‑Pro in performance.
  • Top-tier rankings for each individual modality.
image

---

Speed: The Flash Advantage

Inherited from the LongCat‑Flash series:

  • 560B total parameters, only 27B active (MoE architecture).
  • Large knowledge base + efficient inference.
image

This makes it the first open‑source model capable of full‑modal real‑time interaction at mainstream flagship specs.

---

Public Access

  • Available now on LongCat app & web — free to try.
  • Supports text/voice input, voice calls; web allows image/file uploads.
image

---

Hands‑On Tests

“AI Counting Sheep”

image

Video: link

Video Call Beta

Tested with a perfume bottle — successful visual recognition and Q&A.

image

Physics Simulation Prompt

> Show a ball bouncing inside a rotating hexagon, affected by gravity and friction, with realistic wall collisions.

Result — code + visualization:

image

Meme Recognition

image

Answer: Duck + New Year’s Money (homophone pun)

image

Audio Noise Test

image

Audio: link

Passed — accurate speech extraction despite noise.

image

---

Performance Summary

Fast + Stable:

  • Instant response even for complex multimodal tasks.
  • Excels at deep reasoning, conversation, and recognition.

Genealogy:

  • Descendant of LongCat‑Flash‑Chat (speed‑optimized)
  • Descendant of LongCat‑Flash‑Thinking (professional depth)
image

---

Meituan’s Iteration Logic

  • Speed First — responsive, smooth model interactions.
  • Specialization — deep reasoning, physics simulation, robust speech recognition.
  • Full Modal Expansion — text, audio, visual integration; future image/video generation.

---

Generates long videos (~5 minutes) with stable output.

Video: link

image

---

Technical Challenges in Multimodal AI

  • Fusion Difficulty — different modalities need careful integration.
  • Offline vs. Streaming — differing processing logic.
  • Real‑Time Performance — latency and lag issues.
  • Training Efficiency — large multimodal data volumes slow training.

---

Architectural Innovations — ScMoE

image

Features:

  • End‑to‑end unified architecture.
  • Accepts text, audio, images, and video simultaneously.

---

Real‑Time Interaction Layer

  • Segmented audio‑video feature interleaving — sync by time segments; low latency in speech & visuals.

Training Strategy

  • Progressive multi‑modal fusion — start with text, add audio, then visuals.
  • Context expansion to 128K tokens — supports >8 minutes of multimodal interaction.
  • Modality‑Decoupled Parallel (MDP) training — independent optimization of LLM & encoders.

---

Achievements

LongCat‑Flash‑Omni combines:

  • Full‑modality coverage
  • Efficient inference
  • Real‑time experience comparable to closed‑source, but open‑source.

---

Meituan’s Broader Strategy

image

Two‑Pronged Approach:

  • Software — building a “world model”.
  • Hardware — accelerating embodied intelligence deployment.

Timeline Highlights

  • 2018–2020 — investments in local life services & consumer brands.
  • 2021 — shift to “Retail + Technology”, growth in robotics/autonomous driving.
  • 2022 onward — autonomous driving, AI chips, embodied robotics.
image

---

Embodied Intelligence Vision

image

Emphasis on Autonomy:

  • Drones, autonomous vehicles, indoor delivery robots.
  • Nationwide approval for drone ops — including nighttime flights.

Goal: Retail is the scenario, Technology is the enabler.

---

Future Direction

Connecting bits to atoms — using AI brains + robotic bodies to merge digital and physical worlds.

---

Try LongCat‑Flash‑Omni

  • LongCat Chat: https://longcat.ai
  • Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Flash-Omni
  • GitHub: https://github.com/meituan-longcat/LongCat-Flash-Omni

---

Note for Creators

Platforms like AiToEarn官网 let creators:

  • Generate AI content
  • Publish across multi‑channels (Douyin, Kwai, Bilibili, YouTube, Instagram, etc.)
  • Track analytics and rankings (AI模型排名)

Docs: AiToEarn文档

---

Bottom Line: LongCat‑Flash‑Omni is a fast, stable, fully multimodal open‑source model — demonstrating Meituan’s unique speed‑first, specialization, full‑coverage strategy in AI, and hinting at a larger ambition: to connect the virtual and physical worlds seamlessly.

Read more