Hands-on Test of Qianwen App’s Image & Video Generation: A Breakthrough for Chinese Pragmatism
Free AI Video & Image Generation Tools — Professional Review
Author|Cynthia
Editor|Zheng Xuan
---
Introduction
Hot on the heels of Sora 2’s synchronized audio-visual release and Nano Banana Pro’s style-focused image generation comes another major multimodal AI update.
In early December, Alibaba’s Qianwen App quietly integrated:
- Wan 2.5 — the most advanced domestic AI video-generation model
- Qwen-Image 2511 (special edition) — a leading global open-source AI image model
The biggest change? Free, unlimited image generation — removing barriers for everyday users.
Previously, we reviewed Wan 2.5’s web version (From SD to Wan2.5-Preview: AI Video 2025 Insights). Back then, Wan avoided overhype, focusing on a single goal: perfect short-form synchronized audio-visual outputs with detailed realism.
With Wan 2.5 now available on mobile and paired with Qwen-Image’s “unlimited card,” Alibaba is clearly moving B2B-honed tech into mass consumer use.
---
Core Questions
- Can Qwen-Image 2511 fix long-standing issues like distorted human faces and garbled Chinese text?
- Has Wan 2.5 closed the gap with leading global models in sync accuracy and narrative capacity?
- What is Alibaba’s ecosystem strategy behind a free model?
We spent a week stress-testing both tools to find out.
---
01 — Video Capability Test: Wan 2.5
Overview
Wan 2.5:
- Competes with Google’s Veo 3
- Specializes in audio-visual sync for 10-second videos
- Offers high detail and cost-effectiveness
We tested across:
- Lip-sync and duration
- Detail rendering
- Scenario adaptability
---
Test 1 — Lip-Sync & Scene Consistency
Prompt Overview:
Two characters (Chinese philosopher Xunzi & Greek philosopher Socrates) debate in a detailed stone colonnade setting.
Challenges included:
- Style separation — avoid “same face” issue across differing cultures
- Accurate lip movement — two unique speeches with matching tone and emotion
- Scene persistence — ensure no missing background elements during shots
Result:
- Lip-sync aligned perfectly with body/sleeve movement
- Scene remained consistent without element drop-outs
- Transitions felt near feature-animation quality
---
Test 2 — Detail Control in Cinematic Realism
Prompt:
Medium shot of a young woman leading a deer through a warm dusk forest.
- Style: Cinematic realism, soft yet high-end textures
- Lighting: Multi-layered — rim, side, and diffuse light
- Camera: Long take, shallow depth (f/2.8), bokeh accuracy
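The prompt above is organized into a subject line plus style, lighting, and camera fields. A minimal sketch of how such a structured prompt could be assembled into a single text string is shown here. The `ShotPrompt` class and its field names are hypothetical helpers for illustration only; they are not part of the Qianwen App or any Wan 2.5 API, which simply accept free-form text.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the prompt fields used in this test."""
    subject: str
    style: str
    lighting: str
    camera: str

    def render(self) -> str:
        # Join the labeled fields into one sentence-per-field prompt string.
        parts = [
            self.subject,
            f"Style: {self.style}",
            f"Lighting: {self.lighting}",
            f"Camera: {self.camera}",
        ]
        return ". ".join(p.strip().rstrip(".") for p in parts) + "."


prompt = ShotPrompt(
    subject="Medium shot of a young woman leading a deer through a warm dusk forest",
    style="Cinematic realism, soft yet high-end textures",
    lighting="Multi-layered: rim, side, and diffuse light",
    camera="Long take, shallow depth (f/2.8), bokeh accuracy",
)
print(prompt.render())
```

Keeping the fields separate makes it easier to change one variable at a time (for example, only the lighting) when comparing outputs across runs.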
Highlights:
- Natural fabric creases in dress sleeves
- Deer’s ear movements perfectly timed
- Rim light detailing individual hair strands
- Ambient sounds (leaf crunch, bird calls) despite no audio prompt
Why:
Wan 2.5’s use of RLHF (reinforcement learning from human feedback) refines texture fidelity and prompt adherence, which helps it avoid the “uncanny” look common in AI outputs.
---
Test 3 — Scenario-Based Fun
Prompt:
Photo of a cat → cat carries Sun Wukong across an African savannah.
Complexity:
- Dual motion tracking for cat’s muscles and Sun Wukong’s ribbons
- Maintain cat’s original facial traits under fast motion
Result:
Stable output preserving cat’s ear curve, eye color, and forehead patterns. Ribbon flowed naturally with wind, avoiding random spin artifacts.
---
02 — Real-World Image Test: Qwen-Image 2511
Overview
Qwen-Image 2511 ranks #1 globally in open-source ecosystem contributions (Hugging Face trending). It aims to solve:
- Face distortion
- Chinese text rendering errors
---
Test 1 — Character Consistency Over Multiple Scenes
Setup:
Golden-shaded British Shorthair cat → dressed in pink skirt → pushing vase → vase shattered
Result:
- Facial features retained across all four outputs
- Accessories (skirt, flower) kept consistent in color/style
- Zero warping or mismatch between scenes
---
Test 2 — Chinese Typography in Commercial Visuals
Prompt:
Ultra-realistic ad poster for “Grain-Free Natural Dog Food.”
Requirements:
- Accurate Chinese UX text layout
- Realistic product packaging and scene props
- Perfect fur, product textures, environment lighting

Result:
- Flawless Chinese text on both poster and product label
- Detailed carrot, dog food, and fur textures suitable for e-commerce
- One-click aspect ratio changes (1:1, 9:16, 16:9, etc.)
- Advanced post-editing: text/color tweaks, expansions, resizing — all in-model
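To make the aspect-ratio options concrete, here is a rough sketch of how output dimensions for ratios such as 1:1, 9:16, and 16:9 could be derived from a fixed pixel budget. The budget of 1024×1024 pixels and the snapping to multiples of 16 are assumptions chosen for illustration; the review does not state which resolutions the Qianwen App actually uses.

```python
import math


def dims_for_ratio(ratio_w: int, ratio_h: int,
                   pixel_budget: int = 1024 * 1024,
                   multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) near pixel_budget at the requested aspect ratio.

    pixel_budget and multiple are illustrative assumptions, not documented
    Qwen-Image settings.
    """
    # Solve width * height = pixel_budget with width / height = ratio_w / ratio_h.
    ideal_h = math.sqrt(pixel_budget * ratio_h / ratio_w)
    ideal_w = ideal_h * ratio_w / ratio_h

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(ideal_w), snap(ideal_h)


for ratio in [(1, 1), (9, 16), (16, 9)]:
    print(ratio, dims_for_ratio(*ratio))
```

Under these assumptions, the total pixel count stays roughly constant across ratios, so a 9:16 poster and a 16:9 banner carry a similar level of detail.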

---
03 — Significance for Domestic AI
Alibaba’s strategy is clear:
- Wan 2.5 closes the audio-visual sync gap in domestic video AI and is practical for both e-commerce and consumer fun
- Qwen-Image 2511 delivers professional-grade, cost-free imagery for SMEs and hobbyists
Impact:
By giving B2B-grade tech to consumers for free, Alibaba is moving AI from lab innovation to daily creation. Lower costs will accelerate domestic AI adoption.
---



Final Takeaway
Tools like Qwen-Image 2511 and Wan 2.5 prove advanced AI generation can now meet commercial standards while staying accessible.