AI news

Image Generation Simulation: This AI Makes 3D Assets “Ready to Use” for Instant Robot Training

Honghao Wang

23 Nov 2025 — 4 min read

PhysX-Anything: Single-Image to Simulation-Ready 3D Assets

A single photo can now be transformed into simulation-ready 3D assets with realistic physical and joint properties.

(Yes — in the image below, almost all objects are generated by AI.)

---

Why Simulation-Oriented 3D Generation Matters

As 3D modeling expands beyond static visual effects toward physically interactive assets, developers need objects that can be simulated and manipulated directly. These assets are crucial for building the next generation of embodied AI.

Current challenges:

Most existing methods ignore physical and motion characteristics.
Common outputs lack density, absolute scale, and joint constraints, making them unsuitable for integration into robotics or realistic simulators.
Even physics-enabled methods like PhysXGen don't offer plug-and-play compatibility with mainstream physics engines.

---

Introducing PhysX-Anything

Researchers from Nanyang Technological University and the Shanghai Artificial Intelligence Laboratory have developed PhysX-Anything — the first simulation-oriented 3D generation framework with physical properties.

Key features:

Works from a single image.
Generates high-quality 3D assets with explicit geometry, articulated joints, and physical parameters.
Assets are ready for industry-standard simulators and control workflows.

---

How PhysX-Anything Works

1. Coarse-to-Fine Generation Framework

From a real-world image:

Multi-turn dialogue generates
Global physical descriptions.
Component-level geometric information.
Physical representations are decoded into 3D assets in six common formats.

---

2. Novel Token-Efficient 3D Representation

Traditional VLM-based 3D generation struggles with long mesh token sequences. Solutions like 3D VQ-GAN require extra special tokens and increase complexity.

PhysX-Anything's approach:

Voxel-based representation (32×32×32 grid).
Coarse geometry modeled by VLM.
Downstream decoder refines geometry into high-fidelity assets.

---

3. Physical Asset Schema

Tree-like JSON-style format replaces standard URDF.
Captures richer attributes and textual metadata for better VLM reasoning.
Maps key kinematic parameters (e.g., motion range, axis positions) directly into voxel space.

---

4. Model Architecture and Training

Fine-tuned Qwen2.5 on a custom physical 3D dataset.
Multi-turn dialogue ensures accurate global descriptions and local details.

---

AiToEarn Integration

Platforms like AiToEarn官网 complement PhysX-Anything, offering:

Cross-platform publishing to Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter), and more.
Analytics and monetization tools for AI-generated simulation-ready assets.
Open-source ecosystem for long-term scalability.

---

Controllable Fine Detail Generation

Inspired by ControlNet, PhysX-Anything uses:

Coarse voxel guidance to steer fine-grained diffusion model generation.
Structured latent diffusion decoding to produce multiple formats:
Mesh surfaces
Radiance fields
3D Gaussians
Nearest neighbor segmentation to split meshes into part-level components.
Combine global structure + fine geometry to output URDF, XML, and meshes ready for simulation.

---

Benchmarking Against State-of-the-Art

Compared with URDFormer, Articulate-Anything, and PhysXGen, PhysX-Anything showed:

Best scores in geometry and physical metrics.
Lowest absolute scale error.
Most coherent part-level textual descriptions.

---

Qualitative Advantages

Strong generalization ability beyond retrieval-based methods.
More credible and reasonable physical properties than PhysXGen.

---

Evaluation

VLM-Based Tests

Real-world everyday images used.
Focused on geometry + joint motion quality.
Significant superiority in kinematic accuracy and generalization.

---

Human Judgments

Volunteers scored generated structures on geometry and physical realism.
PhysX-Anything ranked highest in both categories.

---

Real-World Visualization

PhysX-Anything produces:

Accurate geometry.
Realistic joint motion.
Plausible physical attributes.

---

Downstream Simulation Tests

Using MuJoCo-style simulator, generated assets like faucets, cabinets, lighters, and glasses were:

Directly imported into simulations.
Applied successfully to robotic policy learning.

---

Key Contributions

First simulation-oriented physical 3D generation paradigm.
Unified VLM pipeline + custom 3D representation → 193× token compression.
New dataset PhysX-Mobility with 47 categories & rich annotations.
Proven generalization and simulation readiness in robotics tasks.

---

Impact

PhysX-Anything marks a paradigm shift from “visual modeling” to “physical modeling”, unlocking new directions in:

3D vision
Embodied intelligence
Robotics research

---

Video: https://mp.weixin.qq.com/s/gUooZUSc1yWQlf4NpViZrA

Paper: https://arxiv.org/abs/2511.13648

Project: hthttps://physx-anything.github.io/

GitHub: https://github.com/ziangcao0312/PhysX-Anything

Authors:

First author: Cao Ziang, NTU PhD student (Computer Vision, 3D AIGC, Embodied Intelligence)
Collaborators: Hong Fangzhou, Chen Zhaoxi (NTU), Pan Liang (Shanghai AI Lab)
Corresponding author: Professor Liu Ziwei (NTU)

---

I’ve grouped and enhanced the content for readability, but if you’d like I can make you an additional “quick takeaway” cheat sheet that condenses PhysX-Anything’s core concepts and workflow into one diagram + bullet list. Do you want me to prepare that?