Dexmal ForceLing Open-Sources Dexbotic: A “Transformers” Library for Embodied Intelligence

📦 One-Stop Open-Source VLA Toolbox – Dexbotic

The Challenge of Real-World AI Actions

AI can now write code and create art — but why is something as simple as asking an AI to twist open a bottle cap still so difficult?

Opening a bottle cap requires three abilities working together in real time:

  • Eyes – Vision (V): recognize the object. Which one is the bottle? Where is its cap? What is the cap’s texture?
  • Brain – Language (L): understand the instruction. What does “twist open the cap” mean? Clockwise or counterclockwise? How much force?
  • Body – Action (A): execute precisely. At what angle should the fingers grip the cap? How much torque should be applied?

These three capabilities must work seamlessly together — that’s the job of the VLA (Vision–Language–Action) model, the core of embodied intelligence.

---

The State of VLA Research – Like Deep Learning in 2015

Modern VLA research feels like deep learning circa 2015:

  • Rapid algorithmic innovation (e.g., OpenVLA, RT-2, Pi0)
  • But engineering environments are fragmented

Problems faced by researchers:

  • Multiple environments: PyTorch + LLaMA2 here, JAX + PaLI there, TensorFlow + custom VLM elsewhere
  • Inconsistent datasets and loading scripts
  • Unfair benchmark settings — different training epochs, learning rates, architectures
  • Difficulty upgrading models: codebases are deeply tied to outdated VLMs (usually LLaMA2) rather than newer models such as Qwen2.5 or LLaMA3

The result: a lot of duplicated engineering work.

---

Lessons from NLP & CV Tooling

Other AI fields handle standardization better:

  • Frameworks like PyTorch, TensorFlow
  • Toolkits like MMDetection (CV) & Transformers (NLP)

Example: load BERT in a couple of lines, without touching its internals:

from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # downloads the weights and builds the model

---

Introducing Dexbotic – An Open-Source VLA Framework

Dexbotic by Dexmal 原力灵机 is:

  • Open-source
  • PyTorch-based
  • Designed to end “reinvent-the-wheel” VLA research

Previously, Dexmal released RoboChallenge — the world’s first large-scale real-robot benchmark platform.

Now, Dexbotic aims to solve the “lack of training standards” problem.

---

🚀 What Dexbotic Delivers

Three core improvements:

  • Unified Framework – run multiple mainstream VLA algorithms in one environment
  • Unified Data Format (Dexdata) – standard media + metadata storage
  • Stronger Pretrained Models – latest architecture (Qwen2.5-based)

---

1️⃣ Unified Framework

Supports Pi0, OpenVLA-OFT, CogACT — all switchable via one line:

class MyExp(BaseExp):
    model = "CogACT"  # Switch from Pi0 to CogACT

Model abstraction:

  • Vision-Language Model (VLM) – perception + understanding
  • Action Expert – execution strategy

Benefit: Swap VLM or Action Expert without rewriting pipelines.
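
To make that split concrete, here is a minimal, framework-agnostic sketch of the VLM + Action Expert composition in plain PyTorch. The class names, dimensions, and toy modules are illustrative assumptions, not Dexbotic's actual API:

import torch
import torch.nn as nn

# Toy stand-ins: in practice the VLM could be a Qwen2.5-based DexboticVLM and
# the Action Expert a policy head such as a diffusion or flow-matching module.
class ToyVLM(nn.Module):
    """Perception + understanding: encodes image + instruction into fused features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.image_proj = nn.Linear(3 * 32 * 32, feat_dim)
        self.text_proj = nn.Embedding(1000, feat_dim)

    def forward(self, image, instruction_ids):
        img_feat = self.image_proj(image.flatten(1))            # (B, D)
        txt_feat = self.text_proj(instruction_ids).mean(dim=1)  # (B, D)
        return img_feat + txt_feat                               # fused features

class ToyActionExpert(nn.Module):
    """Execution strategy: maps fused features to a chunk of 7-DoF actions."""
    def __init__(self, feat_dim=256, action_dim=7, chunk=8):
        super().__init__()
        self.head = nn.Linear(feat_dim, action_dim * chunk)
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, features):
        return self.head(features).view(-1, self.chunk, self.action_dim)

class VLAPolicy(nn.Module):
    """Composition: either component can be swapped without touching the other."""
    def __init__(self, vlm, action_expert):
        super().__init__()
        self.vlm, self.action_expert = vlm, action_expert

    def forward(self, image, instruction_ids):
        return self.action_expert(self.vlm(image, instruction_ids))

policy = VLAPolicy(ToyVLM(), ToyActionExpert())
actions = policy(torch.randn(2, 3, 32, 32), torch.randint(0, 1000, (2, 12)))
print(actions.shape)  # torch.Size([2, 8, 7])

Because the policy depends only on the two interfaces, swapping in a different backbone or action head is a constructor change rather than a pipeline rewrite.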

---

2️⃣ Unified Data Format: Dexdata

Consistency eliminates loader headaches:

  • MP4 for videos
  • JSONL for per-frame metadata (robot states, instructions)

Advantages:

  • Compact storage via video compression (cuts size by >50%)
  • Train any supported VLA algorithm without format conversions (see the loader sketch below)
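
As referenced above, here is a minimal loader sketch for one such episode, pairing decoded MP4 frames with per-frame JSONL records. It assumes OpenCV for video decoding; the file names and metadata keys ("state", "instruction") are hypothetical placeholders, not the actual Dexdata schema:

import json
import cv2  # pip install opencv-python

def load_episode(video_path="episode_0001.mp4", meta_path="episode_0001.jsonl"):
    """Pair each decoded video frame with its per-frame metadata record."""
    # One JSON object per line: robot state, instruction, etc. (keys are illustrative)
    with open(meta_path) as f:
        records = [json.loads(line) for line in f]

    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)  # H x W x 3 array, BGR channel order
    cap.release()

    assert len(frames) == len(records), "video and metadata must be frame-aligned"
    return [
        {"image": frame, "state": rec.get("state"), "instruction": rec.get("instruction")}
        for frame, rec in zip(frames, records)
    ]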

---

3️⃣ Stronger Pretrained Models

DexboticVLM: a new base VLM built from scratch on Qwen2.5

Retrained Pi0, CogACT, OpenVLA-OFT, MemoryVLA → better benchmarks

Example – SimplerEnv Tasks:

  • CogACT: 51.3% → 69.5% (+18.2 percentage points)
  • OpenVLA-OFT: 30.2% → 76.4% (+46.2 percentage points)

Example – CALVIN Long-Horizon Tasks:

  • CogACT: 3.25 steps → 4.06 steps (+25%)

---

🌍 Simulation vs Real Robots

Real robot tests on UR5e, Franka, ALOHA, ARX5:

High success rate tasks:

  • UR5e placing plates – 100%
  • ALOHA stacking bowls – 90%
  • ARX5 searching green box – 80%

Low success rate tasks:

  • Tearing paper, pouring fries – 20–40%

Reason: Physical factors like friction & deformation.

---

Three-Layer Architecture

  • Data Layer – standard formats
  • Model Layer – supports all major VLAs
  • Experiment Layer – minimal researcher friction

Cloud + Local Ready:

  • Alibaba Cloud PAI, Volcano Engine for large-scale training
  • Local GPU ready — single RTX 4090 runs most models

---

Experiment-Centric Config System

No more YAML hassle — use Python class inheritance:

# Base config
class BaseExp:
    model = "DexboticVLM"
    lr = 1e-4
    epochs = 100

# Experiment override
class MyExp(BaseExp):
    lr = 5e-5  # Change only what’s needed

This follows the Open–Closed Principle: base configs stay closed to modification, while each experiment extends them with safe, targeted overrides.
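
A quick sanity check of the pattern in plain Python (nothing Dexbotic-specific): only the overridden field changes, and everything else is inherited from the base experiment.

exp = MyExp()
print(exp.model, exp.lr, exp.epochs)
# DexboticVLM 5e-05 100  (only lr differs from BaseExp)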

---

🔮 Designed for Future "Full-Body Control"

Current split:

  • Manipulation – robotic arms
  • Navigation – movement

Dexbotic supports both:

  • Manipulation: Pi0, CogACT, OpenVLA-OFT
  • Navigation: MUVLA

Vision: Unified training for robots that can walk and work.

---

⚙️ Open Source Hardware – DOS-W1

Low-cost open robotic arms:

  • Consumer-grade motors/sensors
  • Public design files → accessible for labs

Paired with:

  • Dexbotic (software brain)
  • RoboChallenge (testing arena)

---

📅 Announcement

On October 23 at 19:00, a Dexmal founding member will host a livestream about Dexbotic.

Scan the QR code in the original post for details.

---

For AI-centric content sharing and monetization:

  • The AiToEarn official site (AiToEarn官网) integrates AI content creation, distribution, and analytics
  • It publishes across Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X
  • The AiToEarn documentation (AiToEarn文档) covers research demo publishing and monetization, a fit for robotics and open-source workflows

Much as Dexbotic standardizes VLA research, AiToEarn standardizes AI content production pipelines, letting researchers share VLA experiments, benchmarks, and demos globally within minutes.

---

Summary:

Dexbotic =

  • Unified Framework
  • Unified Data Format
  • Latest Pretrained Models
  • Cloud + Local Ready
  • Future-Proof Architecture

> Links:

> 🔗 Official Site

> 🔗 Tech Report

> 🔗 GitHub Repo

> 🔗 Hugging Face Models

---
