Dexmal (原力灵机) Open-Sources Dexbotic: A “Transformers” Library for Embodied Intelligence
📦 One-Stop Open-Source VLA Toolbox – Dexbotic


The Challenge of Real-World AI Actions
AI can now write code and create art — but why is something as simple as asking an AI to twist open a bottle cap still so difficult?
Opening a bottle cap requires three abilities working together in real time:
- Eyes – Vision (V)
  - Recognize the object: Which one is the bottle? Where is its cap? What’s the cap’s texture?
- Brain – Language (L)
  - Understand instructions: What does “twist open the cap” mean? Clockwise or counterclockwise? How much force?
- Body – Action (A)
  - Execute precisely: At what angle should the fingers grip the cap? How much torque should be applied?
These three capabilities must work seamlessly together — that’s the job of the VLA (Vision–Language–Action) model, the core of embodied intelligence.
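To make the V–L–A split concrete, here is a minimal, purely illustrative PyTorch sketch — module names, shapes, and the 7-dimensional action are assumptions for illustration, not Dexbotic’s actual architecture: a vision encoder, a language encoder, and an action head fused into one policy.

```python
import torch
import torch.nn as nn

# Toy sketch only: names and shapes are illustrative, not Dexbotic's real modules.
class ToyVLAPolicy(nn.Module):
    def __init__(self, vision_dim=512, text_dim=512, action_dim=7, vocab_size=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(3 * 224 * 224, vision_dim)       # "eyes": image -> features
        self.language_encoder = nn.Embedding(vocab_size, text_dim)       # "brain": instruction tokens -> features
        self.action_head = nn.Linear(vision_dim + text_dim, action_dim)  # "body": fused features -> arm command

    def forward(self, image, instruction_tokens):
        v = self.vision_encoder(image.flatten(1))                   # (B, vision_dim)
        l = self.language_encoder(instruction_tokens).mean(dim=1)   # (B, text_dim), pooled over tokens
        return self.action_head(torch.cat([v, l], dim=-1))          # (B, action_dim), e.g. 6-DoF pose + gripper

policy = ToyVLAPolicy()
action = policy(torch.randn(1, 3, 224, 224), torch.randint(0, 32000, (1, 12)))
print(action.shape)  # torch.Size([1, 7])
```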
---
The State of VLA Research – Like Deep Learning in 2015
Modern VLA research feels like deep learning circa 2015:
- Rapid algorithmic innovation (e.g., OpenVLA, RT-2, Pi0)
- But engineering environments are fragmented
Problems faced by researchers:
- Multiple environments: PyTorch + LLaMA2 here, JAX + PaLI there, TensorFlow + custom VLM elsewhere
- Inconsistent datasets and loading scripts
- Unfair benchmark settings — different training epochs, learning rates, architectures
- Difficulty upgrading models: Codebases deeply tied to outdated VLMs (usually LLaMA2) instead of newer versions like Qwen2.5 or LLaMA3
The result: a lot of duplicated engineering work.
---
Lessons from NLP & CV Tooling
Other AI fields handle standardization better:
- Frameworks like PyTorch, TensorFlow
- Toolkits like MMDetection (CV) & Transformers (NLP)
Example — load BERT in three lines, without touching its internals:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
```

---
Introducing Dexbotic – An Open-Source VLA Framework
Dexbotic, by Dexmal (原力灵机), is:
- Open-source
- PyTorch-based
- Designed to end “reinvent-the-wheel” VLA research
Previously, Dexmal released RoboChallenge — the world’s first large-scale real-robot benchmark platform.
Now, Dexbotic aims to solve the “lack of training standards” problem.
---
🚀 What Dexbotic Delivers
Three core improvements:
- Unified Framework – run multiple mainstream VLA algorithms in one environment
- Unified Data Format (Dexdata) – standard media + metadata storage
- Stronger Pretrained Models – latest architecture (Qwen2.5-based)
---
1️⃣ Unified Framework
Supports Pi0, OpenVLA-OFT, CogACT — all switchable via one line:
```python
class MyExp(BaseExp):
    model = "CogACT"  # Switch from Pi0 to CogACT
```

Model abstraction:
- Vision-Language Model (VLM) – perception + understanding
- Action Expert – execution strategy
Benefit: Swap VLM or Action Expert without rewriting pipelines.
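As a hedged illustration of that split (the class and attribute names below are hypothetical, not Dexbotic’s documented API), swapping either component reduces to overriding a single attribute:

```python
# Hypothetical sketch of the VLM / Action Expert abstraction (not Dexbotic's real API).
class BaseExp:
    vlm = "DexboticVLM"       # perception + understanding backbone
    action_expert = "CogACT"  # execution strategy / action head

class SwapActionExpertExp(BaseExp):
    action_expert = "Pi0"     # change the Action Expert; data pipeline and VLM stay untouched

class SwapBackboneExp(BaseExp):
    vlm = "Qwen2.5-VL"        # change the VLM; Action Expert and training loop stay untouched
```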
---
2️⃣ Unified Data Format: Dexdata
Consistency eliminates loader headaches:
- MP4 for videos
- JSONL for per-frame metadata (robot states, instructions)
Advantages:
- Compact storage via video compression (cuts size by >50%)
- Train any supported VLA algorithm without format conversions
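For intuition, here is a rough sketch of what reading one Dexdata-style episode could look like — the directory layout and JSONL field names are assumptions for illustration, not the documented schema:

```python
import json
import cv2  # OpenCV, used here to decode the MP4 stream

# Hypothetical episode layout: one MP4 plus one JSONL with one record per frame.
video_path = "episode_0001/video.mp4"
meta_path = "episode_0001/meta.jsonl"

# Load per-frame metadata (robot states, language instruction, ...).
with open(meta_path) as f:
    frames_meta = [json.loads(line) for line in f]

# Decode video frames and pair them with their metadata records.
cap = cv2.VideoCapture(video_path)
samples = []
for meta in frames_meta:
    ok, frame = cap.read()
    if not ok:
        break
    samples.append({"image": frame,
                    "state": meta.get("state"),
                    "instruction": meta.get("instruction")})
cap.release()

print(f"Loaded {len(samples)} (image, state, instruction) samples")
```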
---
3️⃣ Stronger Pretrained Models
- Base model rebuilt from scratch on Qwen2.5 → DexboticVLM
- Pi0, CogACT, OpenVLA-OFT, and MemoryVLA retrained on it → stronger benchmark results
Example – SimplerEnv Tasks:
- CogACT: 51.3% → 69.5% (+18.2 points)
- OpenVLA-OFT: 30.2% → 76.4% (+46.2 points)
Example – CALVIN Long-Horizon Tasks:
- CogACT: 3.25 steps → 4.06 steps (+25%)
---
🌍 Simulation vs Real Robots
Real robot tests on UR5e, Franka, ALOHA, ARX5:
High success rate tasks:
- UR5e placing plates – 100%
- ALOHA stacking bowls – 90%
- ARX5 searching green box – 80%
Low success rate tasks:
- Tearing paper, pouring fries – 20–40%
Reason: Physical factors like friction & deformation.
---
Three-Layer Architecture
- Data Layer – standard formats
- Model Layer – supports all major VLAs
- Experiment Layer – minimal researcher friction
Cloud + Local Ready:
- Alibaba Cloud PAI, Volcano Engine for large-scale training
- Local GPU ready — single RTX 4090 runs most models
---
Experiment-Centric Config System
No more YAML hassle — use Python class inheritance:
```python
# Base config
class BaseExp:
    model = "DexboticVLM"
    lr = 1e-4
    epochs = 100

# Experiment override
class MyExp(BaseExp):
    lr = 5e-5  # Change only what's needed
```

Follows the Open–Closed Principle — safe, targeted changes.
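Because these are plain Python classes, attribute resolution follows normal inheritance — a quick sanity check:

```python
exp = MyExp()
print(exp.model, exp.epochs, exp.lr)  # DexboticVLM 100 5e-05
# Only lr is overridden in MyExp; model and epochs fall back to BaseExp.
```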
---
🔮 Designed for Future "Full-Body Control"
Current split:
- Manipulation – robotic arms
- Navigation – movement
Dexbotic supports both:
- Manipulation: Pi0, CogACT, OpenVLA-OFT
- Navigation: MUVLA
Vision: Unified training for robots that can walk and work.
---
⚙️ Open Source Hardware – DOS-W1
Low-cost open robotic arms:
- Consumer-grade motors/sensors
- Public design files → accessible for labs
Paired with:
- Dexbotic (software brain)
- RoboChallenge (testing arena)
---
📅 Announcement
On October 23 at 19:00, a Dexmal founding member will host a livestream on Dexbotic.
Scan the QR code in the image below:




---
📢 Related Ecosystem – AiToEarn
For AI-centric content sharing and monetization:
- AiToEarn官网 (official site) integrates AI creation, distribution, and analytics
- Publishes across Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X
- AiToEarn文档 (documentation) enables research-demo publishing and monetization → well suited to robotics/open-source workflows
Similar to how Dexbotic standardizes VLA research, AiToEarn standardizes AI content production pipelines — enabling researchers to share VLA experiments, benchmarks, and demos globally within minutes.
---
Summary:
Dexbotic =
- Unified Framework
- Unified Data Format
- Latest Pretrained Models
- Cloud + Local Ready
- Future-Proof Architecture
> Links:
> 🔗 Official Site
> 🔗 Tech Report
> 🔗 GitHub Repo