Using AI to “Unlock” the Financial Market Black Box: How Microsoft Research Asia Built an Order-Level Simulation Engine
AI for Finance: Building Deterministic Market Simulation with Foundation Models & Agents
In the era of large AI models and intelligent Agents, the financial industry is undergoing a profound transformation — from investment decision-making to market simulation. At the AICon Global Artificial Intelligence Development and Application Conference, Liu Weiqing, Principal Researcher at Microsoft Research Asia’s Machine Learning Group, presented:
> MarS: A Financial Market Simulation Engine Driven by Generative Foundation Models
The work uses order-level native financial data and an automated, iterative Agent workflow to achieve high-fidelity market simulation and efficient decision optimization.
---
Why MSRA Invests in AI for Finance
Nine years ago — around the time of AlphaGo’s debut — MSRA launched the AI for Industry initiative to apply AI across high-impact domains. Finance became a focal area for several reasons:
- Gap Between Theory and Practice: Algorithms in academic papers often fail to reflect real-world constraints.
- Need for Tools & Frameworks: A tool-driven approach ensures research outcomes match operational performance.
- Absence of Suitable Open-Source Solutions: This gap motivated the creation of Qlib, which has evolved from supervised learning to reinforcement learning, meta learning, and now Agent-based automated workflows.
---
Large Models, Domain Data & Decision Certainty
Challenges in Financial AI
- Financial data is structured, domain-specific, and often non-linguistic, making it ill-suited to pure NLP fine-tuning.
- LLM-based decision Agents tend to be non-deterministic — unacceptable in finance, where identical inputs must yield identical outputs.
MSRA’s Goal: Build relatively deterministic Agents capable of self-iteration while integrating finance-specific evaluation metrics.
---
Approach Overview
Two Key Efforts:
- Foundation Model — Trained using domain-native data.
- Iterative Agent Workflow — Code-driven automation for stronger determinism.
---
Modeling Native Financial Data
Order-Level Market Data
- Captures microstructure beyond price-level trends.
- Exhibits scaling law effects similar to LLMs.
Outcome
- Realistic, controllable order-generation model.
- A digital twin financial market platform enabling dynamic, scenario-based evaluation.
---
Agent-Based Automated Iteration
Using Code as the core:
- Model-generated code is directly executable.
- Training scripts can be generated to build deep models, which are then integrated back into the workflow.
- Iterations proceed until optimal reproducible results are achieved.
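The loop described above can be sketched in a few lines. This is a minimal illustration, not the R&D-Agent implementation: `propose_code` is a hypothetical stand-in for a model call, the `signal` function it emits is invented for the example, and the scoring metric is a toy. The point is only that once the candidate exists as code, executing and scoring it is repeatable.

```python
from typing import Optional


def propose_code(iteration: int, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call; returns source defining signal(prices)."""
    window = 2 + iteration  # pretend the model widens the lookback each round
    return (
        "def signal(prices):\n"
        f"    w = {window}\n"
        "    if len(prices) <= w:\n"
        "        return 0.0\n"
        "    return prices[-1] - prices[-1 - w]\n"
    )


def evaluate(signal, prices) -> float:
    """Toy metric: share of steps where the signal's sign matches the next move."""
    hits = total = 0
    for t in range(1, len(prices) - 1):
        s = signal(prices[: t + 1])
        move = prices[t + 1] - prices[t]
        if s != 0 and move != 0:
            total += 1
            hits += (s > 0) == (move > 0)
    return hits / total if total else 0.0


def run_loop(prices, max_iters: int = 5):
    """Generate, execute, and score candidates; keep the best one."""
    best_score, best_code, feedback = -1.0, None, None
    for i in range(max_iters):
        code = propose_code(i, feedback)
        namespace = {}
        try:
            exec(code, namespace)  # generated code is executed directly
            score = evaluate(namespace["signal"], prices)
            feedback = f"iteration {i}: score={score:.3f}"
        except Exception as exc:  # execution errors become feedback too
            feedback = f"iteration {i}: error={exc!r}"
            continue
        if score > best_score:
            best_score, best_code = score, code
    return best_score, best_code
```

Running `run_loop` twice on the same price list returns the same result, because nothing in the execution or scoring path is random. In practice, executing model-generated code with `exec` would need sandboxing.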
---
Quant Research Automation: Qlib + R&D-Agent
Components
- Qlib — Open-source quantitative research platform.
- R&D-Agent — Automates iterative workflows.
Two Agent Roles
- Research Agent — Generates high-quality strategies and ideas.
- Development Agent — Implements and optimizes engineering execution.
Feedback Loops:
- Engineering feedback — Bugs, training time, resource usage.
- Performance feedback — Strategy effectiveness, model metrics.
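One way to picture the two feedback channels is as separate records produced by each run. The sketch below is an assumption for illustration only: the class names, fields, and the routing rule in `next_role` are hypothetical and are not taken from R&D-Agent.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class EngineeringFeedback:
    ok: bool               # did the candidate run without raising?
    error: Optional[str]   # repr of the exception, if any
    seconds: float         # wall-clock run time


@dataclass
class PerformanceFeedback:
    metric: float          # strategy or model metric reported by the run
    beats_baseline: bool


def run_candidate(
    candidate: Callable[[], float], baseline: float
) -> Tuple[EngineeringFeedback, Optional[PerformanceFeedback]]:
    """Run one candidate and split the outcome into the two feedback types."""
    start = time.perf_counter()
    try:
        metric = candidate()
    except Exception as exc:
        elapsed = time.perf_counter() - start
        return EngineeringFeedback(False, repr(exc), elapsed), None
    elapsed = time.perf_counter() - start
    return (
        EngineeringFeedback(True, None, elapsed),
        PerformanceFeedback(metric, metric > baseline),
    )


def next_role(eng: EngineeringFeedback) -> str:
    """Assumed routing rule: bugs go to development, clean runs go back to research."""
    return "research" if eng.ok else "development"
```

Keeping the two records separate lets a failed run (no metric at all) be handled differently from a run that worked but underperformed.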
---
Results in Quantitative Research
- Ran 52 fully automated iterations in about 18 hours.
- Surpassed expert-designed baselines in four evaluation metrics.
---
Large Market Model (LMM)
Traditional Approach Limitations
- Relies on abstract features/factors, missing hidden market information.
- Treats the market as a black box.
LMM Approach
- Models each market order at finest granularity.
- Converts order data into video-like sequences for CV-based modeling.
---
Model Training & Scaling Laws
- Tokenization of individual orders.
- Orders aggregated into minute-level groups.
- Transformer architecture shows clear scaling law effects.
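To make the first two steps concrete, here is a small sketch of turning orders into integer tokens and grouping them by minute. The source does not describe the actual tokenization scheme, so everything here is an assumption: the `Order` fields, the price clipping range, and the volume bucket edges are hypothetical choices made only to show the idea.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    ts: float         # seconds since session open
    side: str         # "buy" or "sell"
    price_ticks: int  # price offset from the mid price, in ticks
    volume: int


PRICE_BINS = 21                        # offsets clipped to [-10, 10] ticks
VOLUME_EDGES = [100, 500, 1000, 5000]  # hypothetical bucket boundaries
VOLUME_BINS = len(VOLUME_EDGES) + 1
VOCAB_SIZE = 2 * PRICE_BINS * VOLUME_BINS  # 210 distinct tokens


def tokenize(order: Order) -> int:
    """Map one order to a single integer token in [0, VOCAB_SIZE)."""
    side_id = 0 if order.side == "buy" else 1
    price_id = max(-10, min(10, order.price_ticks)) + 10
    vol_id = bisect.bisect_right(VOLUME_EDGES, order.volume)
    return (side_id * PRICE_BINS + price_id) * VOLUME_BINS + vol_id


def group_by_minute(orders):
    """Return {minute index: [tokens in time order]}."""
    groups = defaultdict(list)
    for order in sorted(orders, key=lambda o: o.ts):
        groups[int(order.ts // 60)].append(tokenize(order))
    return dict(groups)
```

A sequence model would then be trained on these token streams, with the minute-level groups giving a coarser unit alongside the individual orders.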
---
Applications Beyond Next-Token Prediction
- Predictive simulations at order level.
- Monte Carlo-style rollouts for future indicator forecasting.
- Superior minute-level predictions compared to supervised baselines.
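The Monte Carlo idea can be illustrated with a toy generator. In the sketch below, `rollout` is a hypothetical stand-in for sampling future orders from the generative model; here it is just a random walk in price, which is an assumption made to keep the example self-contained. The forecast is the average over many sampled futures.

```python
import random
import statistics


def rollout(start_price: float, steps: int, rng: random.Random) -> float:
    """Simulate one possible future and return the final price."""
    price = start_price
    for _ in range(steps):
        # Stand-in for sampling the next order and applying its price effect.
        price += rng.choice((-0.01, 0.0, 0.01))
    return price


def forecast(start_price: float, steps: int, n_rollouts: int, seed: int = 0):
    """Average many rollouts to estimate a future indicator and its spread."""
    rng = random.Random(seed)  # fixed seed makes the forecast reproducible
    finals = [rollout(start_price, steps, rng) for _ in range(n_rollouts)]
    return statistics.mean(finals), statistics.stdev(finals)
```

Calling `forecast(100.0, 60, 200, seed=1)` twice returns the same pair, since the only source of randomness is the seeded generator. The spread across rollouts gives a rough sense of forecast uncertainty.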
---
Deployment Challenges & Efficiency Optimization
Main Bottleneck:
- Order-by-order auto-regression slows generation.
Solution:
- Optimized modeling reduced rollout time from 15 minutes to ~1 minute, paving the way for real-world use.
---
Digital Twin Market Simulation
Advantages Over Traditional Models
- Simulates order-level behavior, not just prices.
- Enables study of rare/high-risk events (e.g., “Golden Finger” incident).
---
Interactive & Controllable Interfaces
- Order Interaction Interface: Submit orders & observe impact.
- Scenario Control Interface: Generate market conditions (bull/bear) on demand.
Platform: MarS System — merges LMM with interfaces, allowing Agent-based multi-round optimization.
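The order interaction idea, submitting an order and observing its impact, can be shown on a toy order book. This is not the MarS interface: `ToyBook` and its methods are hypothetical, prices are integer ticks, and the book is static, whereas a simulator would also generate the other participants' reactions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBook:
    bids: dict = field(default_factory=dict)  # price in ticks -> resting volume
    asks: dict = field(default_factory=dict)

    def mid(self) -> float:
        """Midpoint of best bid and best ask (requires both sides non-empty)."""
        return (max(self.bids) + min(self.asks)) / 2

    def market_buy(self, volume: int):
        """Consume ask liquidity from the best price upward.

        Returns (filled volume, average fill price or None if nothing filled).
        """
        filled = cost = 0
        for price in sorted(self.asks):
            if filled == volume:
                break
            take = min(volume - filled, self.asks[price])
            self.asks[price] -= take
            filled += take
            cost += take * price
            if self.asks[price] == 0:
                del self.asks[price]  # level fully consumed
        return filled, (cost / filled if filled else None)


book = ToyBook(bids={99: 100}, asks={101: 50, 102: 50, 103: 100})
mid_before = book.mid()                 # 100.0
filled, avg_price = book.market_buy(80)  # fills 50 @ 101 and 30 @ 102
mid_after = book.mid()                  # 100.5, the best ask moved up
```

The difference between `mid_after` and `mid_before` is the immediate price impact of the submitted order in this toy setting.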
---
Order Generation Model Types
- Order-Group (Minute-Level) Generation
- Order-Level Generation
Combined, these achieve:
- Scenario fitting
- Real-time user interaction response
---
Natural Language Control Signals
- The user describes a target scenario (e.g., a downturn).
- System generates code to locate historical patterns and guide future order flow simulation.
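As a rough picture of what such generated code might look like, the sketch below searches a price history for windows matching a named scenario. The scenario names, the 5% thresholds, and the function `find_windows` are all hypothetical; the source does not specify how scenarios are matched.

```python
# Hypothetical mapping from a scenario label to a test on the window return.
SCENARIOS = {
    "downturn": lambda r: r <= -0.05,
    "rally": lambda r: r >= 0.05,
}


def find_windows(prices, window: int, scenario: str):
    """Return start indices of windows whose return matches the scenario."""
    matches = SCENARIOS[scenario]
    hits = []
    for start in range(len(prices) - window):
        ret = prices[start + window] / prices[start] - 1.0
        if matches(ret):
            hits.append(start)
    return hits


history = [100, 101, 102, 96, 95, 97, 103]
downturn_starts = find_windows(history, window=2, scenario="downturn")  # [2]
rally_starts = find_windows(history, window=2, scenario="rally")        # [4]
```

The matched windows could then serve as conditioning examples that steer the simulated order flow toward the requested scenario.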
---
Validating Generated Data
- Evaluated against 11 financial indicators, the generated data shows strong statistical alignment with real markets.
- A macro-level formula (√q / v) emerges from the micro-level simulation, even though the model was never trained on it directly.
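If the formula referred to is the commonly cited square-root law of market impact, in which impact grows roughly with the square root of order size q relative to market volume v (an assumption; the source gives only the expression above), one simple check is to fit the exponent on simulated data in log-log space. The sketch below uses synthetic inputs; `fit_exponent` and the sample values are illustrative only.

```python
import math


def fit_exponent(participation, impact):
    """Least-squares fit of impact = c * participation**k in log-log space.

    Returns (k, c). Inputs must be positive.
    """
    xs = [math.log(p) for p in participation]
    ys = [math.log(i) for i in impact]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return k, c


# Synthetic data that follows a square-root relation exactly (q/v ratios).
ratios = [0.001, 0.01, 0.05, 0.1]
impacts = [0.8 * math.sqrt(r) for r in ratios]
k, c = fit_exponent(ratios, impacts)  # k close to 0.5, c close to 0.8
```

On simulator output, a fitted exponent near 0.5 would be consistent with the square-root form, without that form ever being a training target.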
---
Broader Applications & Integration
While targeted at finance, this world-model + iterative Agent paradigm applies to:
- Healthcare
- Industrial optimization
- Creative industries
---
References
- Qlib: https://github.com/microsoft/qlib
- R&D-Agent: https://github.com/microsoft/rd-Agent
- MarS: https://github.com/microsoft/mars