Using AI to “Unlock” the Financial Market Black Box: How Microsoft Research Asia Built an Order-Level Simulation Engine

AI for Finance: Building Deterministic Market Simulation with Foundation Models & Agents

In the era of large AI models and intelligent Agents, the financial industry is undergoing a profound transformation — from investment decision-making to market simulation. At the AICon Global Artificial Intelligence Development and Application Conference, Liu Weiqing, Principal Researcher at Microsoft Research Asia’s Machine Learning Group, presented:

> MarS: A Financial Market Simulation Engine Driven by Generative Foundation Models

The work focuses on modeling order-level native financial data and on an automated, iterative Agent workflow, with the aim of high-fidelity market simulation and efficient decision optimization.

---

Why MSRA Invests in AI for Finance

Nine years ago — around the time of AlphaGo’s debut — MSRA launched the AI for Industry initiative to apply AI across high-impact domains. Finance became a focal area for several reasons:

  • Gap Between Theory and Practice: Algorithms in academic papers often fail to reflect real-world constraints.
  • Need for Tools & Frameworks: A tool-driven approach helps ensure that research results carry over to real-world operational performance.
  • Absence of Suitable Open-Source Solutions: This gap motivated the creation of Qlib, which has since evolved from supervised learning to reinforcement learning, meta-learning, and now Agent-based automated workflows.

---

Large Models, Domain Data & Decision Certainty

Challenges in Financial AI

  • Financial data is structured, domain-specific, and often non-linguistic, making it ill-suited to pure NLP fine-tuning.
  • LLM-based decision Agents tend to be non-deterministic, which is unacceptable in finance, where identical inputs must yield identical outputs.

MSRA’s Goal: Build relatively deterministic Agents capable of self-iteration while integrating finance-specific evaluation metrics.
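One common way to meet the "identical inputs, identical outputs" requirement is to remove hidden randomness: any sampling an Agent performs is seeded from a hash of its canonicalized input. The following is a minimal illustration under that assumption, not MSRA's method; `deterministic_decision` and its arguments are hypothetical names.

```python
import hashlib
import json
import random


def deterministic_decision(features: dict, candidates: list) -> str:
    """Choose among candidates reproducibly: equal inputs always give equal outputs."""
    # Canonical JSON (sorted keys) so logically equal dicts serialize identically.
    payload = json.dumps(features, sort_keys=True).encode("utf-8")
    # Derive a 64-bit seed from the input instead of relying on ambient randomness.
    seed = int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")
    rng = random.Random(seed)  # private generator, unaffected by other code
    return rng.choice(candidates)


if __name__ == "__main__":
    state = {"spread_ticks": 2, "imbalance": 0.3}
    reordered = {"imbalance": 0.3, "spread_ticks": 2}
    first = deterministic_decision(state, ["buy", "sell", "hold"])
    again = deterministic_decision(reordered, ["buy", "sell", "hold"])
    print(first, again, first == again)
```

Note that the order of `candidates` is part of the input: reordering that list can change the choice, so callers should pass it in a fixed order.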

---

Approach Overview

Two Key Efforts:

  • Foundation Model — Trained using domain-native data.
  • Iterative Agent Workflow — Code-driven automation for stronger determinism.

---

Modeling Native Financial Data

Order-Level Market Data

  • Captures microstructure beyond price-level trends.
  • Exhibits scaling-law behavior similar to that of LLMs.

Outcome

  • Realistic, controllable order-generation model.
  • A digital-twin financial market platform that enables dynamic, scenario-based evaluation.

---

Agent-Based Automated Iteration

Using Code as the core:

  • Model-generated code is directly executable.
  • Training scripts can be produced to build deep models, integrated back into the workflow.
  • Iterations proceed until optimal reproducible results are achieved.
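Treating code as the unit of work means every candidate must be executed and checked automatically. A minimal sketch, assuming the model emits a standalone Python script: `run_generated_code` (a hypothetical helper, not part of Qlib or R&D-Agent) runs the script in a fresh interpreter with a timeout and reports success plus captured output, which an iteration loop can feed back to the model.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(source: str, timeout_s: float = 60.0) -> dict:
    """Run model-generated Python source in a fresh interpreter and report the outcome."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,            # keep any files the script writes contained
                capture_output=True,    # collect stdout/stderr as feedback
                text=True,
                timeout=timeout_s,      # a hung candidate counts as a failure
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


if __name__ == "__main__":
    result = run_generated_code("print(sum(range(10)))")
    print(result["ok"], result["stdout"].strip())
```

A separate process isolates crashes and hangs, but it is not a security sandbox; untrusted generated code would need stronger isolation such as containers or restricted permissions.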

---

Quant Research Automation: Qlib + R&D-Agent

Components

  • Qlib — Open-source quantitative research platform.
  • R&D-Agent — Automates iterative workflows.

Two Agent Roles

  • Research Agent — Generates high-quality strategies and ideas.
  • Development Agent — Implements and optimizes engineering execution.

Feedback Loops:

  • Engineering feedback — Bugs, training time, resource usage.
  • Performance feedback — Strategy effectiveness, model metrics.
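The two roles and two feedback channels can be sketched as a simple loop. This is a hypothetical illustration, not R&D-Agent's actual API: `propose` stands in for the Research Agent, `implement` for the Development Agent, and `Feedback` keeps engineering signals (errors, runtime) separate from performance signals (model or strategy metrics).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Feedback:
    engineering: dict = field(default_factory=dict)  # errors, wall-clock time, ...
    performance: dict = field(default_factory=dict)  # strategy / model metrics


def rd_loop(
    propose: Callable[[list], str],
    implement: Callable[[str], Callable[[], dict]],
    metric: str,
    rounds: int,
):
    """Alternate a research step (propose) and a development step (implement, then run)."""
    history: list = []
    best_idea: Optional[str] = None
    best_score = float("-inf")
    for _ in range(rounds):
        idea = propose(history)  # research agent: reads all past feedback
        run = implement(idea)    # development agent: turns the idea into runnable code
        fb = Feedback()
        start = time.perf_counter()
        try:
            fb.performance = run()
        except Exception as exc:  # a crash becomes engineering feedback
            fb.engineering["error"] = repr(exc)
        fb.engineering["seconds"] = time.perf_counter() - start
        score = fb.performance.get(metric)
        if score is not None and score > best_score:
            best_idea, best_score = idea, score
        history.append((idea, fb))
    return best_idea, best_score, history
```

Passing the full history to `propose` lets the research step react to both kinds of feedback, for example abandoning an idea that keeps crashing even if its early metrics looked promising.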

---

Results in Quantitative Research

  • Completed 52 fully automated iterations in about 18 hours.
  • Outperformed expert-designed baselines on four evaluation metrics.

---

Large Market Model (LMM)

Traditional Approach Limitations

  • Relies on abstract features (factors), which miss information hidden in raw market data.
  • Treats the market as a black box.

LMM Approach

  • Models each market order at finest granularity.
  • Converts order data into video-like sequences for CV-based modeling.
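The summary does not say how orders are laid out as frames. One plausible layout, assumed here purely for illustration: one frame per time window, rows for price offsets from a reference mid price (in ticks), and one column each for buy and sell volume. Stacking frames over time gives a (time, height, width) array that image or video models can consume. `orders_to_frames` is a hypothetical name, and the fixed `mid_price` is a simplification (a real pipeline would track the mid price per window).

```python
def orders_to_frames(orders, window_s, half_levels, mid_price, tick):
    """Bin orders into a frames x price_levels x 2 nested list (a tiny 'video').

    orders: list of (timestamp_s, price, volume, side), side in {"buy", "sell"}.
    Rows are price offsets from mid_price in ticks, clipped to
    [-half_levels, half_levels]; column 0 holds buy volume, column 1 sell volume.
    """
    if not orders:
        return []
    t0 = min(o[0] for o in orders)
    n_frames = int((max(o[0] for o in orders) - t0) // window_s) + 1
    height = 2 * half_levels + 1
    video = [[[0.0, 0.0] for _ in range(height)] for _ in range(n_frames)]
    for ts, price, volume, side in orders:
        frame = int((ts - t0) // window_s)
        offset = round((price - mid_price) / tick)          # distance from mid in ticks
        offset = max(-half_levels, min(half_levels, offset))  # clip far-away orders
        row = offset + half_levels
        col = 0 if side == "buy" else 1
        video[frame][row][col] += volume
    return video
```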

---

Model Training & Scaling Laws

  • Tokenization of individual orders.
  • Orders aggregated into minute-level groups.
  • The Transformer architecture shows clear scaling-law behavior.
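A minimal sketch of order tokenization and minute-level grouping. The vocabulary here (three order types, 21 clipped price offsets, 8 log-spaced volume buckets, so 3 × 21 × 8 = 504 tokens) is an assumption for illustration; the actual LMM tokenizer is not described in this summary, and all names are hypothetical.

```python
from itertools import groupby

ORDER_TYPES = ("limit_buy", "limit_sell", "cancel")  # hypothetical order-type vocabulary
PRICE_BUCKETS = 21   # offsets of -10..+10 ticks from mid, clipped
VOLUME_BUCKETS = 8   # log2-spaced size buckets


def volume_bucket(volume: int) -> int:
    # 1 -> 0, 2-3 -> 1, 4-7 -> 2, ..., capped at VOLUME_BUCKETS - 1
    return min(VOLUME_BUCKETS - 1, max(0, int(volume).bit_length() - 1))


def order_to_token(order_type: str, price_offset_ticks: int, volume: int) -> int:
    """Map one order to a single integer token in [0, 504)."""
    t = ORDER_TYPES.index(order_type)
    p = max(-10, min(10, price_offset_ticks)) + 10
    v = volume_bucket(volume)
    return (t * PRICE_BUCKETS + p) * VOLUME_BUCKETS + v


def group_by_minute(orders):
    """orders: time-sorted list of (timestamp_s, order_type, price_offset_ticks, volume)."""
    return [
        [order_to_token(o[1], o[2], o[3]) for o in group]
        for _, group in groupby(orders, key=lambda o: int(o[0] // 60))
    ]
```

Because `itertools.groupby` only groups consecutive items, the input must be time-sorted, and minutes with no orders produce no group.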

---

Applications Beyond Next-Token Prediction

  • Predictive simulations at order level.
  • Monte Carlo-style rollouts for future indicator forecasting.
  • Minute-level predictions that outperform supervised baselines.
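Monte Carlo forecasting replaces a single point prediction with many sampled futures and summarizes the indicator of interest across them. In this sketch, `toy_step` is a hypothetical stand-in for the generative model (a symmetric one-tick random walk); with the LMM, each rollout would instead be produced by generating orders step by step.

```python
import random
import statistics
from typing import Callable


def monte_carlo_forecast(
    step: Callable[[float, random.Random], float],
    start_price: float,
    horizon: int,
    n_rollouts: int,
    seed: int = 0,
) -> dict:
    """Estimate future indicators by summarizing many sampled rollouts (n_rollouts >= 2)."""
    rng = random.Random(seed)  # fixed seed keeps the forecast reproducible
    finals = []
    for _ in range(n_rollouts):
        price = start_price
        for _ in range(horizon):
            price = step(price, rng)  # one simulated step of market evolution
        finals.append(price)
    returns = [p / start_price - 1.0 for p in finals]
    return {
        "mean_return": statistics.fmean(returns),
        "prob_up": sum(r > 0 for r in returns) / n_rollouts,
        "stdev_return": statistics.stdev(returns),
    }


def toy_step(price: float, rng: random.Random) -> float:
    # Hypothetical stand-in for the generative model: +/- one tick with equal probability.
    return price + rng.choice((-0.01, 0.01))
```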

---

Deployment Challenges & Efficiency Optimization

Main Bottleneck:

  • Generating orders one by one autoregressively makes rollouts slow.

Solution:

  • Optimized modeling reduced rollout time from 15 minutes to ~1 minute, paving the way for real-world use.

---

Digital Twin Market Simulation

Advantages Over Traditional Models

  • Simulates order-level behavior, not just prices.
  • Enables study of rare/high-risk events (e.g., “Golden Finger” incident).

---

Interactive & Controllable Interfaces

  • Order Interaction Interface: Submit orders & observe impact.
  • Scenario Control Interface: Generate market conditions (bull/bear) on demand.

Platform: The MarS system combines the LMM with these interfaces, enabling Agent-based multi-round optimization.
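To illustrate what "submit orders and observe impact" means mechanically, here is a toy ask-side order book in which a market buy walks up the price levels. This is an assumption-laden sketch with hypothetical names: in MarS, the reaction of other market participants would be generated by the LMM, whereas this book only shows the immediate, mechanical impact of consuming liquidity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrderBook:
    """Toy book, ask side only: list of [price, volume] sorted by ascending price."""
    asks: list = field(default_factory=list)

    def best_ask(self) -> Optional[float]:
        return self.asks[0][0] if self.asks else None

    def market_buy(self, quantity: int):
        """Consume ask liquidity level by level; return (filled_qty, avg_fill_price)."""
        filled, cost = 0, 0.0
        while quantity > 0 and self.asks:
            price, volume = self.asks[0]
            take = min(quantity, volume)
            filled += take
            cost += take * price
            quantity -= take
            if take == volume:
                self.asks.pop(0)               # level fully consumed
            else:
                self.asks[0][1] = volume - take  # partial fill leaves residual volume
        avg = cost / filled if filled else None
        return filled, avg


if __name__ == "__main__":
    book = OrderBook(asks=[[10.00, 100], [10.01, 200], [10.02, 300]])
    before = book.best_ask()
    filled, avg = book.market_buy(250)  # fills 100 @ 10.00 and 150 @ 10.01
    print(f"filled={filled} avg={avg:.4f} best ask {before} -> {book.best_ask()}")
```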

---

Order Generation Model Types

  • Order-Group (Minute-Level) Generation
  • Order-Level Generation

Together, these support:

  • Scenario fitting
  • Real-time responses to user interaction

---

Natural Language Control Signals

  • The user describes a target scenario (e.g., a downturn).
  • The system generates code that locates matching historical patterns and uses them to guide the simulation of future order flow.
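For a request like "simulate a downturn," the generated code might be as simple as a scan over historical prices for windows that match the description. A hypothetical example of such code: `find_downturns` returns non-overlapping (start, end) index pairs where the price fell by at least a threshold over a fixed window, and the matched periods could then condition the order-flow generator. The window length and threshold are assumptions that a language model would pick from the user's wording.

```python
def find_downturns(prices, window, threshold):
    """Return non-overlapping (start, end) index pairs with return <= threshold.

    prices: sequence of positive prices; window: steps per pattern;
    threshold: negative return, e.g. -0.05 for a fall of at least 5%.
    """
    hits = []
    i = 0
    while i + window < len(prices):
        if prices[i + window] / prices[i] - 1.0 <= threshold:
            hits.append((i, i + window))
            i += window  # skip past the match so windows do not overlap
        else:
            i += 1
    return hits
```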

---

Validating Generated Data

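  • Evaluated against 11 financial indicators, showing strong statistical alignment with real markets.
  • A macro-level regularity, the square-root law of market impact (impact ∝ √(q/v), with q the order size and v the traded volume), emerges from the micro-level simulation even though the model was never trained on it directly.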
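The square-root law is commonly written as impact ≈ Y·σ·√(q/v), where σ is volatility and Y is a constant of order one. A simple check on simulated data is to regress log impact on log(q/v) and see whether the slope is close to 0.5. This sketch validates the fitting routine on synthetic data generated from the law itself; with simulator output, the (q/v, impact) pairs would instead be measured from simulated orders. Function names are hypothetical.

```python
import math
import statistics


def square_root_impact(q: float, v: float, sigma: float, y: float = 1.0) -> float:
    """Square-root law of market impact: y * sigma * sqrt(q / v)."""
    return y * sigma * math.sqrt(q / v)


def fit_impact_exponent(participation, impact) -> float:
    """Ordinary least-squares slope of log(impact) on log(q / v).

    A slope near 0.5 is consistent with the square-root law.
    """
    xs = [math.log(x) for x in participation]
    ys = [math.log(y) for y in impact]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


if __name__ == "__main__":
    daily_volume = 1e6
    participation = [0.001, 0.005, 0.01, 0.05, 0.1]  # q / v
    impact = [square_root_impact(p * daily_volume, daily_volume, sigma=0.02) for p in participation]
    print(round(fit_impact_exponent(participation, impact), 6))
```

The log-log fit ignores the prefactor Y·σ (it only shifts the intercept), which is why the exponent alone is a convenient summary when comparing simulated and real impact curves.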

---

Broader Applications & Integration

While targeted at finance, this world-model + iterative Agent paradigm applies to:

  • Healthcare
  • Industrial optimization
  • Creative industries


---

