Save 20+ Person-Months! A Retrospective on Huolala’s Offline Big Data Cross-Cloud Migration
1. Preface
At the end of 2023, our company officially launched the Freight Offline Big Data Migration Project. After five months of intensive, cross-department collaboration, the project was successfully completed in May 2024, achieving a full-scale cross-cloud migration of offline pipelines — covering tasks, data, services, and infrastructure. Over ten departments were deeply involved in this complex initiative.
Even after a year, the journey — full of challenges — remains fresh in our minds. Every difficulty was addressed through multi-party collaboration, laying a solid foundation for the stable operation of subsequent pipelines.
While cloud and cross-cloud migrations are a common topic, detailed accounts of big data migrations at this scale are rare. This article shares our complete migration process as a reference for teams facing similar situations, beginning with the migration plan design and the full implementation process from a holistic perspective.
---
2. Background Overview
2.1 Big Data Cross-Cloud Architecture

Huolala’s big data IT architecture follows a “multi-cloud + cloud-based self-build” model:
- Pre-2020: Online and offline (big data) services were deployed on the same cloud.
- 2020 to May 2024: Offline big data services ran on a dedicated offline cloud, entering a multi-cloud stage that improved bargaining power with vendors and let us leverage the technical strengths of different clouds.
- From May 2024: Offline big data services were migrated again, from the existing offline cloud to a new cloud provider.
---
2.2 Scale of the Offline Big Data Platform
Offline storage:
The migration covered 40 PB of data accumulated over 10 years of freight operations, plus 40,000+ computation tasks, one of the largest offline data footprints in the freight industry.
| Business Line | Data Volume | Number of Files | Task Count | Departments Involved |
|---------------|-------------|-----------------|------------|----------------------|
| HLL | 40 PB | — | — | — |
Key takeaway: Migrating massive offline big data across clouds requires meticulous planning, scalable architecture, robust monitoring, and rehearsed cutover strategies.
---
3. Migration Plan Design
The solution aimed to ensure:
- Accurate & timely data before and after migration
- Minimal downtime
- No business impact
- Rollback readiness
Core features:
Verifiable
- Performance verification
  - Conduct full storage and computing performance tests early (at the POC stage)
  - Run as many tasks as possible in the new environment to track pipeline performance
- Data verification
  - Compare large tables and files between the old and new environments
Rollback capable
- Implement a primary/backup dual-run so that rollback to the previous environment is possible if needed
---
Overall Migration Steps:

- Infrastructure Setup: Build networks, adapt core components, deliver clusters
- Data Migration: Migrate stored data, metadata, and offline pipeline tasks
- Dual-Run & Verification: Enable pipelines in the new cloud, verify outputs daily
- Primary Switch & Decommission: Switch production to new cloud, keep old as backup until history checks pass; then decommission old resources
---
4. Migration Implementation

The migration began in December 2023 and faced multiple technical challenges. The key focus areas were:
---
4.1 Network Isolation
Architecture: “Offline Cloud (Old) — Offline Cloud (New) — Online Cloud”
Challenges:
Implementing component-port-level network isolation without affecting active business traffic.
Solutions (see the sketch after this list):
- Topology Mapping: Detailed mapping of clusters, components, and services
- Primary–Backup Link Isolation: Whitelists to control synchronization traffic
- Backup–Online Isolation: Blacklists to ensure new-cloud pipelines cannot reach online services during the dual-run
- Pre-Switch Validation: Temporarily enable the isolation rules to test tasks outside the dual-run scope
- Post-Switch Configs: Adjust isolation between the old, new, and online clouds as needed
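As an illustration, here is a minimal sketch of deriving per-port allow/deny rules from a topology map. The `Service` structure, cluster names, CIDRs, and ports are hypothetical placeholders, not our actual configuration:

```python
# Sketch: derive per-port isolation rules from a topology map.
# All cluster names, CIDRs, and ports below are hypothetical placeholders.

from dataclasses import dataclass

@dataclass(frozen=True)
class Service:
    cluster: str    # "old", "new", or "online"
    component: str  # e.g. "hdfs-namenode"
    cidr: str       # subnet the component listens on
    port: int

TOPOLOGY = [
    Service("old",    "hdfs-namenode", "10.0.1.0/24", 8020),
    Service("new",    "hdfs-namenode", "10.1.1.0/24", 8020),
    Service("online", "mysql",         "10.2.1.0/24", 3306),
]

def isolation_rules(topology):
    """Whitelist old->new sync traffic; blacklist new->online access."""
    rules = []
    for src in topology:
        for dst in topology:
            if src.cluster == "old" and dst.cluster == "new":
                rules.append(("ALLOW", src.cidr, dst.cidr, dst.port))
            elif src.cluster == "new" and dst.cluster == "online":
                rules.append(("DENY", src.cidr, dst.cidr, dst.port))
    return rules

if __name__ == "__main__":
    for action, src, dst, port in isolation_rules(TOPOLOGY):
        print(f"{action} {src} -> {dst}:{port}")
```

Generating rules from a single topology map keeps the whitelist and blacklist consistent with each other, which matters when the same rules must be temporarily enabled for pre-switch validation and adjusted again after the switch.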

---
4.2 Migrating Massive Data (40 PB)
Key considerations:
- Daily-changing incremental data
- Data quality and consistency
Measures:
1. Build a High-Throughput Migration Tool (see the comparison sketch after this list)
   - Custom tool supporting:
     - Hive table/partition/file-level data comparison
     - Hive metadata comparison and sync
   - Performance:
     - Fully saturates 100 Gbps of bandwidth
     - Reduces the metadata comparison from 5 h to 1.5 h
     - Reduces the comparison of 25M partitions from 18 days to 2 days

2. Ensure Consistency (see the DDL sketch below)
   - Daily sync of more than 500 TB
   - Schema diff detection plus auto-DDL generation
   - Automated code sync across environments
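The tool’s internals are not covered here, so the following is only a minimal sketch of the partition-level comparison idea, assuming a hypothetical `list_partition(cluster, table, partition)` helper that wraps an HDFS listing call against each cluster:

```python
# Sketch: partition-level comparison between two clusters.
# list_partition(cluster, table, partition) is a hypothetical helper that
# returns [(file_path, size_bytes), ...] for one Hive partition.

from concurrent.futures import ThreadPoolExecutor

def partition_digest(files):
    """Cheap coarse digest: file count plus total bytes.

    Comparing digests avoids moving any data; partitions whose digests
    differ are re-queued for a finer file-by-file or checksum comparison.
    """
    return (len(files), sum(size for _, size in files))

def compare_partitions(list_partition, table, partitions, workers=64):
    """Return partitions whose digests differ between old and new clusters."""
    def check(part):
        old = partition_digest(list_partition("old", table, part))
        new = partition_digest(list_partition("new", table, part))
        return part if old != new else None

    # Listings are I/O bound, so a wide thread pool is the usual way to
    # make a comparison over tens of millions of partitions tractable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for p in pool.map(check, partitions) if p is not None]
```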

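Likewise, a minimal sketch of the schema-diff-plus-auto-DDL idea, assuming a hypothetical `get_columns(cluster, table)` helper that returns an ordered column-to-type mapping from each metastore:

```python
# Sketch: schema drift detection with auto-generated ALTER TABLE DDL.
# get_columns(cluster, table) is a hypothetical helper returning an
# ordered {column_name: hive_type} dict from each cluster's metastore.

def schema_diff_ddl(get_columns, table):
    """Emit DDL adding columns present in the old cluster but missing in
    the new one; type mismatches are reported for manual review."""
    old_cols = get_columns("old", table)
    new_cols = get_columns("new", table)
    statements, conflicts = [], []
    for name, hive_type in old_cols.items():
        if name not in new_cols:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMNS ({name} {hive_type});"
            )
        elif new_cols[name] != hive_type:
            conflicts.append((name, new_cols[name], hive_type))
    return statements, conflicts
```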
---
4.3 Data Validation
During the dual-run phase, outputs were generated daily in both environments and compared.
Tools & strategy (see the sketch after this list):
- An automated comparison platform reduced manual checking time by more than 90%
- Coarse check: row-count comparison per partition/table
- Fine check: field-level comparison with tolerances for minor timing differences
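A minimal sketch of the two-stage check, assuming a hypothetical `run_query(cluster, sql)` helper that executes SQL against either environment and returns rows as tuples:

```python
# Sketch: two-stage output validation during the dual-run phase.
# run_query(cluster, sql) is a hypothetical helper executing SQL on the
# old or new cluster and returning rows as lists of tuples.

def coarse_check(run_query, table, partition):
    """Stage 1: per-partition row counts must match exactly."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE dt = '{partition}'"
    return run_query("old", sql) == run_query("new", sql)

def fine_check(run_query, table, partition, key, fields, tolerance=1e-6):
    """Stage 2: field-level compare keyed by primary key, with a small
    numeric tolerance for benign timing/rounding differences."""
    cols = ", ".join([key] + fields)
    sql = f"SELECT {cols} FROM {table} WHERE dt = '{partition}'"
    old = {row[0]: row[1:] for row in run_query("old", sql)}
    new = {row[0]: row[1:] for row in run_query("new", sql)}
    if old.keys() != new.keys():
        return False
    for k, old_vals in old.items():
        for a, b in zip(old_vals, new[k]):
            if isinstance(a, float):
                if abs(a - b) > tolerance:
                    return False
            elif a != b:
                return False
    return True
```

Running the cheap coarse check on everything and reserving the fine check for rows that matter (or for partitions flagged upstream) is what keeps daily verification affordable across 40,000+ tasks.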


---
4.4 Switching Pipelines
Final phase: promote the new cloud from backup to primary.
- SOP execution by a dedicated task force
- Dual-run allowed immediate rollback if needed
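A minimal sketch of a guarded cutover step from such an SOP; `set_primary` and `health_check` are hypothetical hooks standing in for the real SOP steps:

```python
# Sketch: a guarded cutover step with immediate rollback.
# set_primary() and health_check() are hypothetical hooks standing in
# for whatever the real switch procedure touches.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cutover")

def cutover(set_primary, health_check):
    """Promote the new cloud to primary; roll back on a failed check."""
    set_primary("new")
    log.info("new cloud promoted to primary")
    if not health_check("new"):
        # Because the old cloud keeps dual-running, rollback is simply
        # pointing the primary role back at it.
        set_primary("old")
        log.warning("health check failed, rolled back to old cloud")
        return False
    return True
```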

---
5. Reflections & Summary
Success factors:
- Iterative Plan Refinement: Cover all risk points and adapt continuously
- Automation Is Key:
  - Migration tools for speed and efficiency
  - Automated resource delivery for fast cluster setup
  - Automated verification tools that saved 20+ person-months
- Cloud Tech Selection: Mature processes for cost estimation, performance testing, and stability assurance
---
Acknowledgments:
This project was a collective achievement involving Huolala Technology Center and 10+ business units, plus strong support from the cloud provider’s expert team.