How Shopify Handles 30TB of Data per Minute with a Monolithic Architecture

Shopify’s Billion-User Monolith: How Simplicity Scales

If you’ve ever worked on a web application that slows down with just a few thousand users, imagine handling billions—and doing it with a single, elegantly designed monolith, not hundreds of microservices or annual rewrites.

This is the real story of Shopify:

Processing 30 TB of data every minute at peak, powering one of the world’s largest e-commerce platforms, and keeping its architecture simple, scalable, and human-friendly.

---

Black Friday at Shopify

For most companies, Black Friday means chaos.

For Shopify, it’s a well-oiled machine in overdrive.

When midnight strikes, traffic floods in from all over the globe. Millions shop at stores like Gymshark, Kylie Cosmetics, and Allbirds.

2021 Black Friday Weekend Stats

  • 30 TB of data processed per minute
  • 32M+ requests per minute
  • 11M MySQL queries per second
  • $3.9M sales per minute
  • Zero downtime, zero crashes
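For a sense of scale, the per-minute figures above can be converted to per-second rates. This is a back-of-envelope sketch only: it assumes all four figures describe the same peak moment, which the stats do not state, so the queries-per-request ratio in particular is a rough indication rather than a measured value.

```python
SECONDS_PER_MINUTE = 60

# Figures taken from the list above.
requests_per_minute = 32_000_000
terabytes_per_minute = 30
queries_per_second = 11_000_000
sales_usd_per_minute = 3_900_000

requests_per_second = requests_per_minute / SECONDS_PER_MINUTE
terabytes_per_second = terabytes_per_minute / SECONDS_PER_MINUTE
sales_usd_per_second = sales_usd_per_minute / SECONDS_PER_MINUTE

# Assumption: peak queries and peak requests coincide.
queries_per_request = queries_per_second / requests_per_second

print(f"{requests_per_second:,.0f} requests per second")
print(f"{terabytes_per_second} TB per second")
print(f"${sales_usd_per_second:,.0f} in sales per second")
print(f"about {queries_per_request:.1f} MySQL queries per request")
```

Under those assumptions, each incoming request fans out into roughly twenty database queries, which helps explain why caching and sharding carry so much weight in the sections that follow.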

---

The Modular Monolith Approach

Shopify’s architecture is a modular monolith—one codebase, mainly Ruby on Rails, split into clear logical domains.

Think of it as one city with distinct districts:

  • Checkout
  • Payments
  • Orders
  • Admin Backend
  • Inventory
  • Analytics

Each module has:

  • Exclusive data ownership
  • Public API for interaction
  • Dedicated maintenance team

---

Enforcing Boundaries with Packwerk

Modules live in the same codebase but are strictly isolated.

  • Shopify uses Packwerk, an open-source static analysis tool it built, to detect forbidden cross-module access.
  • This discipline keeps the 10+ year-old monolith clean and maintainable for thousands of engineers.
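The idea behind this kind of boundary check can be sketched in a few lines. The example below is in Python rather than Ruby, and it is not Packwerk's implementation or configuration format: the package names, the `ALLOWED_DEPENDENCIES` table, and the `find_violations` helper are all hypothetical, chosen only to show how undeclared cross-module references might be flagged.

```python
from dataclasses import dataclass

# Hypothetical declaration: each package lists the packages it may depend on.
ALLOWED_DEPENDENCIES = {
    "checkout": {"payments", "inventory"},
    "payments": set(),
    "inventory": set(),
}


@dataclass(frozen=True)
class Reference:
    """A reference from code in one package to a constant owned by another."""
    source_package: str
    target_package: str
    constant: str


def find_violations(references):
    """Return references that cross a boundary without a declared dependency."""
    violations = []
    for ref in references:
        if ref.source_package == ref.target_package:
            continue  # references inside a package are always allowed
        allowed = ALLOWED_DEPENDENCIES.get(ref.source_package, set())
        if ref.target_package not in allowed:
            violations.append(ref)
    return violations


references = [
    Reference("checkout", "payments", "Payments::Charge"),    # declared
    Reference("payments", "checkout", "Checkout::Cart"),      # undeclared
    Reference("inventory", "inventory", "Inventory::Stock"),  # same package
]

for v in find_violations(references):
    print(f"{v.source_package} may not reference {v.constant}")
```

Running a check like this in CI is what turns a module boundary from a convention into a rule: an undeclared dependency fails the build instead of quietly accumulating.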

---

Hexagonal Architecture (Ports & Adapters)

Why?

Modular boundaries decide where features live.

Hexagonal architecture decides how they talk to the outside world.

Core principles:

  • Core business logic sits at the center (unchanging)
  • Adapters at the edge handle external communication (API, DB, queues)

Example: Creating an Order

Tight‑Coupling (Traditional)

  • Controller directly calls DB
  • Business logic lives in controllers
  • API changes risk breaking everything

Hexagonal Method (Shopify Way)

┌────────────────────────┐
│       Web Layer        │
│ (GraphQL, REST, etc.)  │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│  Application Service   │
│  (CreateOrderUseCase)  │
└───────────┬────────────┘
     (via Interface)
            ▼
┌────────────────────────┐
│        Adapters        │
│ (MySQL, Kafka, Redis)  │
└────────────────────────┘

Workflow:

  • API adapter receives request
  • Passes to `CreateOrderUseCase`
  • Core logic runs (inventory check, payment validation, discounts)
  • Adapter persists data to MySQL or sends event to Kafka

Key Benefit: Core logic is agnostic to the input/output source.

Shopify can swap tech without touching business logic.
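The workflow above can be sketched as follows. This is an illustrative Python example, not Shopify's Ruby code: only the name `CreateOrderUseCase` comes from the article, while the `OrderRepository` and `EventPublisher` ports, the in-memory adapters, and the `orders.created` topic are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: int
    sku: str
    quantity: int


class OrderRepository(Protocol):
    """Port: how the core persists orders, with no mention of MySQL."""
    def save(self, order: Order) -> None: ...


class EventPublisher(Protocol):
    """Port: how the core announces events, with no mention of Kafka."""
    def publish(self, topic: str, payload: dict) -> None: ...


class CreateOrderUseCase:
    """Core logic: depends only on the ports above."""

    def __init__(self, repo: OrderRepository, events: EventPublisher, stock: dict):
        self._repo = repo
        self._events = events
        self._stock = stock  # sku -> units available (stand-in for inventory)
        self._next_id = 1

    def execute(self, sku: str, quantity: int) -> Order:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self._stock.get(sku, 0) < quantity:
            raise ValueError(f"insufficient stock for {sku}")
        self._stock[sku] -= quantity
        order = Order(self._next_id, sku, quantity)
        self._next_id += 1
        self._repo.save(order)
        self._events.publish("orders.created", {"order_id": order.order_id})
        return order


class InMemoryOrderRepository:
    """Adapter used here in place of a real database."""
    def __init__(self):
        self.rows = {}

    def save(self, order: Order) -> None:
        self.rows[order.order_id] = order


class ListEventPublisher:
    """Adapter used here in place of a real message broker."""
    def __init__(self):
        self.sent = []

    def publish(self, topic: str, payload: dict) -> None:
        self.sent.append((topic, payload))


repo = InMemoryOrderRepository()
events = ListEventPublisher()
use_case = CreateOrderUseCase(repo, events, stock={"tee-black-m": 5})
order = use_case.execute("tee-black-m", 2)
print(order, events.sent)
```

Because `CreateOrderUseCase` only sees the two ports, replacing the in-memory adapters with database-backed or broker-backed ones would not require changing the core class, and the same in-memory adapters double as fast test fakes.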

---

Pods: Horizontal Scaling for Monoliths

When a viral launch threatens to overload the platform, Pods isolate the impact.

Each Pod is a mini-Shopify:

  • Separate DB shard
  • Separate cache
  • Separate queues
  • Separate workers

Routing is handled by the Sorting Hat service, sending requests to the right Pod.

If Pod A crashes during a mega-launch, the 100K other stores served by Pod B are unaffected.
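A minimal sketch of shop-to-pod routing, assuming a simple lookup table from shop to pod. The `PodRouter` class, the shop names, and the pod names are hypothetical; the article does not describe how Sorting Hat is actually implemented.

```python
class PodRouter:
    """Maps each shop to the pod that holds its data (hypothetical table)."""

    def __init__(self, shop_to_pod):
        self._shop_to_pod = dict(shop_to_pod)
        self._unhealthy = set()

    def mark_unhealthy(self, pod):
        """Record that a pod is down so its requests can fail fast."""
        self._unhealthy.add(pod)

    def route(self, shop_id):
        """Return the pod for a shop, or None if that pod is currently down."""
        pod = self._shop_to_pod[shop_id]
        return None if pod in self._unhealthy else pod


router = PodRouter({"mega-launch-store": "pod-a", "small-store": "pod-b"})
router.mark_unhealthy("pod-a")

print(router.route("mega-launch-store"))  # None: pod-a is down
print(router.route("small-store"))        # pod-b: unaffected
```

The point of the design is that a shop's traffic never touches another pod's database, cache, or queues, so a failure stays confined to the shops mapped to the failing pod.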

---

Data Flow Evolution

Every click (e.g., "add to cart") generates massive data streams.

From Batch to Real-Time:

  • Old: Longboat (hourly batch queries)
  • New: Debezium + Kafka (real-time CDC)

Result: Real-time dashboards, instant fraud detection, and fast analytics at PB scale.
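To illustrate what change data capture gives a consumer, here is a small Python sketch that replays a stream of change events into an in-memory view. The event shape is a simplified version of the Debezium envelope (`op`, `before`, `after`); the table contents and the `apply_change` helper are assumptions for illustration, and no Kafka client is involved.

```python
def apply_change(view, event):
    """Apply one change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        row = event["after"]
        view[row["id"]] = row
    elif op == "d":  # delete
        view.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "pending"}},
    {"op": "u", "before": {"id": 1, "status": "pending"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "pending"}, "after": None},
]

orders_view = {}
for event in events:
    apply_change(orders_view, event)  # the view is current after every event

print(orders_view)
```

With hourly batch queries, a consumer would see these four changes at most once an hour; with a change stream, the view reflects each change as soon as the event arrives.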

---

Handling Traffic Spikes Gracefully

When 1M people click “add to cart” in the same second, brute force won’t work.

Shopify relies on:

  • Edge caching via CDN for static pages/media
  • Redis/Memcached for sessions, precomputed data, fast reads
  • Background queues for heavy tasks
  • Graceful degradation of non-core features during spikes

Example: Pause recommendations during traffic surges—but keep checkout running flawlessly.
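The recommendations example can be sketched as a simple load shedder. Everything here is hypothetical: the `LoadShedder` class, the `CORE_FEATURES` set, the 0.8 threshold, and the idea of a single normalized load number are assumptions chosen to keep the example short.

```python
CORE_FEATURES = {"checkout", "cart"}  # never shed, regardless of load


class LoadShedder:
    """Disables non-core features once load crosses a threshold."""

    def __init__(self, shed_threshold):
        self.shed_threshold = shed_threshold
        self.current_load = 0.0  # assumed to be 0.0 (idle) to 1.0 (saturated)

    def is_enabled(self, feature):
        if feature in CORE_FEATURES:
            return True
        return self.current_load < self.shed_threshold


def render_product_page(shedder, product):
    """Build a page, dropping recommendations when they are shed."""
    page = {"product": product,
            "checkout_enabled": shedder.is_enabled("checkout")}
    if shedder.is_enabled("recommendations"):
        page["recommendations"] = ["related-1", "related-2"]
    else:
        page["recommendations"] = []  # degrade: page still renders
    return page


shedder = LoadShedder(shed_threshold=0.8)

shedder.current_load = 0.3
normal_page = render_product_page(shedder, "tee-black-m")

shedder.current_load = 0.95
surge_page = render_product_page(shedder, "tee-black-m")

print(normal_page)
print(surge_page)
```

The key design choice is that shedding returns a smaller but valid page rather than an error, so shoppers can still check out while optional features are paused.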

---

MySQL at Massive Scale

Despite trend shifts, Shopify stays with MySQL—at extreme scale:

  • Hundreds of shards across Pods
  • 10M+ queries/sec
  • Automatic replica balancing
  • Snapshot backups with 30‑minute restore window
  • Online schema changes (zero downtime)
  • Dynamic re‑sharding to avoid hotspots

This is boring excellence—and it works.

---

Black Friday On‑Call: Calm Under Pressure

> “You prepare for battle. Expect alarms. Imagine chaos.

> But as traffic surges—charts spike, Pods hum, caches hold—nothing breaks.

> You sip coffee. You smile.”

The magic of robust architecture is serenity even on the busiest day of the year.

---

Lessons We Can All Apply

  • Start Monolithic, Modularize Over Time: Avoid splitting systems unless necessary; complexity is expensive.
  • Follow Hexagonal Architecture Principles: Keep business logic clean and decoupled from I/O.
  • Isolate Failures: Use Pods, shards, or domains to limit blast radius.
  • Prefer Stream Processing Over Batch: Real-time data means faster feedback loops.
  • Plan Graceful Degradation: Protect core features during failures.
  • Make “Boring” Beautiful: Great architecture feels uneventful.

---

Final Thoughts

Shopify’s success is about clarity, craftsmanship, and composure at global scale.

They prove a disciplined hexagonal monolith can outperform an unruly mess of microservices—even at internet-scale.

When you’re processing 30 TB every minute, simplicity is not a weakness; it’s the mark of true excellence.

[Figure: Architecture Summary]

---


Source: Medium Article
