How Disney Hotstar (Now JioHotstar) Scaled Its Infrastructure to Support 60 Million Concurrent Users

Disclaimer

The details in this post are based on publicly shared information by the Disney+ Hotstar (now JioHotstar) Engineering Team.

All technical credit belongs to them. Original articles and sources can be found in the References section at the end.

Our analysis builds on their shared insights; if you spot inaccuracies or omissions, please leave a comment so they can be corrected.

---

Overview: A Record-Scale Streaming Challenge

In 2023, Disney+ Hotstar tackled one of the most ambitious engineering challenges in streaming history: supporting 50–60 million concurrent live streams during the Asia Cup and the Cricket World Cup.

For perspective, earlier Hotstar peaks were around 25 million concurrent users handled on two self-managed Kubernetes clusters.

---

“Free on Mobile” — Raising the Stakes

Adding complexity, the platform introduced Free on Mobile streaming, removing the subscription requirement for millions of viewers.

This drastically increased the expected load and forced a fundamental infrastructure rethink.

---

Why Architecture Needed to Change

Simply adding more servers wasn’t viable. The team needed structural change to maintain:

  • Reliability
  • High speed
  • Efficiency under unprecedented demand

---

Introducing the “X Architecture”

The solution was a server-driven “X architecture”, engineered for:

  • Flexibility
  • Scalability
  • Cost-effectiveness
  • Global operational efficiency

This reduced excessive dependence on brute-force scaling.

---


Technical Journey — From Gateways to Kubernetes Overhaul

The overhaul included:

  • Network redesign
  • API gateway optimization
  • Migration to Amazon EKS
  • Datacenter abstraction

All of it served a single goal: keeping millions of streams running smoothly in real time.

---

Architecture Overview

Disney+ Hotstar serves mobile (Android/iOS), web, and smart TV users. Traffic flows:

  • User action → request sent to external API gateway via CDN
  • CDN layer → performs security + traffic filtering → forwards to internal API gateway
  • Internal gateway (ALBs) → load-balances across backend services
  • Backend services → deliver features (video, scorecards, chat, profiles) using managed or self-hosted databases

Key point: Each layer must scale together. Any bottleneck affects millions of users.

---

CDN Optimization

During scale-up, the CDNs did more than cache content: they also acted as API gateways, handling rate limiting and token verification at the edge.

Findings from past events:

  • The top 10 APIs dominated the load
  • API requests split into two groups:
      • Cacheable: data stable for seconds to minutes (scoreboards, highlights)
      • Non-cacheable: personalized or time-sensitive data

Solution:

  • Separate domains for cacheable APIs
  • Lighter security checks and faster routing for cached data

The result: reduced edge-server load and higher throughput.
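To make the split concrete, here is a minimal Go sketch of two hypothetical origin endpoints: one marks itself edge-cacheable for a few seconds, the other opts out of caching entirely. The paths, payloads, and TTLs are illustrative assumptions, not Hotstar's actual APIs, and the separate-domain routing itself would live in the CDN configuration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical cacheable endpoint: the scoreboard only changes every few
	// seconds, so the CDN edge can serve it from cache.
	http.HandleFunc("/v1/scoreboard", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=5") // cache at the edge for 5 s
		fmt.Fprintln(w, `{"score":"IND 287/4 (45.2 ov)"}`)
	})

	// Hypothetical personalized endpoint: responses differ per user, so the
	// CDN must always forward the request to origin.
	http.HandleFunc("/v1/continue-watching", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "private, no-store")
		fmt.Fprintf(w, `{"user":%q,"items":[]}`, r.Header.Get("X-User-Id"))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```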

---

Traffic Frequency Tweaks

By lowering the refresh frequency of certain features (for example, “Watch More” suggestions), overall traffic dropped without degrading the user experience.
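The leverage is easy to see with back-of-the-envelope arithmetic: at tens of millions of concurrent viewers, a polled feature's refresh interval translates directly into millions of requests per second. The interval values below are illustrative, not Hotstar's actual settings.

```go
package main

import "fmt"

func main() {
	const concurrentUsers = 60_000_000 // rough peak figure from this post

	// Fleet-wide request rate for one polled feature at different refresh intervals.
	for _, intervalSec := range []float64{10, 30, 60} {
		rps := concurrentUsers / intervalSec
		fmt.Printf("refresh every %2.0f s -> ~%.1fM requests/sec\n", intervalSec, rps/1e6)
	}
}
```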

---

Infrastructure Scaling Layers

Two main focus areas:

1. NAT Gateway Scaling

  • Problem: imbalanced bandwidth use, with one AZ’s NAT gateway overloaded
  • Fix: move from one NAT gateway per AZ to one per subnet, spreading the load more evenly

2. Kubernetes Worker Node Tuning

  • Problem: internal API gateway pods hogging bandwidth (8–9 Gbps on a single node)
  • Fixes:
      • Use 10 Gbps-capable nodes
      • Apply topology spread constraints so that at most one gateway pod runs per node (see the sketch after this list)
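As a sketch of the pod-spreading fix, the constraint below caps the skew at one gateway pod per node, written with the Kubernetes Go API types. The `app: internal-gateway` label and the image name are placeholders; in practice the same constraint is usually written directly in the deployment manifest.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Spread gateway pods so that no two land on the same worker node,
	// keeping any single node's NIC below its bandwidth ceiling.
	spread := corev1.TopologySpreadConstraint{
		MaxSkew:           1,
		TopologyKey:       "kubernetes.io/hostname", // one topology domain per node
		WhenUnsatisfiable: corev1.DoNotSchedule,
		LabelSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"app": "internal-gateway"}, // placeholder label
		},
	}

	podSpec := corev1.PodSpec{
		TopologySpreadConstraints: []corev1.TopologySpreadConstraint{spread},
		Containers: []corev1.Container{{
			Name:  "gateway",
			Image: "example.com/internal-gateway:latest", // placeholder image
		}},
	}
	fmt.Printf("%+v\n", podSpec.TopologySpreadConstraints)
}
```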

---

Migration to Amazon EKS

Reason: the self-managed clusters couldn’t reliably scale beyond ~25M concurrent users, and their control planes grew fragile under high load.

Benefits of EKS:

  • AWS-managed control plane
  • Focus engineering on workloads and data plane
  • Latest instance families (e.g., Graviton)
  • Faster cluster provisioning

---

Scaling Strategy Refinement

  • Avoid API server throttling by stepwise scaling (100–300 nodes per step)
  • Stable test results with >400 nodes
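A hedged sketch of what stepwise scale-up can look like against an EKS managed node group, using the AWS SDK for Go v2. The cluster and node-group names, the step size, and the fixed settle delay are all assumptions for illustration; a real rollout would poll the node-group update status instead of sleeping.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/eks"
	"github.com/aws/aws-sdk-go-v2/service/eks/types"
)

// scaleInSteps raises a managed node group's desired size in bounded
// increments instead of one large jump, giving the control plane and the
// autoscaling machinery time to settle between steps.
func scaleInSteps(ctx context.Context, client *eks.Client, cluster, nodegroup string, target, step int32) error {
	for desired := step; ; desired += step {
		if desired > target {
			desired = target
		}
		_, err := client.UpdateNodegroupConfig(ctx, &eks.UpdateNodegroupConfigInput{
			ClusterName:   aws.String(cluster),
			NodegroupName: aws.String(nodegroup),
			ScalingConfig: &types.NodegroupScalingConfig{DesiredSize: aws.Int32(desired)},
		})
		if err != nil {
			return err
		}
		log.Printf("requested %d nodes", desired)
		if desired == target {
			return nil
		}
		time.Sleep(2 * time.Minute) // placeholder: let nodes join and the API server settle
	}
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder names: 300-node steps toward a 1,200-node target.
	if err := scaleInSteps(ctx, eks.NewFromConfig(cfg), "prod-cluster", "gateway-nodes", 1200, 300); err != nil {
		log.Fatal(err)
	}
}
```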

---

The Next Phase — Beyond 50M Users

Legacy constraints:

  • Port exhaustion (NodePort range)
  • Older Kubernetes version blocking modern tooling
  • IP address limits in VPC subnets
  • Manual load balancer pre-warming

---

Datacenter Abstraction Architecture

Concept: Logical “data center” = multiple Kubernetes clusters in a region functioning as one.

Benefits:

  • Failover between clusters without redeployment complexity
  • Uniform scaling + observability
  • Central routing + security
  • Reduced operational overhead
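One way to picture the abstraction is as weighted routing across the clusters that make up a logical data center: dropping a cluster's weight to zero fails it over without redeploying anything. The toy sketch below shows only that routing decision; cluster names and weights are made up, and in practice this logic lives in the routing layer rather than in application code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// cluster is one Kubernetes cluster inside a logical "data center".
// Names and weights below are illustrative, not Hotstar's topology.
type cluster struct {
	name   string
	weight int // share of traffic; 0 drains the cluster without any redeployment
}

// pick chooses a cluster in proportion to its weight.
func pick(clusters []cluster) string {
	total := 0
	for _, c := range clusters {
		total += c.weight
	}
	n := rand.Intn(total)
	for _, c := range clusters {
		if n < c.weight {
			return c.name
		}
		n -= c.weight
	}
	return clusters[len(clusters)-1].name
}

func main() {
	dc := []cluster{
		{name: "eks-cluster-a", weight: 50},
		{name: "eks-cluster-b", weight: 50},
		{name: "eks-cluster-c", weight: 0}, // failed over: receives no traffic
	}
	counts := map[string]int{}
	for i := 0; i < 10_000; i++ {
		counts[pick(dc)]++
	}
	fmt.Println(counts)
}
```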

---

Key Innovations

Central Proxy Layer — Envoy-based Internal API Gateway

  • A unified Envoy fleet replaced more than 200 ALBs
  • Routing, authentication, and rate limiting were centralized
  • Developers no longer need to know which cluster hosts a service
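Hotstar's gateway is Envoy, configured rather than hand-coded, but the centralization idea is easy to illustrate: a policy such as per-client rate limiting is enforced once at the proxy layer instead of inside every backend. A minimal Go sketch of that kind of middleware follows; the client-ID header, rate, and burst values are assumptions.

```go
package main

import (
	"log"
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

// perClientLimiter hands out one token bucket per client key, the kind of
// policy a central gateway enforces once for every backend service.
type perClientLimiter struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

func (p *perClientLimiter) get(key string) *rate.Limiter {
	p.mu.Lock()
	defer p.mu.Unlock()
	if l, ok := p.limiters[key]; ok {
		return l
	}
	l := rate.NewLimiter(rate.Limit(100), 50) // placeholder: 100 req/s, burst of 50
	p.limiters[key] = l
	return l
}

func (p *perClientLimiter) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !p.get(r.Header.Get("X-Client-Id")).Allow() { // hypothetical client header
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	limiter := &perClientLimiter{limiters: map[string]*rate.Limiter{}}
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", limiter.middleware(backend)))
}
```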

EKS Across All Clusters

  • Unified orchestration via managed Kubernetes
  • Access to newest EC2 types
  • Rapid, on-demand provisioning

---

Unified Service Endpoints

  • Consistent internal pattern:
      • `service.internal.domain` or `service.internal.dc.domain`
  • External/public pattern:
      • `service.public-domain`
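To show how this naming keeps callers location-agnostic, here is a tiny parsing sketch that maps both hostname forms onto a service and, optionally, a logical data center. The `example.com` base domain and the service name are assumptions; only the two pattern shapes come from the article.

```go
package main

import (
	"fmt"
	"strings"
)

// endpoint captures the pieces of a uniform internal hostname such as
// "scorecard.internal.dc1.example.com" (hypothetical service and DC names).
type endpoint struct {
	service string
	dc      string // empty when the DC-agnostic form is used
}

func parse(host string) (endpoint, error) {
	labels := strings.Split(host, ".")
	switch len(labels) {
	case 4: // service.internal.example.com
		if labels[1] == "internal" {
			return endpoint{service: labels[0]}, nil
		}
	case 5: // service.internal.dc.example.com
		if labels[1] == "internal" {
			return endpoint{service: labels[0], dc: labels[2]}, nil
		}
	}
	return endpoint{}, fmt.Errorf("not an internal endpoint: %s", host)
}

func main() {
	for _, h := range []string{
		"scorecard.internal.example.com",     // DC-agnostic form
		"scorecard.internal.dc1.example.com", // pinned to one logical data center
	} {
		ep, err := parse(h)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> service=%s dc=%q\n", h, ep.service, ep.dc)
	}
}
```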

---

One Unified Manifest

  • Single config template with overrides for environment-specific tweaks
  • Load balancer/DNS/security abstracted by platform
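The “one template plus overrides” idea reduces to a few lines: every service starts from the same base settings, and each environment supplies only its deltas. A toy sketch, with made-up keys and values:

```go
package main

import "fmt"

// merge applies environment-specific overrides on top of shared base settings.
func merge(base, override map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

func main() {
	base := map[string]string{
		"replicas":  "4",
		"cpu_limit": "2",
		"image":     "example.com/scorecard:stable", // placeholder image
	}
	prodOverride := map[string]string{
		"replicas": "64", // only the environment-specific tweak differs
	}
	fmt.Println(merge(base, prodOverride))
}
```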

---

NodePort Eliminated

  • ClusterIP services with AWS ALB Ingress Controller
  • Removed port range limits
  • Simplified traffic flow and network scaling
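A sketch of the post-NodePort pattern, written with the Kubernetes Go API types: the Service stays ClusterIP, and the ALB ingress (via the AWS controller's `target-type: ip` annotation) sends load-balancer traffic straight to pod IPs. Names, ports, and labels are placeholders, and the ingress-class wiring is omitted.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// A plain ClusterIP service: no node port is allocated, so the
	// 30000-32767 NodePort range is no longer a scaling ceiling.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "scorecard"}, // placeholder name
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeClusterIP,
			Selector: map[string]string{"app": "scorecard"},
			Ports:    []corev1.ServicePort{{Port: 80, TargetPort: intstr.FromInt(8080)}},
		},
	}

	// The ALB ingress then targets pod IPs directly ("target-type: ip"),
	// bypassing node ports entirely.
	pathType := networkingv1.PathTypePrefix
	ing := networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{
			Name: "scorecard",
			Annotations: map[string]string{
				"alb.ingress.kubernetes.io/target-type": "ip",
			},
		},
		Spec: networkingv1.IngressSpec{
			Rules: []networkingv1.IngressRule{{
				IngressRuleValue: networkingv1.IngressRuleValue{
					HTTP: &networkingv1.HTTPIngressRuleValue{
						Paths: []networkingv1.HTTPIngressPath{{
							Path:     "/",
							PathType: &pathType,
							Backend: networkingv1.IngressBackend{
								Service: &networkingv1.IngressServiceBackend{
									Name: "scorecard",
									Port: networkingv1.ServiceBackendPort{Number: 80},
								},
							},
						}},
					},
				},
			}},
		},
	}
	fmt.Println(svc.Name, ing.Name)
}
```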

---

Final Transformation Impact

Production environment:

  • Six balanced EKS clusters
  • Only one P0-critical service per cluster
  • Central Envoy gateway replaced hundreds of ALBs
  • >200 microservices migrated into Datacenter Abstraction setup
  • During the 2023 Cricket World Cup:
      • 61M concurrent users
      • No major incidents or disruptions

---

References

---

Takeaway

Hotstar’s success reveals three timeless lessons:

  • Architecture > Raw hardware scaling
  • Unified orchestration simplifies ops
  • Stepwise scaling & abstraction prevent bottlenecks

---
