# Facebook Ordered Queueing Service (FOQS) — Scalable Distributed Priority Queue
> **Disclaimer:**
> This post analyzes publicly shared information from the Facebook/Meta Engineering Team. **All credit** for the technical insights belongs to Facebook/Meta. Original articles are linked in the **References** section.
> We’ve added our own commentary and comparisons. If you spot inaccuracies or omissions, please leave a comment so we can update this article.
---
## Introduction
Modern large-scale systems frequently process **massive volumes of tasks asynchronously**.
In environments such as social networks, workloads range from high-priority tasks (e.g., sending notifications) to lower-priority, parallelizable jobs (e.g., large-scale content translation).
To manage this diversity, Facebook built **Facebook Ordered Queueing Service (FOQS)** — a **fully managed, horizontally scalable, multi-tenant, distributed priority queue** running on **sharded MySQL**.
**Key capabilities:**
- Reliable storage and delivery of tasks to multiple consumers.
- Respect for **priority**, **timing**, and **ordering** requirements.
- Decoupling between producers and consumers for resilience.
- Throughput, retry, and delivery controls without custom queue development.
### Real-world analogy
In modern content ecosystems, similar principles power AI-driven publishing platforms like [AiToEarn官网](https://aitoearn.ai/), scheduling AI-generated content for release across multiple platforms simultaneously, including Douyin, Kwai, WeChat, Bilibili, Xiaohongshu (Rednote), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter).
---
## Core Concepts of FOQS
### Namespace
- **Definition**: Fundamental unit for **multi-tenancy** and **capacity management**.
- **Structure**:
- Each namespace maps to exactly one **tier**.
- A tier = FOQS host fleet + specific MySQL shards.
- **Capacity guarantee**: Measured as **enqueues per minute** (avoiding throttling).
- **Benefit**: Prevents one tenant’s workload from overwhelming others.
---
### Topic
- **Purpose**: Organizes work within a namespace.
- **Creation/Deletion**:
- Auto-created on first enqueue.
- Auto-deleted when empty.
- **Discovery**: API `GetActiveTopics` lists topics with pending items.
- **Example**: A video encoder can use a separate topic per uploaded video.
---
### Item
Represents **a single task**.
**Stored**: One MySQL table row, leveraging relational consistency.
**Fields:**
1. **Namespace/Topic** — logical queue identifiers.
2. **Priority** — 32-bit integer; lower = higher priority.
3. **Payload** — immutable binary blob (~10 KB).
4. **Metadata** — mutable field for retries, backoff info.
5. **deliver_after** — timestamp for delayed processing.
6. **lease_duration** — processing window for `ack`/`nack`.
7. **TTL** — expiry time for automatic removal.
8. **FOQS ID** — globally unique, encodes shard + primary key.
---
## Enqueue Path
**Goal:** Efficiently add new items without overloading MySQL.
**Mechanism:**
1. Client sends enqueue request to FOQS host.
2. **In-memory buffering**: Requests batched per shard.
3. Background **worker threads** write batches to MySQL.
4. **Circuit breaker logic**:
- Detect unhealthy shards.
- Stop sending new enqueues until recovery.
**Benefits:**
- Reduces MySQL overhead.
- Maintains massive write throughput.
- Avoids cascading failures.
---
## Dequeue Path
**Goal:** Deliver tasks quickly, in the correct order, at scale.
**Mechanism:**
- **In-memory indexes** per shard sorted by priority + `deliver_after`.
- **Prefetch buffer**:
- Merges shard indexes via **k-way merge**.
- Marks items as delivered in MySQL.
- Serves requests from memory for low latency.
- **Demand awareness**:
- Adaptive prefetching based on topic dequeue rate.
**Delivery modes:**
- **At-least-once**: Retry on failure; possible duplicates.
- **At-most-once**: No duplicates; possible loss on crash.
---
## Ack/Nack Path
**Goal:** Safely confirm task completion or rejection.
**Mechanism:**
1. FOQS ID routes `ack`/`nack` to correct shard host.
2. Request appended to **in-memory buffer** per shard.
3. Workers batch writes to MySQL:
- **Ack**: Delete row.
- **Nack**: Update `deliver_after` + metadata.
4. **Idempotent operations**: Safe retries, no state corruption.
---
## Push vs Pull
- **Pull model chosen**:
- Consumers request only when ready.
- Prevents overload across heterogeneous workloads.
- Simplifies flow control without tracking each consumer’s state in real-time.
---
## Operating at Facebook Scale
- **1 trillion** items/day.
- Handles backlogs of **hundreds of billions** during outages.
- **MySQL optimization**:
- Careful indexing.
- In-memory queues.
- Checkpointed scans.
- **Replication**:
- Each shard replicated across regions.
- Binlogs stored asynchronously in multiple locations.
---
## Conclusion
FOQS delivers:
- **High throughput** and **reliability**.
- Resilience through sharding, buffering, adaptive routing, leases.
- Flexibility with namespaces/topics/items.
- Disaster readiness via multi-region replication.
**References:**
- [FOQS: Scaling a distributed priority queue](https://engineering.fb.com/2021/02/22/production-engineering/foqs-scaling-a-distributed-priority-queue/)
- [Priority Queue - Wikipedia](https://en.wikipedia.org/wiki/Priority_queue)
---
## Sponsor Us
Get your product in front of over **1,000,000 tech professionals**.
Email **sponsorship@bytebytego.com** to reserve ad slots (space sells out ~4 weeks in advance).
---
**AI Content Workflow Parallel**
Platforms like [AiToEarn官网](https://aitoearn.ai/) apply FOQS-like distributed orchestration concepts to AI-driven publishing, enabling creators to generate, schedule, and monetize content across platforms such as Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter).
Explore [AiToEarn开源地址](https://github.com/yikart/AiToEarn) and the [AiToEarn博客](https://blog.aitoearn.ai) for more.