# Tencent News PUSH Platform Architecture Optimization
## Table of Contents
1. **Introduction to the Tencent News PUSH Platform**
2. **Problems in the Old Architecture**
3. **Architecture Optimization Plan**
4. **Architecture Upgrade Results**
---
Cutting the codebase from **680,000 lines to 86,000**, rewriting most C++ modules in Golang, and consolidating an extremely fragmented set of microservices: these are the technical wins achieved by our News PUSH architecture team.
PUSH is one of **Tencent News’ premium content distribution channels** and is critical for user engagement in the news app. As the PUSH architecture team, we safeguard core news operations while **continuously upgrading** the PUSH framework, with three goals:
- **Higher system stability and quality**
- **Better development efficiency**
- **Lower operating costs**
This article outlines the **strategies** and **results** of our recent News PUSH architecture optimizations.
---
## 01 Introduction to the Tencent News PUSH Platform
PUSH ensures curated news reaches users **promptly**, satisfying their demand for **quality, timely information**.

The News PUSH process has two key parts: triggering a push and delivering it.
### 1.1 PUSH Trigger Types
- **Manual PUSH**: Operators select articles and audiences via Push CMS, then trigger delivery for hot events or breaking news.
- **Automated PUSH**: The backend periodically computes content that matches each user's interests and triggers delivery automatically.
- **Functional PUSH**: Sent by business systems for operational notifications (e.g., comment alerts, follow notifications).
### 1.2 PUSH Delivery Process
- Handles scheduling (**avoidance**, **scatter distribution**, **frequency control**).
- Delivers via **proprietary channels** or **vendor channels**.
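
Of the three scheduling concerns above, frequency control is the easiest to illustrate. The following is a minimal sketch of a per-user sliding-window cap, not the platform's actual implementation. `FreqLimiter`, `Allow`, and the cap of 2 pushes per hour are all hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// FreqLimiter caps how many pushes one user may receive inside a sliding window.
// The cap and window are illustrative; the article does not state real values.
type FreqLimiter struct {
	maxPushes  int
	windowSecs int64
	sent       map[string][]int64 // userID -> Unix send times still inside the window
}

func NewFreqLimiter(maxPushes int, windowSecs int64) *FreqLimiter {
	return &FreqLimiter{maxPushes: maxPushes, windowSecs: windowSecs, sent: make(map[string][]int64)}
}

// Allow reports whether userID may be pushed at time now (Unix seconds)
// and records the send when it is allowed.
func (l *FreqLimiter) Allow(userID string, now int64) bool {
	cutoff := now - l.windowSecs
	kept := l.sent[userID][:0] // reuse the backing array while dropping expired sends
	for _, t := range l.sent[userID] {
		if t > cutoff {
			kept = append(kept, t)
		}
	}
	allowed := len(kept) < l.maxPushes
	if allowed {
		kept = append(kept, now)
	}
	l.sent[userID] = kept
	return allowed
}

func main() {
	l := NewFreqLimiter(2, 3600) // hypothetical cap: 2 pushes per user per hour
	for _, now := range []int64{0, 1500, 3000, 4500} {
		fmt.Println(now, l.Allow("u1", now))
	}
}
```

A check like this adds work to the delivery path, which is exactly the timeliness-versus-experience trade-off discussed under Key Requirements.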
---
### Key Requirements
1. **Timely Delivery of Premium News Content**
   - The fastest delivery wins: delays reduce click-through rate (CTR) and perceived value.
   - Cutting latency by 50% can raise CTR by 10%.
   - Goal: consistently be *first to push* across the network.
2. **High-Quality User Experience & Engagement**
   - Prevent user opt-out by managing timing, content relevance, frequency, and variety.
#### The Balancing Act
- **Timeliness** → Minimal processing.
- **Experience** → More checks and computations (adding latency).
- **Cost control** → Deliver faster and better with fewer resources.
---
### 1.3 PUSH Delivery Speed Challenges
- **Situation**: Our team took over the platform in mid-2022, in the middle of an **S-tier international breaking news event**.
- **Action**: We added servers in the hope of absorbing the peak.
- **Result**: Pushes still arrived up to **1 hour** late, after interest in the event had already peaked.
- **Evaluation**: *P90 latency for engaged users reached 20 minutes*, and the issue was escalated to senior management.
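
For readers less familiar with the metric, P90 latency is the value that 90% of deliveries fall at or below. The sketch below computes it with the nearest-rank method, one common definition among several. The sample latencies are made up for illustration and are not measurements from the platform.

```go
package main

import (
	"fmt"
	"sort"
)

// Percentile returns the p-th percentile (0 < p <= 100) of the samples
// using the nearest-rank method. It returns 0 for an empty input.
func Percentile(samples []int, p int) int {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int(nil), samples...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) in integer arithmetic
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Made-up delivery latencies in minutes for ten engaged users.
	latencies := []int{1, 2, 2, 3, 3, 4, 5, 8, 20, 45}
	fmt.Println("P50:", Percentile(latencies, 50), "min")
	fmt.Println("P90:", Percentile(latencies, 90), "min")
}
```

In this sample the median user waits 3 minutes while the P90 user waits 20, which is why a tail metric rather than an average is the one that gets escalated.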
---
### 1.4 Development & Debugging Inefficiencies
- **Pipeline length**: 30+ internal modules, plus dependencies on teams in other business units.
- **Process bottlenecks**: A single requirement often touches multiple modules, and integration work is slow to schedule.
- **Troubleshooting pain**: A single case can require tracing across 20 modules and take days to resolve.
**Challenge**: Improve timeliness, UX, and reactivation efficiency while lowering costs.
---
## 02 Problems in the Old Architecture
### 2.1 Excessive Module Chain
Example: the `scheduler` was split into multiple microservices (`filter`, `policy`, `channel`, `worker`).
Issues:
- High network RPC overhead.
- Low cohesion / high coupling.
- Slow iterations and testing.
- Complicated troubleshooting.
---
### 2.2 Dependent Service Bottlenecks
The main bottleneck turned out to be **phone number package retrieval**. Doubling the number of delivery-service machines did not help, because the bottleneck was upstream.
---
### 2.3 Poor Link Stability
#### 2.3.1 Weak Fault Tolerance
- Upstream protocol errors could disable service for hours (P0 incident).
#### 2.3.2 No Automatic Failover
- Rigid sharding rules → No rerouting for overloaded nodes.
- Leads to message loss/delay.
---
### 2.4 No Priority Differentiation
- Automated PUSH competed equally with urgent breaking news.
- Hot events slowed by low-priority content.
---
### 2.5 Non-unified Tech Stack
- Mix of **C++** and **Golang** hindered code reuse and maintainability.
---
### 2.6 Low Testing Efficiency
- Reliance on slow, manual smoke tests.
- Regression coverage gaps.
- Long small-traffic experiments (e.g., 2-month tests).
---
## 03 Architecture Optimization Plan
### 3.1 Full-Link Business Closed Loop
Built a **self-owned message channel** → direct vendor integration.
**Key impacts:**
- Reduced modules from **15 to 6**, code from **680k to 86k lines**.
- Unified client-server interaction → Registration success rate up from **90%** to **99.9%**.
- Adopted Tencent News' internal tech stack.
---
### 3.2 Unified Tech Stack
Rewrote all C++ link modules in **Golang** (except recommendation service).
---
### 3.3 Link Integration
Followed principle: *One requirement → One module change*.
**Merges:**
- Trigger side: 5 → 1
- Scheduler side: 5 → 1
- Channel side: 15 → 5
**Impact:** From 18 modules / 17 RPC hops → 3 modules / 2 RPC hops.
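
One way to picture the merge is that steps which used to be separate services become ordinary function calls inside one process. The sketch below is an illustration under that assumption, not the team's code. `Push`, `Stage`, `Scheduler`, and the two example stages are hypothetical names that loosely mirror the old `filter` and `channel` services.

```go
package main

import "fmt"

// Push is a hypothetical message moving through the scheduler.
type Push struct {
	UserID  string
	Brand   string
	Channel string
}

// Stage is one scheduling step. It returns false to drop the push.
type Stage func(p *Push) bool

// Scheduler chains stages with plain function calls instead of RPC hops.
type Scheduler struct {
	stages []Stage
}

// Run applies every stage in order and stops at the first one that drops the push.
func (s Scheduler) Run(p *Push) bool {
	for _, stage := range s.stages {
		if !stage(p) {
			return false
		}
	}
	return true
}

// filterStage drops pushes without a user ID (stand-in for the old `filter` service).
func filterStage(p *Push) bool { return p.UserID != "" }

// channelStage picks a delivery channel (stand-in for the old `channel` service).
func channelStage(p *Push) bool {
	if p.Brand == "" {
		p.Channel = "proprietary"
	} else {
		p.Channel = "vendor:" + p.Brand
	}
	return true
}

func main() {
	s := Scheduler{stages: []Stage{filterStage, channelStage}}
	p := &Push{UserID: "u1", Brand: "acme"}
	fmt.Println(s.Run(p), p.Channel)
}
```

With this shape, a new scheduling rule is one more `Stage` in one module, which matches the *one requirement, one module change* principle.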
---
### 3.4 Custom Number Package Service
- Interest-based segments are computed offline and stored in COS (Tencent Cloud Object Storage).
- An API returns a specific page of a number package.
- Consistency checks run across the cluster.
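
The paged retrieval can be sketched as follows. This assumes a number package is an ordered list of user IDs and holds it in memory for simplicity, whereas the real service reads from COS. `Page` and `PageCount` are hypothetical helper names.

```go
package main

import "fmt"

// Page returns page number `page` (0-based) of a number package, with at most
// `size` user IDs per page. Out-of-range pages yield nil.
func Page(userIDs []string, page, size int) []string {
	if page < 0 || size <= 0 {
		return nil
	}
	start := page * size
	if start >= len(userIDs) {
		return nil
	}
	end := start + size
	if end > len(userIDs) {
		end = len(userIDs)
	}
	return userIDs[start:end]
}

// PageCount reports how many pages a package of n IDs occupies.
func PageCount(n, size int) int {
	if n <= 0 || size <= 0 {
		return 0
	}
	return (n + size - 1) / size
}

func main() {
	pkg := []string{"u1", "u2", "u3", "u4", "u5"} // in-memory stand-in for a stored package
	for p := 0; p < PageCount(len(pkg), 2); p++ {
		fmt.Println("page", p, Page(pkg, p, 2))
	}
}
```

Fixed-size pages let many delivery workers pull disjoint slices of the same package in parallel without coordinating with each other.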

---
### 3.5 Moving Filtering Upstream
Split number packages offline by **brand** and **OS** → Removed online filtering delays.
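
As a rough illustration of the offline split, the sketch below groups user IDs into sub-packages keyed by brand and OS, so the online path can fetch the right sub-package directly instead of filtering user by user. The `User` type and the `brand/os` key format are assumptions made for the example.

```go
package main

import "fmt"

// User is a hypothetical record in the offline job's input.
type User struct {
	ID    string
	Brand string
	OS    string
}

// SplitByBrandOS groups user IDs into sub-packages keyed by "brand/os".
// Within each sub-package the input order is preserved.
func SplitByBrandOS(users []User) map[string][]string {
	out := make(map[string][]string)
	for _, u := range users {
		key := u.Brand + "/" + u.OS
		out[key] = append(out[key], u.ID)
	}
	return out
}

func main() {
	users := []User{
		{ID: "u1", Brand: "brandA", OS: "android"},
		{ID: "u2", Brand: "brandB", OS: "android"},
		{ID: "u3", Brand: "brandA", OS: "android"},
		{ID: "u4", Brand: "apple", OS: "ios"},
	}
	for key, ids := range SplitByBrandOS(users) {
		fmt.Println(key, ids)
	}
}
```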
---
### 3.6 Batch I/O Processing
- Created async queues per I/O type.
- Workers consume batches → results are returned to callers in request order.
- Business logic unchanged; throughput improved.
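
The pattern above can be sketched with a channel as the async queue and a single worker goroutine. This is a simplified illustration under stated assumptions, not the production design. `Batcher`, `Get`, `GetMany`, the batch size, and the wait time are hypothetical, and the `bulk` function stands in for whatever batched I/O call a given dependency offers.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// request is one caller's key plus the channel its result comes back on.
type request struct {
	key   string
	reply chan string
}

// Batcher turns many single-key calls into a few bulk I/O calls.
type Batcher struct {
	queue     chan request
	batchSize int
	maxWait   time.Duration
	bulk      func(keys []string) []string // must return one result per key, in key order
}

func NewBatcher(batchSize int, maxWait time.Duration, bulk func([]string) []string) *Batcher {
	b := &Batcher{queue: make(chan request, 1024), batchSize: batchSize, maxWait: maxWait, bulk: bulk}
	go b.run()
	return b
}

// Get blocks like an ordinary single-key call, so calling code does not change.
func (b *Batcher) Get(key string) string {
	r := request{key: key, reply: make(chan string, 1)}
	b.queue <- r
	return <-r.reply
}

// run collects requests until the batch is full or maxWait elapses, then flushes.
func (b *Batcher) run() {
	for first := range b.queue {
		batch := []request{first}
		timer := time.NewTimer(b.maxWait)
	collect:
		for len(batch) < b.batchSize {
			select {
			case r := <-b.queue:
				batch = append(batch, r)
			case <-timer.C:
				break collect
			}
		}
		timer.Stop()
		keys := make([]string, len(batch))
		for i, r := range batch {
			keys[i] = r.key
		}
		results := b.bulk(keys)
		for i, r := range batch {
			r.reply <- results[i] // answer each caller in request order
		}
	}
}

// GetMany issues one Get per key concurrently and returns results in key order.
func GetMany(b *Batcher, keys []string) []string {
	out := make([]string, len(keys))
	var wg sync.WaitGroup
	for i, k := range keys {
		wg.Add(1)
		go func(i int, k string) {
			defer wg.Done()
			out[i] = b.Get(k)
		}(i, k)
	}
	wg.Wait()
	return out
}

func main() {
	bulk := func(keys []string) []string {
		out := make([]string, len(keys))
		for i, k := range keys {
			out[i] = strings.ToUpper(k) // stand-in for a real bulk lookup
		}
		return out
	}
	b := NewBatcher(3, 20*time.Millisecond, bulk)
	fmt.Println(GetMany(b, []string{"a", "b", "c", "d", "e"}))
}
```

Because `Get` keeps a blocking single-key signature, the batching stays invisible to business logic. The `maxWait` timer bounds the extra latency a request can pay while waiting for its batch to fill.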
---
### 3.7 Push Prioritization
- **Task-level**: Hot news goes first; lower-priority pushes wait behind it.
- **User-level**: Serve high-activity, high-value users first.
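
Task-level prioritization can be sketched with a priority queue built on Go's standard `container/heap` package. The `Task` and `PushQueue` types and the numeric priority levels are hypothetical. The same structure could order users by activity for the user-level case.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Task is a hypothetical push task. Higher Priority is served first.
type Task struct {
	Name     string
	Priority int
	seq      int // arrival order, keeps equal priorities first-in first-out
}

type taskHeap []Task

func (h taskHeap) Len() int      { return len(h) }
func (h taskHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h taskHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].seq < h[j].seq
}
func (h *taskHeap) Push(x any) { *h = append(*h, x.(Task)) }
func (h *taskHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// PushQueue hands out the most urgent task first.
type PushQueue struct {
	tasks taskHeap
	next  int
}

// Add enqueues a task with the given priority.
func (q *PushQueue) Add(name string, priority int) {
	q.next++
	heap.Push(&q.tasks, Task{Name: name, Priority: priority, seq: q.next})
}

// Next removes and returns the most urgent task, or false if the queue is empty.
func (q *PushQueue) Next() (Task, bool) {
	if q.tasks.Len() == 0 {
		return Task{}, false
	}
	return heap.Pop(&q.tasks).(Task), true
}

func main() {
	const automated, breaking = 1, 9 // illustrative priority levels
	q := &PushQueue{}
	q.Add("daily-digest", automated)
	q.Add("breaking-news", breaking)
	q.Add("interest-recs", automated)
	for t, ok := q.Next(); ok; t, ok = q.Next() {
		fmt.Println(t.Name)
	}
}
```

The breaking-news task is served first even though it was enqueued second, and the two automated tasks keep their arrival order.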
---
### 3.8 Automatic Fault Recovery
- 4 fixed backups per shard node.
- Traffic rerouted if failure rate/latency exceeds threshold.
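
A minimal sketch of threshold-based rerouting follows. It assumes each shard keeps an ordered list with the primary first and its fixed backups after it. The `Node`, `Thresholds`, and `Shard` types, the threshold values, and the fallback to the primary when every node is unhealthy are all choices made for the example, not details from the platform.

```go
package main

import "fmt"

// Node tracks recent health counters for one delivery node.
type Node struct {
	Name         string
	Requests     int
	Failures     int
	AvgLatencyMs int
}

// Thresholds are illustrative; the article does not publish the real values.
type Thresholds struct {
	MaxFailureRate  float64
	MaxAvgLatencyMs int
}

// Healthy reports whether the node is within both thresholds.
func (n Node) Healthy(t Thresholds) bool {
	if n.Requests > 0 && float64(n.Failures)/float64(n.Requests) > t.MaxFailureRate {
		return false
	}
	return n.AvgLatencyMs <= t.MaxAvgLatencyMs
}

// Shard lists the primary node first, followed by its fixed backups.
// It is assumed to contain at least one node.
type Shard struct {
	Nodes []Node
}

// Route returns the first healthy node, or the primary if none is healthy.
func (s Shard) Route(t Thresholds) Node {
	for _, n := range s.Nodes {
		if n.Healthy(t) {
			return n
		}
	}
	return s.Nodes[0]
}

func main() {
	limits := Thresholds{MaxFailureRate: 0.10, MaxAvgLatencyMs: 500}
	shard := Shard{Nodes: []Node{
		{Name: "primary", Requests: 100, Failures: 30, AvgLatencyMs: 120},
		{Name: "backup-1", Requests: 100, Failures: 1, AvgLatencyMs: 900},
		{Name: "backup-2", Requests: 100, Failures: 0, AvgLatencyMs: 80},
	}}
	fmt.Println("route to:", shard.Route(limits).Name)
}
```

Here the primary fails the failure-rate check and the first backup fails the latency check, so traffic lands on the second backup.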
---
### 3.9 Automated Pipeline Testing
- Regression automation for core APIs.
- Diff testing with recorded traffic + dependent data snapshots.
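
The diff-testing idea can be sketched as replaying the same recorded requests against the old and new implementations and reporting every response that differs. `Handler`, `Mismatch`, and `DiffReplay` are hypothetical names. In a real setup both handlers would also read from the recorded data snapshot so that they see identical inputs.

```go
package main

import "fmt"

// Handler maps a request to a response. Real handlers would also read the
// recorded snapshot of dependent data so both versions see identical inputs.
type Handler func(request string) string

// Mismatch records one request whose responses differ between versions.
type Mismatch struct {
	Request string
	Old     string
	New     string
}

// DiffReplay replays recorded requests against both handlers and returns every mismatch.
func DiffReplay(recorded []string, oldH, newH Handler) []Mismatch {
	var out []Mismatch
	for _, req := range recorded {
		o, n := oldH(req), newH(req)
		if o != n {
			out = append(out, Mismatch{Request: req, Old: o, New: n})
		}
	}
	return out
}

func main() {
	oldH := func(r string) string { return "ok:" + r }
	newH := func(r string) string {
		if r == "user=42" {
			return "error:" + r // simulated regression in the new version
		}
		return "ok:" + r
	}
	for _, m := range DiffReplay([]string{"user=1", "user=42"}, oldH, newH) {
		fmt.Printf("%s: old=%q new=%q\n", m.Request, m.Old, m.New)
	}
}
```

An empty result means the rewrite behaves identically on the recorded traffic, which gives much faster feedback than a long small-traffic experiment.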
---
## 04 Architecture Upgrade Results
### 4.1 Cost Reduction
PUSH operating costs cut by **70%**.
### 4.2 Performance Boost
Peak throughput up by **3.5×**.
### 4.3 Latency Reduction
- **Internal P90 latency** for breaking news down **90%**.
- **Full-link latency** reduced **90%** (vendor + internal).
### 4.4 Engagement Gains
- Breaking news click PV up **10%**.
- User complaints dropped to **zero**.
### 4.5 Stability
Zero link failures since Feb 2025.
---
**Conclusion:**
Our architectural overhaul improved speed, stability, efficiency, and user engagement while drastically lowering costs. The lessons apply to any system that handles **large-scale, time-sensitive content delivery**.
---