Effective Error Handling: A Unified Approach for Heterogeneous Distributed Systems

Honghao Wang

28 Oct 2025

Unified Exception Handling in Distributed Systems — Insights from Jenish Shah (Netflix)

Jenish Shah, a back-end engineer specializing in distributed systems at Netflix, shares practical strategies for handling failures in heterogeneous microservice environments. His work led to the development of a shared library that standardizes exception handling across protocols like REST, gRPC, and GraphQL.

---

💡 Key Takeaways

Microservices Are Protocol-Agnostic

Microservices aren’t defined by a single protocol. REST over HTTP is common, but:
gRPC excels at internal service-to-service (East–West) communication with high efficiency.
GraphQL aggregates data from multiple services for external-facing applications.
Plain HTTP (REST) remains the better fit for large file uploads and downloads.

Graceful Degradation

Even during failures, systems should provide partial results rather than nothing.
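
To make the idea concrete, here is a minimal Python sketch of partial results. The two downstream calls (`fetch_profile`, `fetch_recommendations`) and the `degraded` field are hypothetical names invented for illustration, not anything described in the talk.

```python
import asyncio


async def fetch_profile(user_id: str) -> dict:
    # Hypothetical downstream call that succeeds.
    return {"name": "Ada"}


async def fetch_recommendations(user_id: str) -> list:
    # Hypothetical downstream call that fails.
    raise TimeoutError("recommendations service timed out")


async def load_home_page(user_id: str) -> dict:
    # Run both calls concurrently. return_exceptions=True keeps one
    # failure from discarding the results that did succeed.
    profile, recs = await asyncio.gather(
        fetch_profile(user_id),
        fetch_recommendations(user_id),
        return_exceptions=True,
    )
    page = {"degraded": []}
    for key, result, fallback in (
        ("profile", profile, {}),
        ("recommendations", recs, []),
    ):
        if isinstance(result, Exception):
            # Substitute an empty fallback and record what is missing.
            page[key] = fallback
            page["degraded"].append(key)
        else:
            page[key] = result
    return page


page = asyncio.run(load_home_page("u1"))
print(page)
```

The caller still gets the profile, and the `degraded` list tells the client which parts of the page it may want to hide or retry.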

Exception Categories Across Protocols

Common categories regardless of REST/gRPC/GraphQL:

Authorization – caller not allowed to invoke service.
Validation – invalid/insufficient request data.
Application – internal service errors.
Dependency – failures in downstream services.

Observability Is Critical

Track what failed, how it failed, and how the failure cascaded.
Provide actionable metrics for on-call engineers.

---

📜 From Monolithic to Multi-Protocol Microservices

Evolution Beyond REST

REST was the default for both internal and external APIs because of JSON's readability and well-established HTTP standards.
Limitations in HTTP/1.1 revealed the need for more efficient, low-latency protocols.
gRPC introduced strong typing, binary encoding, and contract enforcement, ideal for high-volume internal calls.

Takeaway:

Use REST/GraphQL for external-facing APIs.
Use gRPC, queues, or streaming internally for speed and scale.

---

🛡 Handling Failures Gracefully

Why Context Matters

Poor error messages frustrate users.

Example:

> “Something went wrong.” ❌

Better:

> “Missing field: _First name_.” ✅
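
As a rough illustration of producing the specific message rather than the generic one, the sketch below checks a request payload against a list of required fields. The field names and the `REQUIRED_FIELDS` constant are assumptions made up for this example.

```python
# Hypothetical required fields for a sign-up request.
REQUIRED_FIELDS = ("first_name", "email")


def missing_field_messages(payload: dict) -> list:
    """Return one specific, human-readable message per missing field."""
    messages = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            # "first_name" becomes "First name" for display.
            label = field.replace("_", " ").capitalize()
            messages.append(f"Missing field: {label}.")
    return messages


print(missing_field_messages({"email": "ada@example.com"}))
```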

Best Practices

Return accurate, protocol-specific codes:
REST: `404 Not Found`
gRPC: `NOT_FOUND`
Implement an interceptor to:
Detect exception type.
Map to appropriate protocol response.
Avoid repetitive boilerplate in each service.
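
The interceptor idea can be sketched as a Python decorator. This is only an illustration under stated assumptions: `ResourceNotFound`, `intercept`, and the response dictionaries are hypothetical, and a real service would hook into its web or gRPC framework's own interceptor mechanism instead.

```python
import functools


class ResourceNotFound(Exception):
    """Hypothetical logical exception raised by business code."""


# Protocol-specific codes for the same logical outcomes.
OK_CODES = {"rest": 200, "grpc": "OK"}
NOT_FOUND_CODES = {"rest": 404, "grpc": "NOT_FOUND"}
UNKNOWN_CODES = {"rest": 500, "grpc": "INTERNAL"}


def intercept(protocol: str):
    """Wrap a handler so exceptions become protocol-specific responses."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                body = handler(*args, **kwargs)
                return {"code": OK_CODES[protocol], "body": body}
            except ResourceNotFound as exc:
                # Detect the exception type and map it to the protocol's code.
                return {"code": NOT_FOUND_CODES[protocol], "error": str(exc)}
            except Exception:
                # Unknown failures get a generic code without leaking internals.
                return {"code": UNKNOWN_CODES[protocol], "error": "Internal error"}

        return wrapper

    return decorator


def get_title(title_id: str) -> dict:
    # Business logic knows nothing about REST or gRPC.
    catalog = {"t1": {"name": "Example"}}
    if title_id not in catalog:
        raise ResourceNotFound(f"Title '{title_id}' not found")
    return catalog[title_id]


rest_get_title = intercept("rest")(get_title)
grpc_get_title = intercept("grpc")(get_title)

print(rest_get_title("missing"))
print(grpc_get_title("missing"))
```

The handler is written once, and the try/except mapping lives in a single place rather than in every endpoint.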

---

📦 Jenish’s Netflix Exception Library

Design Pattern Overview:

Four exception classes:
`AuthorizationException`
`ValidationException` (with enum values such as `NotFound` and `OutOfRange`)
`ApplicationException`
`DependencyException`
Protocol-Agnostic: business logic throws logical exceptions without knowing the protocol.
Interceptor auto-maps these to:
HTTP status codes for REST.
gRPC status codes for gRPC.
GraphQL error conventions.
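
The library itself is not public in this recap, so the following is only a guess at its general shape. The four class names come from the talk; the `ServiceException` base class, the `ValidationKind` enum, the specific status choices in `STATUS_TABLE`, and the `extensions.code` field in the GraphQL error are all assumptions for illustration.

```python
from enum import Enum
from http import HTTPStatus


class ValidationKind(Enum):
    INVALID = "INVALID"
    NOT_FOUND = "NOT_FOUND"
    OUT_OF_RANGE = "OUT_OF_RANGE"


class ServiceException(Exception):
    """Hypothetical common base for the four logical categories."""
    category = "APPLICATION"


class AuthorizationException(ServiceException):
    category = "AUTHORIZATION"


class ValidationException(ServiceException):
    category = "VALIDATION"

    def __init__(self, message: str, kind: ValidationKind = ValidationKind.INVALID):
        super().__init__(message)
        self.kind = kind


class ApplicationException(ServiceException):
    category = "APPLICATION"


class DependencyException(ServiceException):
    category = "DEPENDENCY"


# Assumed mapping: logical key -> (HTTP status, gRPC status code name).
STATUS_TABLE = {
    "AUTHORIZATION": (HTTPStatus.FORBIDDEN, "PERMISSION_DENIED"),
    "VALIDATION/INVALID": (HTTPStatus.BAD_REQUEST, "INVALID_ARGUMENT"),
    "VALIDATION/NOT_FOUND": (HTTPStatus.NOT_FOUND, "NOT_FOUND"),
    "VALIDATION/OUT_OF_RANGE": (HTTPStatus.BAD_REQUEST, "OUT_OF_RANGE"),
    "APPLICATION": (HTTPStatus.INTERNAL_SERVER_ERROR, "INTERNAL"),
    "DEPENDENCY": (HTTPStatus.BAD_GATEWAY, "UNAVAILABLE"),
}


def _key(exc: ServiceException) -> str:
    if isinstance(exc, ValidationException):
        return f"VALIDATION/{exc.kind.value}"
    return exc.category


def to_http(exc: ServiceException) -> int:
    return int(STATUS_TABLE[_key(exc)][0])


def to_grpc(exc: ServiceException) -> str:
    return STATUS_TABLE[_key(exc)][1]


def to_graphql(exc: ServiceException) -> dict:
    # GraphQL reports failures in an "errors" list in the response body.
    return {"errors": [{"message": str(exc), "extensions": {"code": _key(exc)}}]}


exc = ValidationException("Title not found", ValidationKind.NOT_FOUND)
print(to_http(exc), to_grpc(exc), to_graphql(exc))
```

Because the mapping lives in one table, changing how a category is reported means editing the shared library once rather than touching each service.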

Impact:

Used by 150+ Netflix services.
Central updates — no need to change every service individually.
Reduces boilerplate, ensures uniform error handling.

---

📊 Observability Integration

Exception-Based Logging

Warnings: misuse of the service by a caller (for example, `ValidationException`).
Errors: critical failures.
Flexible alerting rules:
Example: Page immediately on firewall errors.
Aggregate warnings for triage.
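
One way to picture exception-based logging is to pick the log level and alert action from the exception type. In this sketch, `FirewallException`, the `PAGE_IMMEDIATELY` rule, and the `"page"`/`"aggregate"` action names are hypothetical stand-ins, not the talk's actual configuration.

```python
import logging


class ValidationException(Exception):
    """Caller misused the service (bad or missing input)."""


class FirewallException(Exception):
    """Hypothetical critical failure that should page immediately."""


logger = logging.getLogger("service.exceptions")

# Hypothetical alerting rule: exception types that page the on-call engineer.
PAGE_IMMEDIATELY = (FirewallException,)


def log_exception(exc: Exception) -> tuple:
    """Log at a level chosen by exception type and return (level, action)."""
    level = logging.WARNING if isinstance(exc, ValidationException) else logging.ERROR
    action = "page" if isinstance(exc, PAGE_IMMEDIATELY) else "aggregate"
    logger.log(level, "%s: %s (alert action: %s)", type(exc).__name__, exc, action)
    return level, action


log_exception(ValidationException("Missing field: First name."))
log_exception(FirewallException("outbound call blocked"))
```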

Metrics & Dashboards

Track:
Exception frequency.
Caller patterns.
Visual charts highlight misbehaving clients without log-diving.
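
A dashboard like this could be fed by a counter keyed on exception category and caller. The sketch below uses an in-memory `Counter` purely for illustration; a production service would presumably emit to its metrics system, and the caller names are made up.

```python
from collections import Counter

# In-memory stand-in for a metrics backend: (category, caller) -> count.
exception_counts = Counter()


def record_exception(category: str, caller: str) -> None:
    exception_counts[(category, caller)] += 1


def top_offenders(category: str, n: int = 3) -> list:
    """Return the callers that triggered the most exceptions in a category."""
    per_caller = Counter()
    for (cat, caller), count in exception_counts.items():
        if cat == category:
            per_caller[caller] += count
    return per_caller.most_common(n)


# Hypothetical traffic: one client keeps sending invalid requests.
record_exception("VALIDATION", "mobile-app")
record_exception("VALIDATION", "mobile-app")
record_exception("VALIDATION", "web")
record_exception("DEPENDENCY", "web")

print(top_offenders("VALIDATION"))
```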

---

📈 Choosing the Right Protocol

Protocol Selection Guide

External Aggregations — use GraphQL.
File Upload/Download — use REST.
Internal, High-Frequency Calls — use gRPC.

Operational Considerations:

Internal callers can be trusted to retry based on the error context they receive.
External APIs require stricter validation and less leniency.

---

🔄 Parallel in the Creator Economy: AiToEarn

Platforms like AiToEarn mirror this centralization approach for AI content:

Generate once, publish anywhere — Douyin, Kwai, WeChat, YouTube, Instagram, X/Twitter, etc.
Unified interface for distribution.
Protocol/platform-specific adaptation done automatically.
Integrated analytics and AI model rankings.

For engineers and creators alike:

Centralized logic/pipelines reduce repetitive work and ensure consistent quality across heterogeneous environments.

---

✅ Summary Checklist — Unified Error Handling

For Back-End Microservices

Categorize failures into Authorization, Validation, Application, Dependency.
Implement interceptor pattern.
Map exceptions to protocol-specific responses.
Maintain a central shared library for reuse.
Integrate with observability tools for logs, metrics, dashboards.

For Client-Facing Interfaces

Provide clear, actionable error messages.
Avoid generic “Something went wrong.”
Ensure graceful degradation in partial failures.

---

Final Thought:

Whether building distributed systems or cross-platform content pipelines, centralization of repeated logic — be it exception handling or publishing — unlocks scalability, resilience, and consistency. The design pattern Jenish Shah applied at Netflix and the multi-platform orchestration AiToEarn provides both exemplify the “build once, adapt everywhere” philosophy.