# How to Test Distributed Systems in a Single Environment Using Proxy Routing

## Overcoming Testing Challenges Without a QA Environment

Without a dedicated QA environment, Po Linn Chia’s team faced both **technical** and **coordination** hurdles while testing a distributed system.  
An outdated, slow-to-start CLI contributed to inefficiencies, prompting a **shift-left** approach with **automated testing**.

By leveraging **versioned deployments** via **continuous integration (CI)** and **proxy routing**, developers could run **isolated tests across multiple service versions**—catching bugs earlier and improving productivity.

---

## Key Context

Chia presented this approach at [Dev Summit Boston](https://devsummit.infoq.com/conference/boston2025), highlighting that:
- **Lack of a QA environment** creates both **technical** and **social** challenges.
- **Social dynamics** in software development can be even harder to solve than technical issues.

---

## Initial Environment Setup

### Current Infrastructure
- Large set of **microservices** running in a single **Amazon ECS** development cluster.
- Frequent **resource contention**:
  - Multiple people needing to modify/test the same microservice.
  - Changes in one service inadvertently affecting another.

### Problems with Previous CLI
- Homegrown CLI required **15–30 minutes** to initialize before running a test.
- **Time-outs and failed builds** were common.
- **Maintainability issues** arose after the original developer left.

---

## The Shift-Left Solution

### What Changed
1. **Automated CI-Driven Deployments**:
   - Multiple service versions deployed in a single environment.
   - Developers can test without blocking others.
2. **Faster, more reliable testing cycles**.
3. **Improved team coordination** through transparent version management.
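As a minimal sketch of how CI might name each deployed version, the helper below turns a branch name into a label such as `feature-2981`. Deriving versions from branch names is an assumption made for illustration; the talk summary does not say how versions are named, and `version_label` and `max_length` are hypothetical.

```python
import re


def version_label(branch: str, max_length: int = 40) -> str:
    """Turn a branch name into a lowercase, hyphen-separated version label.

    Hypothetical helper: the naming scheme is an assumption, not the
    team's documented convention.
    """
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # Truncate, then drop any hyphen left dangling at the cut.
    label = label[:max_length].rstrip("-")
    if not label:
        raise ValueError(f"branch {branch!r} yields an empty version label")
    return label
```

A CI job could call `version_label("feature/2981")` and use the result as both the deployment name and the value sent in the routing header.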

---

## Broader Architectural Insight
One well-designed environment can host **many scenarios** using:
- **Intelligent versioning**
- **Dynamic routing through proxies**
  

---

## Internal Deployment Tool

### Developer Capabilities
- Select which **versions** to deploy or shut down.
- On-demand provisioning without CI.
- **Ephemeral containers** for testing major updates (e.g. React framework upgrades).
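The deploy and shut-down capabilities can be pictured with a small in-memory model. This is only an illustrative sketch: `VersionRegistry` is a hypothetical class, it does not call ECS, and the rule that the default `main` version always stays running is an assumption.

```python
class VersionRegistry:
    """Track which versions of each service are currently running.

    Hypothetical model for illustration; a real tool would drive ECS.
    """

    def __init__(self, default: str = "main") -> None:
        self.default = default
        self._running: dict[str, set[str]] = {}

    def deploy(self, service: str, version: str) -> None:
        # Every service starts with the default version already present.
        self._running.setdefault(service, {self.default}).add(version)

    def shut_down(self, service: str, version: str) -> None:
        # Assumption: the default version is never shut down by developers.
        if version == self.default:
            raise ValueError("the default version cannot be shut down")
        self._running.get(service, set()).discard(version)

    def running(self, service: str) -> list[str]:
        return sorted(self._running.get(service, {self.default}))
```

Deploying `feature-2981` for `my-service` leaves two versions running side by side, and shutting it down returns the service to `main` only.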

#### Under the Hood: Dynamic Routing with Traefik
- ECS spins up the desired version.
- Traefik proxy rules inspect the `Baggage` header on each request:
  - **Header example**: `Baggage: dynamic_route=VERSION`
  - **Default route**: the `main` version, used when the header is absent.
  
**Routing example** for requests to `http://my-service.classpass.com`:

- No `Baggage` header → routes to `main`.
- `Baggage: dynamic_route=feature-2981` → routes to `feature-2981`.
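The routing decision above can be modeled in a few lines of Python. This is a sketch of the logic only, not the team's Traefik configuration: `parse_baggage` and `resolve_version` are hypothetical names, the fallback to `main` for a version that is not deployed is an assumption, and percent-decoding of baggage values is left out.

```python
from typing import Mapping

DEFAULT_VERSION = "main"
ROUTE_KEY = "dynamic_route"


def parse_baggage(header_value: str) -> dict[str, str]:
    """Parse a Baggage header into key/value pairs, ignoring properties."""
    entries: dict[str, str] = {}
    for member in header_value.split(","):
        # Drop optional ";property" suffixes, then split on the first "=".
        pair = member.split(";", 1)[0].strip()
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            entries[key.strip()] = value.strip()
    return entries


def resolve_version(headers: Mapping[str, str], deployed: set[str]) -> str:
    """Return the version a request should be routed to."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    requested = parse_baggage(normalized.get("baggage", "")).get(ROUTE_KEY)
    # Assumption: a missing or undeployed version falls back to main.
    if requested in deployed:
        return requested
    return DEFAULT_VERSION
```

Because the version travels in a request header, a test can target `feature-2981` without any change to the service URL.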

---

## Telemetry & Monitoring

### Data Flow
- Send **APM data**, **custom metrics**, and **logs** to third-party vendors.
- Include `Baggage` headers for **per-version trace tracking**.
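One way to picture per-version telemetry is a logger that stamps each record with the version taken from the request's `Baggage` header. The sketch below uses only Python's standard `logging` module; the `route_version` field, the helper names, and the log format are assumptions for illustration, and a real setup would ship these records to the team's telemetry vendor.

```python
import io
import logging


def make_base_logger(stream) -> logging.Logger:
    """Build a logger whose format includes a per-request version field."""
    logger = logging.getLogger("my-service")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s version=%(route_version)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def request_logger(base: logging.Logger, baggage: str) -> logging.LoggerAdapter:
    """Wrap the base logger so each record carries the routed version."""
    version = "main"  # default route when no dynamic_route entry is present
    for member in baggage.split(","):
        key, _, value = member.split(";", 1)[0].partition("=")
        if key.strip() == "dynamic_route" and value.strip():
            version = value.strip()
    return logging.LoggerAdapter(base, {"route_version": version})


buffer = io.StringIO()
base = make_base_logger(buffer)
request_logger(base, "dynamic_route=feature-2981").info("checkout started")
request_logger(base, "").info("checkout started")
```

Filtering on the `version=` field then separates records produced by `feature-2981` from those produced by `main`.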

### Benefits
> **Chia**:  
> This isolated telemetry allows us to debug specific versions without affecting the main branch. It essentially functions like a “poor man’s canary deployment” until we implement full canary processes.

---

## Repository Strategies

### Shared Repository

**Pros**:
- Ideal for **core business flows** that span multiple services.
- Shared tests prevent duplication.

**Cons**:
- One failure can block **multiple teams**.
- Complex to write tests in one repo while developing in another.

### Individual Repositories

**Pros**:
- Faster iteration for **self-contained applications**.
- Independence from shared failures.

**Practice**:
- Start by writing tests in the application's own repo.
- Move them to the shared repo if they prove useful more broadly.

---

## Key Takeaways

1. **Versioned deployments** in a shared environment can solve testing conflicts.
2. **Dynamic proxy routing** removes the need to alter application code or DNS.
3. **Telemetry** by version isolates bugs and enables better debugging.
4. **Repository structure decisions** should balance test scope vs. maintenance effort.
