# How to Test Distributed Systems in a Single Environment Using Proxy Routing
## Overcoming Testing Challenges Without a QA Environment
Without a dedicated QA environment, Po Linn Chia’s team faced both **technical** and **coordination** hurdles while testing a distributed system.
An aging homegrown CLI that needed 15–30 minutes to initialize before each test made matters worse, prompting a **shift-left** approach built on **automated testing**.
By combining **versioned deployments** driven by **continuous integration (CI)** with **proxy routing**, developers could run **isolated tests across multiple service versions** in one shared environment, catching bugs earlier and improving productivity.
---
## Key Context
Chia presented this approach at [Dev Summit Boston](https://devsummit.infoq.com/conference/boston2025), highlighting that:
- **Lack of a QA environment** creates both **technical** and **social** challenges.
- **Social dynamics** in software development can be even harder to solve than technical issues.
---
## Initial Environment Setup
### Current Infrastructure
- Large set of **microservices** running in a single **Amazon ECS** development cluster.
- Frequent **resource contention**:
  - Multiple people needing to modify/test the same microservice.
  - Changes in one service inadvertently affecting another.
### Problems with Previous CLI
- Homegrown CLI required **15–30 minutes** to initialize before running a test.
- **Time-outs and failed builds** were common.
- **Maintainability issues** arose after the original developer left.
---
## The Shift-Left Solution
### What Changed
1. **Automated CI-driven deployments**:
   - Multiple service versions deployed in a single environment.
   - Developers can test without blocking others.
2. **Faster, more reliable testing cycles**.
3. **Improved team coordination** through transparent version management.
---
## Broader Architectural Insight
One well-designed environment can host **many scenarios** using:
- **Intelligent versioning**
- **Dynamic routing through proxies**
---
## Internal Deployment Tool
### Developer Capabilities
- Select which **versions** to deploy or shut down.
- Provision a version on demand, without going through CI.
- **Ephemeral containers** for testing major updates (e.g. React framework upgrades).
#### Under the Hood: Dynamic Routing with Traefik
- ECS spins up the desired version.
- Proxy rules check `Baggage` headers:
  - **Header example**: `dynamic_route=VERSION`
  - Default route: `main` version.
**Routing Example:** `http://my-service.classpass.com`
- No `Baggage` header → routes to `main`.
- `Baggage: dynamic_route=feature-2981` → routes to `feature-2981`.
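
The routing decision above can be sketched in a few lines of Python. This is not Traefik configuration and not the team's actual code; it only models what the proxy rule decides, and how a test might target one version. The function names (`parse_baggage`, `resolve_version`, `build_test_request`) are hypothetical, and the parser is a simplification that ignores percent-encoding in `Baggage` values.

```python
from urllib.request import Request

DEFAULT_VERSION = "main"      # fallback route named in the article
ROUTE_KEY = "dynamic_route"   # Baggage key named in the article


def parse_baggage(header_value):
    """Parse a Baggage header value into a dict of key/value pairs."""
    entries = {}
    for member in (header_value or "").split(","):
        # Drop optional ";property" metadata, then split key from value.
        key_value = member.split(";", 1)[0]
        key, sep, value = key_value.partition("=")
        if sep and key.strip():
            entries[key.strip()] = value.strip()
    return entries


def resolve_version(headers):
    """Return the version a request should be routed to."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    baggage = parse_baggage(lowered.get("baggage"))
    return baggage.get(ROUTE_KEY) or DEFAULT_VERSION


def build_test_request(url, version=None):
    """Build (but do not send) a request aimed at one version, if given."""
    request = Request(url)
    if version:
        request.add_header("Baggage", f"{ROUTE_KEY}={version}")
    return request


if __name__ == "__main__":
    req = build_test_request("http://my-service.classpass.com", "feature-2981")
    print(resolve_version(dict(req.header_items())))
    print(resolve_version({}))
```

Because the version travels in a header, the test and the application code both keep using the same URL; only the request metadata changes.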
---
## Telemetry & Monitoring
### Data Flow
- Send **APM data**, **custom metrics**, and **logs** to third-party vendors.
- Include `Baggage` headers for **per-version trace tracking**.
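
As a rough illustration of per-version tracking, a service could copy the incoming `Baggage` header onto its downstream calls and stamp the routed version on its log records. The sketch below uses only the Python standard library; the helper names (`route_version`, `outgoing_headers`, `VersionFilter`) and the `route_version` log attribute are assumptions for illustration, not part of any vendor's API or of the team's tooling.

```python
import logging

BAGGAGE_HEADER = "baggage"
ROUTE_KEY = "dynamic_route"


def route_version(headers, default="main"):
    """Extract the routed version from a request's Baggage header."""
    for name, value in headers.items():
        if name.lower() != BAGGAGE_HEADER:
            continue
        for member in value.split(","):
            # Ignore optional ";property" metadata on each member.
            key, _, val = member.split(";", 1)[0].partition("=")
            if key.strip() == ROUTE_KEY and val.strip():
                return val.strip()
    return default


def outgoing_headers(incoming, extra=None):
    """Copy the Baggage header onto the headers of a downstream call."""
    headers = dict(extra or {})
    for name, value in incoming.items():
        if name.lower() == BAGGAGE_HEADER:
            headers["Baggage"] = value
    return headers


class VersionFilter(logging.Filter):
    """Attach the routed version to every log record passing through."""

    def __init__(self, version):
        super().__init__()
        self.version = version

    def filter(self, record):
        record.route_version = self.version
        return True


if __name__ == "__main__":
    incoming = {"Baggage": "dynamic_route=feature-2981"}
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("[%(route_version)s] %(message)s"))
    handler.addFilter(VersionFilter(route_version(incoming)))
    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.warning("calling downstream with %s", outgoing_headers(incoming))
```

Forwarding the header unchanged is what keeps a multi-service request on the same version end to end; tagging logs with it lets them be filtered by version afterwards.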
### Benefits
> **Chia**:
> This isolated telemetry allows us to debug specific versions without affecting the main branch. It essentially functions like a “poor man’s canary deployment” until we implement full canary processes.
---
## Repository Strategies
### Shared Repository
**Pros**:
- Ideal for **core business flows** that span multiple services.
- Shared tests prevent duplication.

**Cons**:
- One failure can block **multiple teams**.
- Complex to write tests in one repo while developing in another.
### Individual Repository
**Pros**:
- Faster iteration for **self-contained applications**.
- Independence from shared failures.

**Practice**:
- Start tests in an application’s own repo.
- Move them to the shared repo if they prove useful more broadly.
---
## Key Takeaways
1. **Versioned deployments** in a shared environment can solve testing conflicts.
2. **Dynamic proxy routing** removes the need to alter application code or DNS.
3. **Telemetry** by version isolates bugs and enables better debugging.
4. **Repository structure decisions** should balance test scope vs. maintenance effort.