Go’s New “Green Tea” Garbage Collector Could Boost Performance by up to 40%
Go 1.25: Introducing the Green Tea Garbage Collector
Go 1.25 debuts a new experimental garbage collector, code-named Green Tea, that cuts GC overhead by up to 40% for GC-intensive workloads compared to the current implementation.
---
How Green Tea Works
Green Tea retains Go’s mark-sweep strategy, but changes the key unit of operation:
- Traditional GC: Operates on individual objects.
- Green Tea GC: Operates on memory pages.
Page-Level Processing
- The global work list tracks pages rather than individual objects.
- Objects waiting to be scanned are recorded locally within each page, rather than globally across the heap.
Benefits:
> By scanning objects stored close together in memory, CPU cache utilization improves, reducing reliance on slower main memory.
> Page-level metadata is more likely to stay cached, shortening work lists and reducing CPU stalls.
According to Go contributors Michael Knyszek and Austin Clements:
> ~90% of GC cost is in marking; only ~10% is in sweeping.
Because marking dominates, making the mark phase more efficient has the largest effect on GC overhead.
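To make the object-versus-page distinction concrete, here is a toy sketch rather than the runtime's actual algorithm. It assumes a made-up heap where an object is a slot index and a page holds `slotsPerPage` slots, and it assumes all objects are known up front (a real collector discovers them while scanning). The function names are hypothetical.

```go
package main

import "fmt"

// slotsPerPage is a made-up page capacity for this toy heap.
const slotsPerPage = 8

// pageTouchesPerObject scans objects one at a time in discovery order and
// counts how often the scan moves to a different page than the previous one.
func pageTouchesPerObject(objs []int) int {
	touches, last := 0, -1
	for _, o := range objs {
		if p := o / slotsPerPage; p != last {
			touches++
			last = p
		}
	}
	return touches
}

// pageVisitsBatched records pending objects per page, queues each page once,
// and counts one visit per queued page.
func pageVisitsBatched(objs []int) int {
	pending := map[int][]int{} // page -> objects waiting on that page
	var queue []int            // pages in first-seen order
	for _, o := range objs {
		p := o / slotsPerPage
		if len(pending[p]) == 0 {
			queue = append(queue, p)
		}
		pending[p] = append(pending[p], o)
	}
	visits := 0
	for _, p := range queue {
		_ = pending[p] // a real collector would scan these objects together
		visits++
	}
	return visits
}

func main() {
	// Discovery order that bounces between pages 0, 1, and 2.
	objs := []int{0, 17, 3, 9, 18, 1, 10, 19}
	fmt.Println("per-object page touches:", pageTouchesPerObject(objs))
	fmt.Println("batched page visits:", pageVisitsBatched(objs))
}
```

In this toy input, scanning in discovery order jumps between pages on every object, while batching by page touches each page once. Fewer jumps between memory regions is the cache-locality effect described above.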
---
Why CPU Architecture Matters
Modern CPUs both pose challenges and create opportunities for GC design.
Challenges
- NUMA (Non-Uniform Memory Access) — cores have faster access to specific memory segments.
- Reduced memory bandwidth per core — more cores increase contention.
- Rising core count — parallel GC work becomes harder to scale.
Opportunities
- Vector instructions and wide registers allow faster parallel operations.
- New bit-vector instructions (AMD Zen 4, Intel Ice Lake) enable key scanning steps in just a few CPU cycles, dramatically speeding up Green Tea’s scan loop.
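As a rough illustration of why bit-level operations help a scan loop, the sketch below walks a 64-bit bitmap using Go's `math/bits` package. It is a scalar stand-in, not the vector instructions mentioned above, and the idea that one bit represents one slot on a page is an assumption made for the example. `markedSlots` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/bits"
)

// markedSlots returns the indices of the set bits in a 64-bit bitmap.
// In this sketch, bit i set means "slot i on this page needs scanning".
func markedSlots(bitmap uint64) []int {
	var slots []int
	for bitmap != 0 {
		i := bits.TrailingZeros64(bitmap) // index of the lowest set bit
		slots = append(slots, i)
		bitmap &= bitmap - 1 // clear the lowest set bit
	}
	return slots
}

func main() {
	// Bits 0, 2, and 7 are set.
	fmt.Println(markedSlots(0b1000_0101))
}
```

The loop does work proportional to the number of set bits rather than the number of slots, which is the property that makes compact per-page bitmaps attractive for scanning.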
---
Performance Results
Gains
- 10–40% reduction in GC overhead, depending on workload.
- If GC uses 10% of total CPU time, Green Tea cuts overall CPU time by 1–4%.
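The arithmetic behind that estimate is a simple product: the share of CPU time spent in GC multiplied by the reduction in GC overhead. A minimal sketch, with `overallSavings` as a hypothetical helper name:

```go
package main

import "fmt"

// overallSavings returns the fraction of total CPU time saved when the GC's
// share of CPU time (gcShare) shrinks by gcReduction. Both are fractions
// between 0 and 1.
func overallSavings(gcShare, gcReduction float64) float64 {
	return gcShare * gcReduction
}

func main() {
	const gcShare = 0.10 // GC uses 10% of total CPU time
	for _, reduction := range []float64{0.10, 0.40} {
		fmt.Printf("GC overhead cut by %.0f%% -> total CPU time cut by %.1f%%\n",
			reduction*100, overallSavings(gcShare, reduction)*100)
	}
}
```

The same function shows why programs that spend little time in GC see little benefit: with a 2% GC share, even a 40% reduction saves under 1% of total CPU time.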
Limitations
Some workloads see no improvement or even small regressions:
> If a workload scans only one object per page in each pass, accumulation overhead outweighs benefits — sometimes making it less efficient than traditional GC.
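
One way to picture this limitation is to count how many objects each page visit actually covers. The sketch below uses a made-up heap model (objects as slot indices, a fixed number of slots per page) and a hypothetical helper, `objectsPerPageVisit`. It is not a measurement of the real collector.

```go
package main

import "fmt"

// objectsPerPageVisit returns the average number of objects scanned per page
// visit, given object slot indices and a page capacity.
func objectsPerPageVisit(objs []int, slotsPerPage int) float64 {
	pages := map[int]bool{} // set of pages that hold at least one object
	for _, o := range objs {
		pages[o/slotsPerPage] = true
	}
	if len(pages) == 0 {
		return 0
	}
	return float64(len(objs)) / float64(len(pages))
}

func main() {
	dense := []int{0, 1, 3, 9, 10, 17, 18, 19} // 8 objects on 3 pages
	sparse := []int{0, 8, 16, 24}              // 1 object on each of 4 pages
	fmt.Printf("dense:  %.2f objects per visit\n", objectsPerPageVisit(dense, 8))
	fmt.Printf("sparse: %.2f objects per visit\n", objectsPerPageVisit(sparse, 8))
}
```

When the ratio is close to 1, as in the sparse case, each page visit pays the per-page bookkeeping cost for a single object, so batching has nothing to amortize.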
---
Real-World Feedback
DoltHub's Perspective
DoltHub opted not to adopt Green Tea in production:
> No measurable gains in latency benchmarks; slight regression in mark time.
Other Observations
- Fewer GC runs, but each consumes more CPU.
- Can lead to significant latency increases in certain memory-heavy applications.
- Latency issue fixed for Go 1.26.
---
Trying Green Tea
Green Tea is described as production-ready, but in Go 1.25 it remains an opt-in experiment that is disabled by default.
To enable it, set this environment variable at build time:
GOEXPERIMENT=greenteagc
---
Trend Toward Runtime Optimization
Green Tea reflects a broader effort to:
- Optimize the runtime for modern hardware.
- Reduce contention and stalls on high-core-count systems.
For developers working with high-performance Go workloads, sharing benchmark data and adoption strategies across communities is key.
---
Summary:
Green Tea GC offers substantial performance potential on modern CPUs, but impact is workload-dependent. Early adopters report both gains and drawbacks, so careful benchmarking is advised before enabling in production builds.