Discovery and Analysis of a Go arm64 Compiler Bug
Every second, 84 million HTTP requests hit Cloudflare’s network of data centers in 330 cities. At this scale, even rare bugs surface frequently, and recently that scale helped us uncover a race condition in code generated by Go’s arm64 compiler.
This write-up covers:
- How we encountered the bug
- How we investigated and narrowed it down
- How we traced it to its unusual root cause
---
1. First Encounter — A Strange Panic
We operate a kernel-configuring service for products such as Magic Transit and Magic WAN. On arm64 hosts, we began seeing occasional runtime panics.
The first report included:
> fatal error: traceback did not unwind completely
> (link to traceback.go)
This pointed to stack corruption. Because the panics were infrequent and had negligible service impact, we initially just monitored them, until they kept recurring.
---
2. Coredump Clues

Our observations:
- Fatal errors correlated with recovered panics in legacy panic/recover error handling code.
- All fatal panics occurred during stack unwinding.
- Recovered panics unwind stacks to run deferred calls.
- A related Go issue (#73259) described an arm64 unwinding crash.
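For readers unfamiliar with the pattern, panic/recover error handling looks roughly like the sketch below. The function names (`parsePositive`, `safeParse`) are hypothetical and not taken from our service; the point is only that the deferred function runs while the runtime unwinds the stack after the panic.

```go
package main

import (
	"errors"
	"fmt"
)

// parsePositive is a hypothetical helper that reports failure by panicking,
// in the style of older panic/recover error handling.
func parsePositive(n int) int {
	if n <= 0 {
		panic(errors.New("value must be positive"))
	}
	return n
}

// safeParse turns the panic back into an ordinary error. The deferred
// function runs while the stack is being unwound after the panic.
func safeParse(n int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			e, ok := r.(error)
			if !ok {
				e = fmt.Errorf("panic: %v", r)
			}
			err = e
		}
	}()
	return parsePositive(n), nil
}

func main() {
	fmt.Println(safeParse(7))  // 7 <nil>
	fmt.Println(safeParse(-1)) // 0 value must be positive
}
```

Every recovered panic of this kind exercises the stack unwinder, which is why a spike in recovered panics could plausibly raise the odds of hitting an unwinding bug.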
Initial mitigation: Removed panic/recover from error handling → Fatal panics stopped.
But: A month later, they returned in higher volume, without matching any recovered panic spike.
We now had two failure modes:
- Crash on invalid memory access
- Crash with explicit fatal error
---
3. Fatal Error Analysis — Go Runtime / GC
Key points from traces:
- Crashes happened in the GC background mark worker.
- Incomplete stack unwinding was reported.
- Platform: ARM64 (`asm_arm64.s`).
Possible causes:
- Unsafe/CGo pointer misuse
- Go runtime bug specific to ARM64
- Memory overwrite from native code
- OS/runtime version incompatibility
Investigation essentials:

    go version                  # check which Go release is installed
    go run -race yourapp.go     # detect data races
    CGO_ENABLED=0 go build      # build without cgo to rule out native code
    GODEBUG=gctrace=1 ./app     # print GC trace output

---
4. Segmentation Fault Case
SIGSEGV log showed:
- Context: `runtime.scanstack` during GC mark phase.
- Invalid address: `addr=0x118`.
- Also inside GC worker, not app code.
Likely sources:
- Pointer corruption (often from unsafe)
- Stack corruption during goroutine scheduling
- Runtime bug
---
5. Pattern Recognition
Both failures hit `(*unwinder).next`:
- Null return address → fatal error (“traceback did not unwind completely”)
- Non-null, invalid return address → bad `m` pointer deref at offset `0x118` (field `incgo`) → segfault
- (See traceback.go:L458)
Conclusion: Unwinding a corrupt goroutine stack.
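To see why a bad `m` pointer produces a fault address like `0x118`, recall that a field load touches the base pointer plus the field's offset. The sketch below uses a hypothetical stand-in struct, `fakeM`, padded so that its flag sits at offset `0x118`; it is not the real layout of the runtime's `m`, and nothing is actually dereferenced.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fakeM is a hypothetical stand-in for the runtime's m struct. Only the
// offset of the flag matters here; the padding is not the real layout.
type fakeM struct {
	_     [0x118]byte
	incgo bool
}

// faultAddr returns the address that a load of m.incgo would touch for a
// given base pointer value. It only does arithmetic; no memory is read.
func faultAddr(base uintptr) uintptr {
	return base + unsafe.Offsetof(fakeM{}.incgo)
}

func main() {
	fmt.Printf("offset of incgo: %#x\n", unsafe.Offsetof(fakeM{}.incgo)) // 0x118
	fmt.Printf("load through a nil m touches: %#x\n", faultAddr(0))     // 0x118
}
```

Under that assumption, a fault at exactly `0x118` is what a load of this field through a nil base pointer would look like.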
---
6. Scheduler Context
Go’s scheduler types:
- `g` — goroutine
- `m` — machine (kernel thread)
- `p` — processor (logical exec context)
A goroutine runs when an `m` holds a `p`.
Our faults implicated reading `m` of a currently running `g` — but with bad `sp`.
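The relationship between goroutines and `p`s can be observed from user code with the standard `runtime` package. This is a small sketch, not part of the original investigation; `snapshot` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// snapshot starts n goroutines that block on a channel, then reports how
// many Ps the scheduler has (GOMAXPROCS) and how many goroutines exist.
func snapshot(n int) (ps, gs int) {
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // while blocked here, this g is not running on any m
		}()
	}
	ps = runtime.GOMAXPROCS(0) // an argument of 0 queries without changing it
	gs = runtime.NumGoroutine()
	close(release)
	wg.Wait()
	return ps, gs
}

func main() {
	ps, gs := snapshot(100)
	fmt.Printf("Ps: %d, goroutines: %d\n", ps, gs)
}
```

There are usually far more `g`s than `p`s, so at any moment only a handful of goroutines are actually running; the rest are parked or runnable with their state saved on their own stacks.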
---
7. Spotting Netlink in the Traces
Sample:

    github.com/vishvananda/netlink/nl.(*NetlinkSocket).Receive

Every segfault trace showed preemption inside this function.
Async Preemption (≥ Go 1.14):
The runtime sends `SIGURG` to the thread running a long-running goroutine. The signal handler can interrupt the goroutine at almost any instruction, including in the middle of a function epilogue, by faking a call to `asyncPreempt`.
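Asynchronous preemption can be observed with a small experiment, sketched here under the assumption of a Go 1.19+ toolchain (for `atomic.Bool`) on a platform where async preemption is enabled. `sleepBesideSpinner` is a hypothetical helper: it restricts the scheduler to one `p`, starts a goroutine that spins without making function calls, and checks that the calling goroutine still gets to run again.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// sleepBesideSpinner limits the scheduler to one P, starts a spinning
// goroutine, and reports how long a 20ms sleep on the caller really took.
func sleepBesideSpinner() time.Duration {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	done := make(chan struct{})
	go func() {
		defer close(done)
		for !stop.Load() {
			// Tight loop: once Load is inlined there are no calls here,
			// so there is no cooperative preemption point.
		}
	}()

	start := time.Now()
	time.Sleep(20 * time.Millisecond) // hands the only P to the spinner
	elapsed := time.Since(start)      // reached only if the spinner was preempted

	stop.Store(true)
	<-done
	return elapsed
}

func main() {
	fmt.Println("caller resumed after", sleepBesideSpinner())
}
```

With async preemption disabled (for example via `GODEBUG=asyncpreemptoff=1`), the same program would be expected to hang, because nothing forces the spinner off the only `p`.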
---
8. Mid-Epilogue Preemption
Disassembly showed:

    ADD $80, RSP, RSP
    ADD $(16<<12), RSP, RSP    ← preempted here
    RET

Preemption between split `ADD` instructions left the stack pointer pointing inside a frame → unwinder misread caller data.
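The consequence can be illustrated with a deliberately simplified toy model. It is not the runtime's actual unwinding algorithm: it assumes the unwinder treats the word at `sp` as a saved return address, and all addresses and values (`callerSP`, `validRA`, `0xdeadbeef`) are made up for illustration.

```go
package main

import "fmt"

const (
	frameSize = 80 + (16 << 12) // 65616 bytes: the two ADD immediates combined
	callerSP  = 0x200000        // hypothetical SP once the epilogue has finished
	calleeSP  = callerSP - frameSize
	validRA   = 0x401000 // hypothetical return address stored at callerSP
)

// classify loosely mimics an unwinder inspecting the word at sp, which it
// assumes is a saved return address. mem maps addresses to 8-byte words.
func classify(mem map[uint64]uint64, sp uint64) string {
	switch ra := mem[sp]; ra {
	case 0:
		return "null return address: fatal error"
	case validRA:
		return "valid return address: keep unwinding"
	default:
		return "garbage return address: likely segfault"
	}
}

func main() {
	mem := map[uint64]uint64{callerSP: validRA}

	// Both ADDs executed: sp points at the caller's frame.
	fmt.Println(classify(mem, calleeSP+frameSize))

	// Preempted after the first ADD only: sp is still inside the callee's frame.
	fmt.Println(classify(mem, calleeSP+80)) // the slot happens to hold zero
	mem[calleeSP+80] = 0xdeadbeef           // the slot holds leftover data
	fmt.Println(classify(mem, calleeSP+80))
}
```

In this model, the same half-finished epilogue yields either of the two failure modes we saw, depending only on what the function left in that part of its frame.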
---
9. Minimal Reproducer
We wrote a reproducer creating a >64 KB stack frame to force split `ADD` ops:

    //go:noinline
    func big_stack(val int) int {
        // Constant-size 64 KB buffer, intended to be allocated in the stack frame.
        buf := make([]byte, 1<<16)
        sum := 0
        // Write and then read every byte so the buffer is not optimized away.
        for i := range buf {
            buf[i] = byte(val)
        }
        for _, b := range buf {
            sum ^= int(b)
        }
        return sum
    }

Running with constant GC cycles eventually crashed exactly as seen in production.
---
10. Why ARM64 Emits Split ADDs
ARM64 `ADD (immediate)` supports only a 12-bit immediate, plus an optional `<< 12` shift. Larger immediates need additional instructions. Go’s compiler emitted:

    ADD $small, RSP, RSP
    ADD $(16<<12), RSP, RSP

Preemption between those → invalid `sp`.
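The split follows directly from the encoding limit. As a rough illustration (the helper name `splitAddImm` is hypothetical, and this is not the compiler's actual code), a frame size can be broken into the two encodable pieces like this:

```go
package main

import "fmt"

// splitAddImm breaks a stack adjustment into the two pieces that an ARM64
// ADD (immediate) can encode: a low 12-bit part and a 12-bit part shifted
// left by 12. ok is false if the offset does not fit in 24 bits.
func splitAddImm(off uint32) (lo, hi uint32, ok bool) {
	if off >= 1<<24 {
		return 0, 0, false
	}
	return off & 0xfff, off >> 12, true
}

func main() {
	// 65616 = 80 + (16 << 12), the frame size from the disassembly above.
	lo, hi, _ := splitAddImm(65616)
	fmt.Printf("ADD $%d, RSP, RSP\n", lo)        // ADD $80, RSP, RSP
	fmt.Printf("ADD $(%d<<12), RSP, RSP\n", hi) // ADD $(16<<12), RSP, RSP
}
```

Any frame larger than 4095 bytes whose size is not a multiple of 4096 needs both pieces, which is why only functions with unusually large frames were exposed.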
---
11. The Fix (Go 1.23.12+, 1.24.6+, 1.25.0)
The compiler now builds the offset in a temporary register, then applies it to `RSP` with a single `ADD`.
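On ARM64, a constant wider than 16 bits is typically assembled in a register with a move followed by `MOVK`, which overwrites one 16-bit field and keeps the rest. The sketch below models that with a hypothetical `movk` helper, using the values from the fixed sequence in this section. It only checks the arithmetic; it does not model the processor.

```go
package main

import "fmt"

// movk models ARM64 MOVK: it overwrites one 16-bit field of reg at the
// given shift and leaves all other bits untouched.
func movk(reg uint64, imm16 uint16, shift uint) uint64 {
	mask := uint64(0xffff) << shift
	return (reg &^ mask) | uint64(imm16)<<shift
}

func main() {
	r27 := uint64(32)      // MOVD $32, R27
	r27 = movk(r27, 1, 16) // MOVK $(1<<16), R27
	fmt.Println(r27)       // 65568

	// Building the constant only writes the scratch register, so the
	// stack pointer changes exactly once, in the final ADD.
	sp := uint64(0x100000) // hypothetical SP value before the epilogue
	sp += r27              // ADD R27, RSP, RSP
	fmt.Printf("%#x\n", sp)
}
```

Because the scratch register is the only thing modified before the final `ADD`, a signal arriving at any instruction boundary sees either the old stack pointer or the fully adjusted one.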
No more mid-adjustment preemption:

    MOVD $32, R27
    MOVK $(1<<16), R27
    ADD R27, RSP, RSP

---
12. Takeaways
- Async preemption + split stack teardown = rare race window.
- Scale accelerates discovery of “rare” runtime/compiler bugs.
- Root-cause required assembly-level reading + reproducible test.
---
Note: This investigation required deep runtime knowledge and patience — exactly the kind of systems puzzle that makes compiler/runtime engineering exciting. And yes — we’re hiring engineers who enjoy chasing bugs like this.