Discovery and Analysis of a Go arm64 Compiler Bug

Hunting Down a Rare Go arm64 Compiler Bug

Every second, 84 million HTTP requests hit Cloudflare’s network of data centers in 330 cities. At this scale, even rare bugs surface frequently, and that scale recently helped us uncover a bug in Go’s arm64 compiler that causes a race condition in the generated code.

This write-up covers:

  • How we encountered the bug
  • How we investigated and narrowed it down
  • How we traced it to its unusual root cause

---

1. First Encounter — A Strange Panic

We operate a kernel-configuring service for products such as Magic Transit and Magic WAN. On arm64 hosts, we began seeing occasional runtime panics.

The first report included:

> fatal error: traceback did not unwind completely

The error is thrown from the Go runtime's traceback.go.

This suggested stack corruption. Because the panics were rare and had negligible impact on the service, we initially just monitored them, until they kept coming back.

---

2. Coredump Clues


Our observations:

  • Fatal errors correlated with recovered panics in legacy panic/recover error handling code.
  • All fatal panics occurred during stack unwinding.
  • Recovered panics unwind stacks to run deferred calls.
  • A related Go issue (#73259) described an arm64 unwinding crash.

Initial mitigation: Removed panic/recover from error handling → Fatal panics stopped.

But a month later, the fatal panics returned in higher volume, with no matching spike in recovered panics.

We now had two failure modes:

  • Crash on invalid memory access
  • Crash with explicit fatal error

---

3. Fatal Error Analysis — Go Runtime / GC

Key points from traces:

  • Crashes happened in the GC background mark worker.
  • Incomplete stack unwinding was reported.
  • Platform: ARM64 (`asm_arm64.s`).

Possible causes:

  • Unsafe/CGo pointer misuse
  • Go runtime bug specific to ARM64
  • Memory overwrite from native code
  • OS/runtime version incompatibility

Investigation essentials:

go version                   # check which Go toolchain is in use
go run -race yourapp.go      # detect data races
CGO_ENABLED=0 go build       # remove Cgo to isolate
GODEBUG=gctrace=1 ./app      # GC debug output

---

4. Segmentation Fault Case

SIGSEGV log showed:

  • Context: `runtime.scanstack` during GC mark phase.
  • Invalid address: `addr=0x118`.
  • Also inside GC worker, not app code.

Likely sources:

  • Pointer corruption (often from unsafe)
  • Stack corruption during goroutine scheduling
  • Runtime bug
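An address as small as `0x118` is a classic sign of a field read through a nil pointer: the CPU adds the field's offset to a base of zero. The sketch below shows only that arithmetic. fakeM is a hypothetical struct, not the runtime's definition, padded so that its incgo field lands at offset 0x118, the offset of `m.incgo` in the runtime's `m` struct.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fakeM is a hypothetical stand-in for the runtime's m struct: the padding
// takes the place of the real fields that precede incgo.
type fakeM struct {
	_     [0x118]byte
	incgo bool
}

// faultAddr returns the address a read of incgo would touch when the
// struct pointer's value is base.
func faultAddr(base uintptr) uintptr {
	var m fakeM
	return base + unsafe.Offsetof(m.incgo)
}

func main() {
	fmt.Printf("%#x\n", faultAddr(0)) // 0x118: a nil base plus the field offset
}
```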

---

5. Pattern Recognition

Both failures hit `(*unwinder).next`:

  • Null return address → fatal error (“traceback did not unwind completely”)
  • Non-null return address outside any known function → the unwinder dereferences `m` and faults reading `m.incgo` at offset `0x118`, matching `addr=0x118` → segfault
  • (See traceback.go:L458)
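The two branches in the list above can be summarized as a toy model. This is a hypothetical simplification for illustration only (the step type and outcome function are invented here), not the runtime's `(*unwinder).next`: a zero return address ends the walk, which is fatal if frames remain, while a non-zero address outside any known function leads to the `m.incgo` read.

```go
package main

import "fmt"

// step is a hypothetical, simplified record of what the unwinder saw when it
// read one frame's return address. Toy model only, not runtime code.
type step struct {
	retPC      uintptr // return address read from the (possibly corrupt) stack
	knownFunc  bool    // whether retPC falls inside a known Go function
	moreFrames bool    // whether the goroutine really has frames left to walk
}

// outcome maps a step to the behavior described in the list above.
func outcome(s step) string {
	switch {
	case s.retPC == 0 && s.moreFrames:
		return "fatal error: traceback did not unwind completely"
	case s.retPC == 0:
		return "done"
	case !s.knownFunc:
		return "SIGSEGV reading m.incgo"
	default:
		return "continue to caller"
	}
}

func main() {
	fmt.Println(outcome(step{retPC: 0, moreFrames: true}))
	fmt.Println(outcome(step{retPC: 0xdeadbeef, moreFrames: true}))
}
```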

Conclusion: the runtime was unwinding a corrupted goroutine stack.

---

6. Scheduler Context

Go’s scheduler types:

  • `g` — goroutine
  • `m` — machine (kernel thread)
  • `p` — processor (logical exec context)

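A goroutine runs only when it is scheduled on an `m` that holds a `p`.

In the segfault path, the unwinder read the `m` of a goroutine it had concluded was currently running; in both failure modes, the underlying problem was a bad `sp`.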
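The scheduling rule above can be written down as a toy model. The types below are hypothetical and drastically simplified (the runtime's own g, m and p structs carry many more fields); the point is only the condition for running: a goroutine needs an m, and that m needs a p.

```go
package main

import "fmt"

// Hypothetical, simplified versions of the runtime's scheduler types.
type p struct{ id int }

type m struct {
	id int
	p  *p // the p this thread currently holds, or nil
}

type g struct {
	id int
	m  *m // the thread running this goroutine, or nil when not running
}

// running reports whether gr is executing: it needs an m, and that m needs a p.
func running(gr *g) bool {
	return gr.m != nil && gr.m.p != nil
}

func main() {
	thread := &m{id: 1, p: &p{id: 0}}
	fmt.Println(running(&g{id: 7, m: thread})) // true
	fmt.Println(running(&g{id: 8}))            // false: no m attached
}
```

In the real scheduler, the number of p's is the value reported by `runtime.GOMAXPROCS(0)`.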

---

7. The Common Frame: netlink Receive

Every segfault stack trace showed a goroutine that had been preempted inside the same function:

github.com/vishvananda/netlink/nl.(*NetlinkSocket).Receive

Async Preemption (≥ Go 1.14):

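The runtime sends `SIGURG` to interrupt a goroutine, for example one that has run too long or one the GC must stop to scan its stack. The signal handler then injects a call to `asyncPreempt`, so the goroutine can be suspended at almost any instruction, including in the middle of a function epilogue.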
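Async preemption is easy to observe directly. The sketch below (preemptionWorks is a hypothetical helper) pins the program to a single P and starts a goroutine whose loop makes no function calls, so it never reaches a cooperative preemption check. The main goroutine can only wake from its sleep if the runtime interrupts the spinner asynchronously.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// preemptionWorks returns once the main goroutine has regained the only P
// from a goroutine that never yields voluntarily.
func preemptionWorks() bool {
	prev := runtime.GOMAXPROCS(1) // one P: the spinner and this goroutine share it
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	done := make(chan struct{})
	go func() {
		// atomic.Bool.Load is normally inlined, so this loop makes no calls.
		for !stop.Load() {
		}
		close(done)
	}()

	// While we sleep, the spinner takes the P. Running again requires the
	// runtime to preempt the spinner with a signal.
	time.Sleep(50 * time.Millisecond)
	stop.Store(true)
	<-done
	return true
}

func main() {
	fmt.Println("regained control:", preemptionWorks())
}
```

Running the same program with `GODEBUG=asyncpreemptoff=1`, which disables async preemption, should make it hang instead.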

---

8. Mid-Epilogue Preemption

Disassembly showed:

ADD $80, RSP, RSP
ADD $(16<<12), RSP, RSP   ← preempted here
RET

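Preempted between the two `ADD` instructions, the goroutine's stack pointer had been only partially restored and pointed into the middle of the frame being torn down. When the GC later unwound that goroutine's stack, it read caller data from the wrong location as the return address.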
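The partial adjustment can be worked through numerically. This is a toy model, assuming a hypothetical body stack pointer; the 80 and 16<<12 constants come from the disassembly above. A preemption at the second `ADD` observes SP = body SP + 80, which matches neither the intact frame nor the popped one.

```go
package main

import "fmt"

// Constants from the disassembly above.
const (
	firstAdd  = 80       // ADD $80, RSP, RSP
	secondAdd = 16 << 12 // ADD $(16<<12), RSP, RSP, i.e. 65536
	frameSize = firstAdd + secondAdd
)

// spAfter returns RSP after the first n epilogue ADDs have executed,
// starting from the SP the function body runs with.
func spAfter(bodySP uintptr, n int) uintptr {
	sp := bodySP
	if n >= 1 {
		sp += firstAdd
	}
	if n >= 2 {
		sp += secondAdd
	}
	return sp
}

func main() {
	const bodySP uintptr = 0x4000_0000 // hypothetical SP inside the function body
	mid := spAfter(bodySP, 1)          // SP seen by a preemption at the second ADD
	callerSP := spAfter(bodySP, 2)     // SP once the frame is fully popped
	fmt.Println("error if frame assumed popped:", callerSP-mid)
	fmt.Println("error if frame assumed intact:", mid+frameSize-callerSP)
}
```

It prints errors of 65536 and 80 bytes: whichever state the unwinder assumes, it looks for the caller's frame in the wrong place.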

---

9. Minimal Reproducer

We wrote a reproducer creating a >64 KB stack frame to force split `ADD` ops:

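package main
import "runtime"

//go:noinline
func big_stack(val int) int {
    buf := make([]byte, 1<<16) // expected to stay on the stack, so the frame exceeds 64 KiB
    sum := 0
    for i := range buf { buf[i] = byte(val) }
    for _, b := range buf { sum ^= int(b) }
    return sum
}

func main() {
    go func() { for { runtime.GC() } }() // run GC cycles back to back
    for { _ = big_stack(1000) }          // may crash on affected toolchains
}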

Calling it in a loop while GC cycles ran continuously eventually produced the same crashes we saw in production.

---

10. Why ARM64 Emits Split ADDs

ARM64's `ADD (immediate)` instruction encodes only a 12-bit unsigned immediate (0 to 4095), optionally shifted left by 12 bits. Adding a larger value that fits neither form takes more than one instruction, and Go's compiler emitted:

ADD $small, RSP, RSP
ADD $(16<<12), RSP, RSP

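If preemption lands between those two instructions, the goroutine is suspended with `sp` only partially adjusted, which is an invalid `sp` for the unwinder.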
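The encoding limit is easy to check numerically. The helpers below (fitsOneAdd and splitAdd, hypothetical names, not compiler functions) test whether a constant fits a single `ADD (immediate)` and split it the way the emitted code did. For the 65616-byte adjustment in the epilogue from section 8, neither form fits, so two instructions are needed.

```go
package main

import "fmt"

// fitsOneAdd reports whether n can be encoded as a single ARM64 ADD
// (immediate): a 12-bit value, optionally shifted left by 12.
func fitsOneAdd(n uint32) bool {
	return n <= 0xFFF || (n&0xFFF == 0 && n>>12 <= 0xFFF)
}

// splitAdd splits n into a low 12-bit part and a high part encoded as
// high<<12, mirroring the two-ADD sequence. ok is false when n needs more
// than 24 bits and two such ADDs cannot cover it.
func splitAdd(n uint32) (low, high uint32, ok bool) {
	low, high = n&0xFFF, n>>12
	return low, high, high <= 0xFFF
}

func main() {
	n := uint32(80 + 16<<12) // 65616
	low, high, _ := splitAdd(n)
	fmt.Println(fitsOneAdd(n), low, high) // false 80 16
}
```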

---

11. The Fix (Go 1.23.12+, 1.24.6+, 1.25.0)

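The compiler now builds the full offset in a temporary register and adjusts `RSP` with a single `ADD`.

Because a single instruction has either executed or not when a signal arrives, preemption can no longer observe a partially adjusted stack pointer: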

MOVD $32, R27
MOVK $(1<<16), R27
ADD R27, RSP, RSP
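The two moves build the constant in pieces, but only in the scratch register R27 (which Go's arm64 toolchain reserves as a temporary); `RSP` changes exactly once. A small model of that arithmetic, where movk is a hypothetical helper mirroring the instruction's behavior: MOVD loads 32, and MOVK writes 1 into bits 16 to 31 while keeping the rest, giving 32 + 65536 = 65568.

```go
package main

import "fmt"

// movk models MOVK: overwrite the 16-bit field of reg at bit position shift
// with imm16 and keep all other bits. In Go's arm64 assembler syntax,
// $(1<<16) means imm16 = 1 at shift 16.
func movk(reg, imm16 uint64, shift uint) uint64 {
	mask := uint64(0xFFFF) << shift
	return reg&^mask | (imm16&0xFFFF)<<shift
}

func main() {
	r27 := uint64(32)      // MOVD $32, R27
	r27 = movk(r27, 1, 16) // MOVK $(1<<16), R27
	fmt.Println(r27)       // 65568, added to RSP by one ADD R27, RSP, RSP
}
```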

---

12. Takeaways

  • Async preemption plus a stack-frame teardown split across two instructions creates a rare race window.
  • Scale accelerates discovery of “rare” runtime/compiler bugs.
  • Finding the root cause required reading assembly and building a reproducible test.

---


Note: This investigation required deep runtime knowledge and patience — exactly the kind of systems puzzle that makes compiler/runtime engineering exciting. And yes — we’re hiring engineers who enjoy chasing bugs like this.
