fix: arm64 io_uring detection and H2 flow control throughput #40

Merged
FumingPower3925 merged 2 commits into main from fix/sustained-load-death-spiral on Mar 9, 2026

Summary

Fixes two performance issues identified in official benchmarks:

  • arm64 io_uring 20x slowdown: Replace MSG_PEEK-based protocol detection with inline detection in handleRecv, eliminating arm64 kernel behavior differences. Remove dead code (prepRecvPeek, udPeek, msgPeek, handlePeek).
  • H2 body 66-74% throughput drop: Lower windowUpdateThreshold from 16384→1024, eliminate per-DATA-frame map allocation in FlushWindowUpdates, clean up stale pendingStreamUpdates entries on stream deletion.
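The inline detection described in the first bullet could look roughly like the sketch below. Only the names `detectProtocol` and `handleRecv` come from this PR; the `conn` type, the `Protocol` constants, and the rule that anything other than the HTTP/2 client preface is treated as HTTP/1.x are assumptions made for illustration, not the code in engine/iouring/worker.go.

```go
package main

import "bytes"

// Protocol is a hypothetical enum used only in this sketch.
type Protocol int

const (
	ProtoUnknown Protocol = iota // not enough bytes yet to decide
	ProtoHTTP1
	ProtoHTTP2
)

// h2Preface is the HTTP/2 client connection preface (RFC 9113, section 3.4).
var h2Preface = []byte("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")

// detectProtocol classifies the bytes received so far. Assumption: the
// server speaks only HTTP/1.x and prior-knowledge HTTP/2, so any input that
// diverges from the preface is treated as HTTP/1.x.
func detectProtocol(buf []byte) Protocol {
	if len(buf) == 0 {
		return ProtoUnknown
	}
	n := len(buf)
	if n > len(h2Preface) {
		n = len(h2Preface)
	}
	if !bytes.Equal(buf[:n], h2Preface[:n]) {
		return ProtoHTTP1
	}
	if n < len(h2Preface) {
		return ProtoUnknown // still a proper prefix of the preface; wait for more data
	}
	return ProtoHTTP2
}

// conn is a hypothetical per-connection state holder.
type conn struct {
	proto   Protocol
	pending []byte // bytes buffered until the protocol is known
}

// handleRecv runs detection on data delivered by an ordinary recv
// completion, so no separate MSG_PEEK submission is needed.
func (c *conn) handleRecv(data []byte) Protocol {
	if c.proto != ProtoUnknown {
		return c.proto
	}
	c.pending = append(c.pending, data...)
	c.proto = detectProtocol(c.pending)
	return c.proto
}
```

In this sketch a connection whose first read carries only part of the preface stays in `ProtoUnknown` until a later read completes it, which is the trade-off for dropping the peek round trip.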

Closes #39

Changes

| File | Change |
| --- | --- |
| engine/iouring/worker.go | Replace handlePeek+MSG_PEEK with detectProtocol inside handleRecv |
| engine/iouring/sqe.go | Remove prepRecvPeek |
| engine/iouring/cqe.go | Remove udPeek constant |
| engine/iouring/consts.go | Remove msgPeek constant |
| protocol/h2/stream/flowcontrol.go | Lower threshold, eliminate map alloc in FlushWindowUpdates |
| protocol/h2/stream/manager.go | Clean up pendingStreamUpdates in DeleteStream |

Test plan

  • GOOS=linux GOARCH=amd64 go build ./...
  • GOOS=linux GOARCH=arm64 go build ./...
  • go test -race ./... — all core packages pass
  • Sustained load benchmark on arm64 VM (verify io_uring Auto mode parity)
  • H2 POST /upload benchmark (verify throughput improvement)

- Replace MSG_PEEK-based protocol detection with inline detection in
  handleRecv, fixing 20x slowdown on arm64 where MSG_PEEK behaves
  differently. Detection now happens on first received data, matching
  epoll's approach.

- Remove dead code: prepRecvPeek, udPeek, msgPeek, handlePeek.

- Eliminate per-DATA-frame map allocation in FlushWindowUpdates by
  writing WINDOW_UPDATE frames directly while iterating pending updates.

- Lower windowUpdateThreshold from 16384 to 1024 so small bodies (4KB)
  get timely flow control credits on multiplexed connections.

- Clean up pendingStreamUpdates in DeleteStream to prevent unbounded
  map growth over long-lived connections.
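
The three flow control changes above can be sketched together as follows. The identifiers `windowUpdateThreshold`, `FlushWindowUpdates`, `pendingStreamUpdates`, and `DeleteStream` and the value 1024 come from this PR; the `flowController` type, the `Consume` and `appendWindowUpdate` helpers, the append-to-buffer signature, and the "at or above threshold" comparison are assumptions for illustration and may differ from protocol/h2/stream.

```go
package main

import "encoding/binary"

// windowUpdateThreshold uses the new value from this PR (previously 16384).
const windowUpdateThreshold = 1024

// frameTypeWindowUpdate is the WINDOW_UPDATE frame type (RFC 9113, section 6.9).
const frameTypeWindowUpdate = 0x8

// flowController is a hypothetical stand-in for the real flow control state.
type flowController struct {
	// stream ID -> bytes consumed but not yet credited back to the peer
	pendingStreamUpdates map[uint32]uint32
}

func newFlowController() *flowController {
	return &flowController{pendingStreamUpdates: make(map[uint32]uint32)}
}

// Consume records n bytes of DATA received on a stream (hypothetical helper).
func (fc *flowController) Consume(streamID, n uint32) {
	fc.pendingStreamUpdates[streamID] += n
}

// appendWindowUpdate appends one 13-byte WINDOW_UPDATE frame to dst:
// a 9-byte frame header followed by a 4-byte window size increment.
func appendWindowUpdate(dst []byte, streamID, increment uint32) []byte {
	dst = append(dst, 0, 0, 4, frameTypeWindowUpdate, 0) // length=4, type, flags=0
	dst = binary.BigEndian.AppendUint32(dst, streamID&0x7fffffff)
	return binary.BigEndian.AppendUint32(dst, increment&0x7fffffff)
}

// FlushWindowUpdates encodes a frame for every stream whose pending credit
// has reached the threshold, writing while iterating instead of first
// collecting the updates into a temporary map.
func (fc *flowController) FlushWindowUpdates(dst []byte) []byte {
	for id, n := range fc.pendingStreamUpdates {
		if n < windowUpdateThreshold {
			continue
		}
		dst = appendWindowUpdate(dst, id, n)
		delete(fc.pendingStreamUpdates, id) // deleting during range is safe in Go
	}
	return dst
}

// DeleteStream drops any credit still pending for a closed stream so the
// map does not grow over the lifetime of the connection.
func (fc *flowController) DeleteStream(streamID uint32) {
	delete(fc.pendingStreamUpdates, streamID)
}
```

Under these assumptions, a stream that has consumed a 4 KB body is already past the 1024-byte threshold on the next flush, whereas with the old 16384 threshold it would not have been credited yet.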
@FumingPower3925 FumingPower3925 self-assigned this Mar 9, 2026
@FumingPower3925 FumingPower3925 merged commit 05429a7 into main Mar 9, 2026
8 checks passed
@FumingPower3925 FumingPower3925 added this to the v0.3.3 milestone Mar 9, 2026

Development

Successfully merging this pull request may close these issues.

fix: arm64 io_uring detection slowdown and H2 body throughput drop