Summary
Under HTTP/2, the server retains a small amount of memory on every request, proportional to the response body size, and never releases it. The growth survives both connection close and a forced GC.gc(), so it is live, reachable memory, not collectable garbage or freed heap that the process is merely holding on to. The identical workload over HTTP/1.1 is completely flat. This makes any long-running HTTP/2 server that returns non-trivial response bodies grow without bound.
We hit this in a production service that returns ~90 KB JSON responses at a few requests/second over a persistent HTTP/2 connection; RSS climbed continuously until restart. Downgrading the same server code to HTTP.jl 1.x eliminated it.
Environment
- HTTP.jl 2.0.0
- Reseau 1.1.x
- Julia 1.11.3 / 1.11.5
- Linux (x86_64)
Reproduction
The reproduction uses two processes so that we can measure the server's own gc_live_bytes, via a /mem endpoint that forces a GC and reports the server process's live bytes. This rules out the client as the source.
server.jl:
using HTTP, JSON3
const BIG = Vector{UInt8}(JSON3.write(Dict("col$(c)" => collect(1:200) .+ 0.5 for c in 1:60))) # ~90 KB
handler = function (req::HTTP.Request)
    if req.target == "/mem"
        GC.gc(); GC.gc()
        return HTTP.Response(200; body = string(Base.gc_live_bytes()))
    end
    return HTTP.Response(200, ["Content-Type" => "application/json"]; body = BIG)
end
server = HTTP.serve!(handler, "127.0.0.1", 18080)
println("ready"); wait(server)
client.jl:
using HTTP
const PROTO = Symbol(get(ENV, "PROTO", "h2")) # h2 or h1
client = HTTP.Client()
server_live() = parse(Int, String(HTTP.get(client, "http://127.0.0.1:18080/mem"; protocol=PROTO, retry=false).body))
fire(n) = for _ in 1:n; HTTP.get(client, "http://127.0.0.1:18080/"; protocol=PROTO, retry=false); end
fire(100); base = server_live()
println("protocol=$PROTO baseline=$(round(base/1048576,digits=2))MB")
for r in 1:8
    fire(3000); live = server_live()
    println(" after $(r*3000) reqs: server live=$(round(live/1048576,digits=2))MB Δ$(round((live-base)/1024))KB")
end
Run: julia server.jl in one terminal, then PROTO=h2 julia client.jl and PROTO=h1 julia client.jl in another.
Results (server process live bytes, after forced GC)
| requests | h2 server live Δ | h1 server live Δ |
|---|---|---|
| 3,000 | +992 KB | 0 KB |
| 6,000 | +1,981 KB | 0 KB |
| 9,000 | +2,970 KB | 0 KB |
| 12,000 | +3,961 KB | 0 KB |
| 15,000 | +4,952 KB | 0 KB |
| 18,000 | +5,940 KB | 0 KB |
| 21,000 | +6,928 KB | 0 KB |
| 24,000 | +7,917 KB | 0 KB |
h2: dead-linear ~330 bytes/request, never recovered. h1 (same server, same body): flat.
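For anyone who wants to check the per-request figure, here is a small sketch that fits a line through the origin to the h2 column of the table. It uses only the numbers reported above and assumes 1 KB = 1024 bytes, which matches how client.jl prints Δ. The variable names are just for illustration.

```julia
# Server live-bytes growth (KB) after each batch, copied from the h2 column above.
requests = collect(3_000:3_000:24_000)
delta_kb = [992, 1981, 2970, 3961, 4952, 5940, 6928, 7917]

# Least-squares slope through the origin: sum(x*y) / sum(x^2), in KB per request.
slope_kb = sum(requests .* delta_kb) / sum(requests .^ 2)

# client.jl prints Δ as (live - base) / 1024, so 1 KB = 1024 bytes here.
bytes_per_request = slope_kb * 1024

# Largest deviation of any measurement from the fitted line, in KB.
max_residual_kb = maximum(abs.(delta_kb .- slope_kb .* requests))

println("≈ $(round(bytes_per_request, digits=1)) bytes/request, ",
        "max residual $(round(max_residual_kb, digits=1)) KB")
```

The fit comes out at roughly 338 bytes per request, in line with the ~330 B quoted here, and no point deviates from the line by more than a few KB.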
Additional facts from bisecting
- Scales with response body size. A tiny (~50 byte) response body leaks negligibly; the 90 KB body leaks ~330 B/request. Suggests per-DATA-frame or per-stream retention rather than full-body retention.
- Independent of connection lifecycle. Churning a fresh connection per request leaks at the same rate, and the server's active_conns set stays bounded (1–2). So it is not connection accumulation, and closing/recycling connections does not free it.
- Survives GC.gc(): it is live memory, measured via Base.gc_live_bytes() in the server process.
- Affects both serve! (buffered) and listen! (streaming) handlers identically, so it's in the shared HTTP/2 server/transport path, not the handler layer.
- HTTP/1.1 is unaffected under all of the above.
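To put a rough number on the per-DATA-frame hypothesis, the sketch below divides the observed leak by the number of DATA frames a 90 KB body would need. It assumes the server sends frames at the HTTP/2 default SETTINGS_MAX_FRAME_SIZE of 16,384 bytes. We have not verified which frame size HTTP.jl actually uses, so treat the result as an order-of-magnitude estimate only.

```julia
# Assumptions (not measured in this report):
#  - DATA frames are sent at the HTTP/2 default SETTINGS_MAX_FRAME_SIZE (16,384 bytes)
#  - the response body is about 90 KB, as stated above
const MAX_FRAME_SIZE = 16_384
body_bytes = 90 * 1024
leak_per_request = 338   # bytes, from the results table (7,917 KB over 24,000 requests)

# Number of DATA frames needed for the body (ceiling division).
data_frames = cld(body_bytes, MAX_FRAME_SIZE)

# Retained bytes per DATA frame if retention were purely per-frame.
leak_per_frame = leak_per_request / data_frames

println("DATA frames per response: $data_frames")
println("retained bytes per DATA frame: $(round(leak_per_frame, digits=1))")
```

Under those assumptions the retention would be a few tens of bytes per frame. Repeating the measurement at two or three different body sizes would show whether the leak really steps with frame count.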
Impact
Any long-lived HTTP/2 server returning non-trivial bodies grows without bound. The only mitigations we found are forcing HTTP/1.1, shrinking responses (which slows the growth proportionally but does not stop it), or restarting periodically.
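As a back-of-the-envelope illustration of what this means for a long-running process, the sketch below projects daily growth from the measured per-request rate. The request rate of 3 requests/second is a hypothetical stand-in for "a few requests/second". The result covers live heap bytes only, so actual RSS growth may differ.

```julia
bytes_per_request = 338          # from the results table
requests_per_second = 3          # hypothetical; the report only says "a few requests/second"
seconds_per_day = 24 * 60 * 60

# Projected growth of live heap bytes per day, in MB (1 MB = 1024^2 bytes).
growth_mb_per_day = bytes_per_request * requests_per_second * seconds_per_day / 1024^2

println("≈ $(round(growth_mb_per_day, digits=1)) MB of live heap growth per day")
```

At that assumed rate the server would accumulate on the order of 80 MB of live memory per day.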
Happy to test patches or provide more diagnostics.