HTTP/2 server leaks memory proportional to response body size (HTTP/1.1 does not) #1271

@canemon-markov

Summary

Under HTTP/2, the server retains a small amount of memory on every request, proportional to the response body size, and never releases it. The growth survives both connection close and a forced GC.gc(), so it is live, reachable memory rather than garbage awaiting collection or freed heap that has not yet been returned to the OS. The identical workload over HTTP/1.1 is completely flat. As a result, any long-running HTTP/2 server that returns non-trivial response bodies grows without bound.

We hit this in a production service that returns ~90 KB JSON responses at a few requests/second over a persistent HTTP/2 connection; RSS climbed continuously until restart. Downgrading the same server code to HTTP.jl 1.x eliminated it.

Environment

  • HTTP.jl 2.0.0
  • Reseau 1.1.x
  • Julia 1.11.3 / 1.11.5
  • Linux (x86_64)

Reproduction

The reproduction uses two processes so that memory is measured in the server itself: a /mem endpoint runs the GC and reports the server process's Base.gc_live_bytes(). This rules out the client as the source.

server.jl:

using HTTP, JSON3
# Fixed JSON payload, built once so every response returns the same bytes
const BIG = Vector{UInt8}(JSON3.write(Dict("col$(c)" => collect(1:200) .+ 0.5 for c in 1:60))) # ~90 KB
handler = function (req::HTTP.Request)
    if req.target == "/mem"
        # Force a full collection twice, then report this process's live heap bytes
        GC.gc(); GC.gc()
        return HTTP.Response(200; body = string(Base.gc_live_bytes()))
    end
    # Every other path returns the large body
    return HTTP.Response(200, ["Content-Type" => "application/json"]; body = BIG)
end
server = HTTP.serve!(handler, "127.0.0.1", 18080)
println("ready"); wait(server)

client.jl:

using HTTP
const PROTO = Symbol(get(ENV, "PROTO", "h2"))    # h2 or h1
client = HTTP.Client()
# Ask the server to GC and report its own live bytes
server_live() = parse(Int, String(HTTP.get(client, "http://127.0.0.1:18080/mem"; protocol=PROTO, retry=false).body))
# Issue n sequential GETs for the large body, discarding the responses
fire(n) = for _ in 1:n; HTTP.get(client, "http://127.0.0.1:18080/"; protocol=PROTO, retry=false); end
# Warm up with 100 requests before taking the baseline
fire(100); base = server_live()
println("protocol=$PROTO baseline=$(round(base/1048576,digits=2))MB")
# 8 rounds of 3,000 requests, printing growth relative to the baseline after each round
for r in 1:8
    fire(3000); live = server_live()
    println("  after $(r*3000) reqs: server live=$(round(live/1048576,digits=2))MB  Δ$(round((live-base)/1024))KB")
end

Run: julia server.jl in one terminal, then PROTO=h2 julia client.jl and PROTO=h1 julia client.jl in another.

Results (server process live bytes, after forced GC)

requests | h2 server live Δ | h1 server live Δ
3,000    | +992 KB          | 0 KB
6,000    | +1,981 KB        | 0 KB
9,000    | +2,970 KB        | 0 KB
12,000   | +3,961 KB        | 0 KB
15,000   | +4,952 KB        | 0 KB
18,000   | +5,940 KB        | 0 KB
21,000   | +6,928 KB        | 0 KB
24,000   | +7,917 KB        | 0 KB

h2: dead-linear ~330 bytes/request, never recovered. h1 (same server, same body): flat.
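
The per-request figure and the linearity can be checked directly from the h2 numbers. A minimal sketch, assuming 1 KB = 1024 bytes (which is how client.jl computes Δ); the variable names are illustrative and not part of the reproduction:

```julia
# Δ values in KB, copied from the h2 results (1 KB = 1024 bytes, as in client.jl)
requests = 3000:3000:24000
delta_kb = [992, 1981, 2970, 3961, 4952, 5940, 6928, 7917]

# Bytes retained per request, computed at each checkpoint
per_request = delta_kb .* 1024 ./ requests

# KB gained between consecutive checkpoints (each step is 3,000 requests)
steps = diff(delta_kb)

println("bytes/request at each checkpoint: ", round.(per_request; digits = 1))
println("KB gained per 3,000 requests:     ", steps)
```

Every checkpoint lands between 337 and 339 bytes/request, and each 3,000-request step adds 988 to 991 KB, which is consistent with a constant per-request cost of roughly a third of a KB.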

Additional facts from bisecting

  • Scales with response body size. A tiny (~50 byte) response body leaks negligibly; the 90 KB body leaks ~330 B/request. Suggests per-DATA-frame or per-stream retention rather than full-body retention.
  • Independent of connection lifecycle. Churning a fresh connection per request leaks at the same rate, and the server's active_conns set stays bounded (1–2). So it is not connection accumulation, and closing/recycling connections does not free it.
  • Survives GC.gc() — it is live memory, measured via Base.gc_live_bytes() in the server process.
  • Affects both serve! (buffered) and listen! (streaming) handlers identically — so it's in the shared HTTP/2 server/transport path, not the handler layer.
  • HTTP/1.1 is unaffected under all of the above.
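
To give a rough sense of scale for the per-DATA-frame hypothesis in the first bullet, here is a back-of-the-envelope sketch. It assumes the server sends DATA frames at the HTTP/2 default maximum frame size of 16,384 bytes (the negotiated value was not measured here), takes the body as exactly 90 KiB, and uses about 338 bytes/request (7,917 KB over 24,000 requests at 1024 bytes per KB, which the issue rounds to ~330):

```julia
# Assumption: HTTP/2 default SETTINGS_MAX_FRAME_SIZE; the actual negotiated value was not measured
max_frame_size = 16_384

body_bytes = 90 * 1024      # "~90 KB" body, taken as exactly 90 KiB for illustration
leak_per_request = 338      # bytes/request, from 7,917 KB over 24,000 requests

# Number of DATA frames needed to carry one response body
data_frames = cld(body_bytes, max_frame_size)

# Retained bytes per frame, if the retention really is per DATA frame
leak_per_frame = leak_per_request / data_frames

println("DATA frames per response: ", data_frames)
println("bytes per DATA frame (if per-frame): ", round(leak_per_frame; digits = 1))
```

Under those assumptions a response spans 6 DATA frames, so a per-frame leak would be on the order of 56 bytes per frame. This is only a plausibility check; it does not distinguish per-frame retention from other causes that scale with body size.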

Impact

Any long-lived HTTP/2 server returning non-trivial bodies grows without bound. The only mitigations we found are forcing HTTP/1.1, shrinking responses (which only slows the leak in proportion to body size), or restarting periodically.
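
For a sense of what this means over time, the sketch below projects live-memory growth under the assumption that the leak stays linear. The request rate of 3 requests/second is hypothetical (the issue only says "a few requests/second"), the 338 bytes/request figure is derived from the results above, and `projected_growth_mib` is an illustrative helper, not part of HTTP.jl:

```julia
# Projected growth of live memory in MiB, assuming the leak stays linear.
# requests_per_second is hypothetical; bytes_per_request comes from the measured results.
projected_growth_mib(requests_per_second, bytes_per_request; seconds = 86_400) =
    requests_per_second * bytes_per_request * seconds / 2^20

per_day = projected_growth_mib(3, 338)
println("projected growth: ", round(per_day; digits = 1), " MiB/day")
```

This projects gc_live_bytes only; process RSS can grow differently, so treat the result as an order-of-magnitude estimate.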

Happy to test patches or provide more diagnostics.
