fix: raise phantom container memory limit from 2G to 8G#52
Merged
Conversation
The 2 GiB cgroup ceiling OOM-killed Claude Code judge subprocesses under evolution load. A post-session evolution cycle spawns up to five concurrent `bun` + `cli.js` subprocesses via `runJudgeQuery` (observation, regression, constitution, safety, consolidation), each holding 300 to 500 MiB RSS, on top of the main phantom process and whatever agent query subprocess is in flight. Peak concurrent demand is 2.5 to 4 GiB, which crossed the 2 GiB ceiling and triggered SIGKILLs that phase 1's `runJudgeQuery` caught and reported as "Claude Code process terminated by signal SIGKILL", failing closed on the safety and constitution gates and dropping to heuristics on observation and regression.

Raising the limit to 8 GiB gives generous headroom for peak judge concurrency on a host with 30 GiB total (Hetzner CX53 default), leaving 14 GiB free after the phantom (8G), qdrant (4G), and ollama (4G) caps. The reservation is bumped from 256 MiB to 512 MiB to match the healthier steady-state baseline.

Root cause observed on the wehshi VM: the SIGKILL cascade began within 20 minutes of enabling LLM judges, the journalctl kernel log showed "Memory cgroup out of memory" events charged to the phantom container's memcg, and `docker stats` reported phantom pinned at 2 GiB / 2 GiB at 99.98 percent while the host sat at 27 GiB free.
Summary
Raises the phantom container memory limit from 2 GiB to 8 GiB (and the reservation from 256 MiB to 512 MiB) in `docker-compose.yaml` and `docker-compose.user.yaml`.

Root cause
The 2 GiB cgroup ceiling could not hold peak LLM-judge concurrency. A post-session evolution cycle spawns up to five concurrent `bun` + `cli.js` subprocesses via `runJudgeQuery` (observation, regression, constitution, safety, consolidation). Each judge subprocess holds 300 to 500 MiB RSS, and they run on top of the main phantom process plus whatever agent query subprocess is in flight. Peak concurrent demand lands between 2.5 and 4 GiB, which crosses the 2 GiB ceiling and triggers the container's memcg OOM killer.

Phase 1's `runJudgeQuery` catches the resulting `SIGKILL`, the engine correctly fails closed on the safety and constitution gates, and the other judges fall back to heuristics, so the main phantom process never crashes. But every LLM judge call after the first kill fails, which defeats the point of having judges enabled at all.

Evidence captured live
Observed on the wehshi Specter VM within 20 minutes of enabling LLM judges (post `claude login` + restart):

- `docker stats`: `phantom 2GiB / 2GiB 99.98% 178.54%`
- `docker inspect phantom .HostConfig.Memory`: `2147483648`
- `journalctl -k`: repeated `Memory cgroup out of memory: Killed process <pid> (bun)` events charged to the phantom container's `oom_memcg`, with `anon-rss` per killed subprocess ranging from 99 MiB to 502 MiB
- `Claude Code process terminated by signal SIGKILL` messages from the observation, regression, constitution, safety, and consolidation judges
- `free -h`: 30 GiB total, 27 GiB available at the time of the kills, so this was strictly a container cap, not a VM sizing problem

Sizing rationale
Hetzner CX53 (the Specter default) ships with 30 GiB RAM. With phantom at 8 GiB, qdrant at 4 GiB, and ollama at 4 GiB, total committed ceilings are 16 GiB, leaving 14 GiB of host headroom for the OS, Docker daemon, and any transient bursts. Actual steady-state phantom RSS is well under 1 GiB, so the 8 GiB cap is a generous upper bound rather than a sustained reservation.
Test plan
- `docker compose up -d phantom` on wehshi against the new compose
- `docker inspect phantom --format '{{.HostConfig.Memory}}'` reports `8589934592`
- `docker stats` steady-state shows phantom usage under the new ceiling
- No new `SIGKILL` log lines from `runJudgeQuery`
- `journalctl -k --since "10 min ago"` shows no new `Memory cgroup out of memory` events charged to the phantom memcg
- `docker compose up -d phantom` on each

Notes
Both compose files are updated because new Specter deploys use `docker-compose.user.yaml` (Docker Hub image), while source-built deploys use `docker-compose.yaml`. Keeping them consistent means every future deployment path inherits the new ceiling.
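For concreteness, the shape of the change in both files is roughly the following. This is a sketch: the service name `phantom` and the `mem_limit`/`mem_reservation` keys are assumed from standard Compose syntax, not copied from the actual diff.

```yaml
services:
  phantom:
    mem_limit: 8g          # was 2g: hard ceiling, now clears peak judge concurrency
    mem_reservation: 512m  # was 256m: matches the healthier steady-state baseline
```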