Deflake test_linux_network_stacksmash_64 by DanielBotnik · Pull Request #122 · angr/rex

DanielBotnik · 2026-05-28T22:17:03Z

Summary

test_linux_network_stacksmash_64 was wrapped in @flaky(max_runs=3, min_passes=1) to paper over two races in the phase-2 "run the exploit" step. This replaces the retry wrapper with deterministic synchronization.

The exploit itself is not the flaky part: network_overflow is a non-PIE binary whose talk() memcpys the received bytes to a fixed .bss address (0x4040c0) and the exploit overwrites the saved return address to jump there, so it works independently of ASLR. The flakiness was entirely in how the test launched and connected to the target:

Random ports (random.randint(...)) could already be in use. The target server does not set SO_REUSEADDR, so its bind() then fails and the test errors out through no fault of the exploit.
time.sleep(.5) was used to wait for the server before launching the exploit, whose generated script connects once with no retry. On a loaded CI runner the server can take longer than 0.5s to reach listen(), so the connection is refused.

(Phase 1 doesn't suffer this — archr connects with retry=30.)

Changes

_get_free_tcp_port() — bind to port 0 to get a port the OS reports as free, instead of a random one that may be taken.
_wait_until_listening() — poll the socket's LISTEN state (via psutil, already a transitive dependency through angr/pwntools) instead of sleeping a fixed interval. We check state rather than connecting because the target accept()s exactly one connection, so a probe connection would be consumed instead of the exploit's.
Remove the @flaky decorator and the now-unused flaky dependency.

Testing

Ran the test with flaky retries disabled (-p no:flaky):

30/30 passes back-to-back.
5/5 passes under full-CPU stress (the slow-startup condition that triggered the original flake).

Also verified directly that the old sleep(0.5) + no-retry connect fails with ConnectionRefusedError against a server that takes >0.5s to listen(), while _wait_until_listening handles it.

🤖 Generated with Claude Code

ltfish · 2026-05-29T00:25:26Z

Ask Claude Code to try harder ;)

@flaky

The test was wrapped in @flaky(max_runs=3) to paper over several timing races in driving the network exploit. Fix the underlying races and drop the wrapper. Root cause of the CI failure (exploit subprocess times out, empty stdout): the target services a connection with a single recv(), but the generated call_shellcode exploit sends the shellcode payload and the follow-up shell commands (e.g. "echo hello\n", "exit\n") back to back. Under load these coalesce into that one recv(), so the commands are consumed before the shell is up, leaving the popped shell to block forever on an empty stdin. Fixes: - call_shellcode: wait SHELL_SPAWN_DELAY seconds after sending the shellcode before sending shell commands, so the payload is the only thing in the target's first read() and the shell is reading by the time the commands arrive. This makes generated network shellcode exploits reliable in general, not just this test. - test: pick guaranteed-free ports (bind to port 0) instead of random ports, which may already be in use (the target sets no SO_REUSEADDR, so its bind() then fails). - test: wait until the target is actually listening (via psutil) before launching the exploit, instead of a fixed time.sleep() that races the server's startup under load. - Drop the now-unnecessary @flaky wrapper and the flaky dependency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

DanielBotnik · 2026-05-29T09:26:49Z

Took another pass at this and found the real root cause behind the flakiness:

The target services a connection with a single recv(), but the generated call_shellcode exploit sent the shellcode payload and the follow-up commands (echo hello\n, exit\n) back to back — under load they coalesce into that one read, so the commands get consumed before the shell is up and it blocks forever on empty stdin (the 30s timeout you saw).
Fix: pace the exploit — wait SHELL_SPAWN_DELAY after the payload before sending commands, so the payload is alone in the first read.
Also made the test deterministic: guaranteed-free ports (no EADDRINUSE) and wait-until-listening via psutil instead of a fixed sleep(.5).
Dropped the @flaky wrapper and dependency since the races are now fixed at the source.
Reproduced the hang deterministically and verified the paced exploit works; the call_shellcode change also makes real network shellcode exploits more reliable, not just this test.

DanielBotnik force-pushed the fix-flaky-network-stacksmash-test branch from 9d086e2 to 7938586 Compare May 29, 2026 08:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deflake test_linux_network_stacksmash_64#122

Deflake test_linux_network_stacksmash_64#122
DanielBotnik wants to merge 1 commit into
angr:masterfrom
DanielBotnik:fix-flaky-network-stacksmash-test

DanielBotnik commented May 28, 2026

Uh oh!

ltfish commented May 29, 2026

Uh oh!

DanielBotnik commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

DanielBotnik commented May 28, 2026

Summary

Changes

Testing

Uh oh!

ltfish commented May 29, 2026

Uh oh!

DanielBotnik commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants