Deflake test_linux_network_stacksmash_64#122
Open
DanielBotnik wants to merge 1 commit into
Open
Conversation
Member
|
Ask Claude Code to try harder ;) |
The test was wrapped in @flaky(max_runs=3) to paper over several timing races in driving the network exploit. Fix the underlying races and drop the wrapper. Root cause of the CI failure (exploit subprocess times out, empty stdout): the target services a connection with a single recv(), but the generated call_shellcode exploit sends the shellcode payload and the follow-up shell commands (e.g. "echo hello\n", "exit\n") back to back. Under load these coalesce into that one recv(), so the commands are consumed before the shell is up, leaving the popped shell to block forever on an empty stdin. Fixes: - call_shellcode: wait SHELL_SPAWN_DELAY seconds after sending the shellcode before sending shell commands, so the payload is the only thing in the target's first read() and the shell is reading by the time the commands arrive. This makes generated network shellcode exploits reliable in general, not just this test. - test: pick guaranteed-free ports (bind to port 0) instead of random ports, which may already be in use (the target sets no SO_REUSEADDR, so its bind() then fails). - test: wait until the target is actually listening (via psutil) before launching the exploit, instead of a fixed time.sleep() that races the server's startup under load. - Drop the now-unnecessary @flaky wrapper and the flaky dependency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
9d086e2 to
7938586
Compare
Author
|
Took another pass at this and found the real root cause behind the flakiness:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
test_linux_network_stacksmash_64was wrapped in@flaky(max_runs=3, min_passes=1)to paper over two races in the phase-2 "run the exploit" step. This replaces the retry wrapper with deterministic synchronization.The exploit itself is not the flaky part:
network_overflowis a non-PIE binary whosetalk()memcpys the received bytes to a fixed.bssaddress (0x4040c0) and the exploit overwrites the saved return address to jump there, so it works independently of ASLR. The flakiness was entirely in how the test launched and connected to the target:random.randint(...)) could already be in use. The target server does not setSO_REUSEADDR, so itsbind()then fails and the test errors out through no fault of the exploit.time.sleep(.5)was used to wait for the server before launching the exploit, whose generated script connects once with no retry. On a loaded CI runner the server can take longer than 0.5s to reachlisten(), so the connection is refused.(Phase 1 doesn't suffer this — archr connects with
retry=30.)Changes
_get_free_tcp_port()— bind to port0to get a port the OS reports as free, instead of a random one that may be taken._wait_until_listening()— poll the socket'sLISTENstate (viapsutil, already a transitive dependency throughangr/pwntools) instead of sleeping a fixed interval. We check state rather than connecting because the targetaccept()s exactly one connection, so a probe connection would be consumed instead of the exploit's.@flakydecorator and the now-unusedflakydependency.Testing
Ran the test with flaky retries disabled (
-p no:flaky):Also verified directly that the old
sleep(0.5)+ no-retry connect fails withConnectionRefusedErroragainst a server that takes >0.5s tolisten(), while_wait_until_listeninghandles it.🤖 Generated with Claude Code