
Some pipeline fixes in Bitreq#584

Open
ErikDeSmedt wants to merge 4 commits into rust-bitcoin:master from ErikDeSmedt:bitreq-pipelining-fixes

Conversation


@ErikDeSmedt ErikDeSmedt commented May 6, 2026

We recently switched to bitcoind-async-client and I saw some panics and requests getting stuck. See #583

I identified two issues:

  • A counter overflow on next_request_id, which results in panics
  • If a connection goes bad we throw away some data. The result is that open requests stay open: they never succeed nor fail.

I didn't take the time to get deeply familiar with how bitreq works.
I'm happy to see suggestions on other approaches to fix the problem.

Comment thread bitreq/src/connection.rs Outdated

request_id = conn.next_request_id.fetch_add(1, Ordering::Relaxed);
request_id = conn
.next_request_id
Member


what kind of weird platform are you running on that you overflowed this? ISTM the right solution is just make it an AtomicU64 for the broken machines trying to use this on 32-bit.

Author

@ErikDeSmedt ErikDeSmedt May 7, 2026


I am using amd64. I don't see how this can be platform related.

When some failures occur, conn.next_request_id is set to `usize::MAX`.
I understand the intent of setting `usize::MAX` is to stop using this connection.

However, `AtomicUsize::fetch_add` wraps around on overflow on any platform, according to the docs.
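The wrap the docs describe can be shown in a few lines (a standalone sketch, not bitreq code): a sentinel of `usize::MAX` is exactly one `fetch_add` away from handing out ids again.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    // Atomic fetch_add wraps on overflow on every platform, per the
    // AtomicUsize docs; it never panics.
    let next_request_id = AtomicUsize::new(usize::MAX); // the sentinel
    let id = next_request_id.fetch_add(1, Ordering::Relaxed);
    assert_eq!(id, usize::MAX);
    // The counter has silently wrapped back to 0.
    assert_eq!(next_request_id.load(Ordering::Relaxed), 0);
    println!("wrapped to {}", next_request_id.load(Ordering::Relaxed));
}
```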

Member


Oh sorry, I misunderstood; I thought this was counting up past the limit to overflow, and forgot it was writing MAX elsewhere.

Comment thread bitreq/src/connection.rs Outdated
request_id = conn.next_request_id.fetch_add(1, Ordering::Relaxed);
request_id = conn
.next_request_id
.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |current| {
Member


Meh, wouldn't this be simpler by just changing the sentinel to usize::MAX / 2 and accepting the race isn't going to manage to run through usize::MAX / 2 elements before the first thread can get into retry_new_connection and re-write the sentinel value? Avoids the read+CAS in the optimistic case.
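The suggestion could look roughly like this (`SENTINEL` and `acquire_id` are illustrative names, not bitreq's): keep the single `fetch_add` on the fast path and treat any id at or above the sentinel as "connection retired", accepting that racing threads won't burn through `usize::MAX / 2` ids before the sentinel is reset.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Half the counter's range acts as the "connection dead" marker.
const SENTINEL: usize = usize::MAX / 2;

fn acquire_id(next: &AtomicUsize) -> Option<usize> {
    // One fetch_add in the optimistic case; no read+CAS loop.
    let id = next.fetch_add(1, Ordering::Relaxed);
    if id >= SENTINEL { None } else { Some(id) }
}

fn main() {
    let next = AtomicUsize::new(0);
    assert_eq!(acquire_id(&next), Some(0));
    next.store(SENTINEL, Ordering::Relaxed); // connection marked dead
    assert_eq!(acquire_id(&next), None);
    println!("ok");
}
```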

@ErikDeSmedt ErikDeSmedt force-pushed the bitreq-pipelining-fixes branch from 2f25cfc to 729cbd6 Compare May 7, 2026 14:20
Adds an integration test that spins up a raw-TCP server returning a
malformed HTTP response with a `Keep-Alive: max=1` header.

Marked `#[ignore]` for now: it hangs on the current code.

Subsequent commits in this series fix the underlying bugs and the final
commit removes the `#[ignore]`.

HINT: You can run

cargo test -p bitreq --features async --test pipelining -- --ignored

to run this test and verify that it fails on master.
Pipelined `send_async` calls on a single `Client` could hang
indefinitely on responses the server had already sent.

`Response::create_async` wrapped its input in a fresh `BufReader`. On a
busy socket, that buffer could prefetch bytes for the next pipelined
response; when the function returned, those bytes were dropped with it,
stranding the next reader.

Move the `BufReader` into `AsyncConnectionState::read` so it lives as
long as the connection.
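The failure mode can be reproduced with an in-memory stream (a standalone sketch using `Cursor` in place of the socket; function names are illustrative, not bitreq's):

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// The buggy pattern: wrapping the stream in a fresh BufReader per
// response lets the buffer prefetch bytes belonging to the next
// pipelined response, which are then dropped with the reader.
fn read_line_fresh_bufreader<R: Read>(stream: &mut R) -> String {
    let mut reader = BufReader::new(stream); // buffer dies at end of fn
    let mut line = String::new();
    reader.read_line(&mut line).unwrap();
    line
}

fn main() {
    // Two pipelined "responses" already sitting in the socket buffer.
    let mut stream = Cursor::new(b"response-1\nresponse-2\n".to_vec());

    let first = read_line_fresh_bufreader(&mut stream);
    assert_eq!(first, "response-1\n");

    // The fresh BufReader prefetched "response-2\n" and dropped it, so
    // the underlying stream is already exhausted: the next reader hangs
    // (here: reads nothing) instead of seeing its response.
    let second = read_line_fresh_bufreader(&mut stream);
    assert_eq!(second, "");

    // Fix: one BufReader that lives as long as the connection.
    let mut reader = BufReader::new(Cursor::new(b"response-1\nresponse-2\n".to_vec()));
    let (mut a, mut b) = (String::new(), String::new());
    reader.read_line(&mut a).unwrap();
    reader.read_line(&mut b).unwrap();
    assert_eq!(b, "response-2\n");
    println!("ok");
}
```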
@ErikDeSmedt ErikDeSmedt force-pushed the bitreq-pipelining-fixes branch from 729cbd6 to ee002eb Compare May 7, 2026 14:30
@ErikDeSmedt
Author

I added a test to reproduce some of the issues I observed.
I created a server that only allows one request per connection using a `Keep-Alive: max=1` header.

The old code set next_request_id to `usize::MAX - 1` to indicate that we can only make one request on that connection.
I often ended up in an infinite loop where we kept waiting for readable_request_id to catch up to this very high value.

I switched the approach a bit.

The next_request_id is now just a counter that increments on every request.
I introduced `permits`, which is the number of requests that we are still allowed to make.
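A rough sketch of this scheme (type and constant names are illustrative, not bitreq's actual code): the id counter is purely monotonic, and admission is a separate `permits` counter that can be capped or poisoned.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Fresh connections start with effectively unlimited permits; a value
// above this bound can only be the result of an underflow wrap.
const MAX_SEND_PERMITS: usize = usize::MAX / 2;

struct ConnState {
    next_request_id: AtomicUsize, // pure monotonic counter
    permits: AtomicUsize,         // remaining send admissions
}

impl ConnState {
    fn new() -> Self {
        ConnState {
            next_request_id: AtomicUsize::new(0),
            permits: AtomicUsize::new(MAX_SEND_PERMITS),
        }
    }

    // Poison the connection: no further sends are admitted.
    fn poison(&self) {
        self.permits.store(0, Ordering::Relaxed);
    }

    // A Keep-Alive: max=N directive caps the remaining permits.
    fn cap(&self, n: usize) {
        self.permits.fetch_min(n, Ordering::Relaxed);
    }

    // Try to admit one request; None once permits are exhausted.
    fn try_acquire(&self) -> Option<usize> {
        let prev = self.permits.fetch_sub(1, Ordering::Relaxed);
        // prev == 0 means this call wrapped the counter; a value above
        // MAX_SEND_PERMITS means another racing call already wrapped it.
        if prev == 0 || prev > MAX_SEND_PERMITS {
            return None;
        }
        Some(self.next_request_id.fetch_add(1, Ordering::Relaxed))
    }
}

fn main() {
    let conn = ConnState::new();
    conn.cap(1); // server sent Keep-Alive: max=1
    assert_eq!(conn.try_acquire(), Some(0));
    assert_eq!(conn.try_acquire(), None);
    conn.poison();
    assert_eq!(conn.try_acquire(), None);
    println!("ok");
}
```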

Comment thread bitreq/src/connection.rs Outdated
readable_request_id: AtomicUsize,
/// Set when the read half is in an indeterminate state and pending pipelined readers must
/// abort and retry on a fresh connection.
reads_dead: AtomicBool,
Member


Hmm, why should we not just keep using readable_request_id? I'm not sure "more atomics" is really more readable since it now means we have to carefully review for cases where the atomic reads splitbrain.

Author


You are right. It doesn't help readability.

Comment thread bitreq/src/connection.rs Outdated
/// Remaining permits for sending new requests. Acquired via `fetch_sub(1)`; an underflow is
/// detected as `prev == 0 || prev > MAX_SEND_PERMITS`. Poisoned by storing `0`; capped by
/// keep-alive `max=N` via `fetch_min(N)`.
permits: AtomicUsize,
Member


This would probably be more readable as an AtomicIsize where we can just check against 0.
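With a signed counter the check collapses to a comparison against zero (a minimal sketch of the suggestion; names are illustrative):

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// "Exhausted or poisoned" is simply prev <= 0: the counter just goes
// negative instead of wrapping, so no wrap-detection heuristic needed.
fn try_acquire(permits: &AtomicIsize) -> bool {
    permits.fetch_sub(1, Ordering::Relaxed) > 0
}

fn main() {
    let permits = AtomicIsize::new(1);
    assert!(try_acquire(&permits));  // one request allowed
    assert!(!try_acquire(&permits)); // exhausted
    assert!(!try_acquire(&permits)); // stays refused as it goes negative
    println!("ok");
}
```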

Author


Thanks. That is a very good suggestion.

ErikDeSmedt and others added 2 commits May 12, 2026 15:30
Track per-connection send admission in a dedicated permits counter on
AsyncConnectionState. permits starts at usize::MAX/2 and is consumed
via fetch_sub(1); a wrap (returned previous value 0, or already
> MAX_PERMITS) signals the connection is exhausted or poisoned.
Poisoning becomes permits.store(0) and the keep-alive max=N directive
maps to permits.fetch_min(N).

next_request_id is now a pure monotonic counter used only for
pipelining order. Indeterminate read state moves to a sibling
reads_dead AtomicBool so readable_request_id can also stay a pure
counter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ErikDeSmedt ErikDeSmedt force-pushed the bitreq-pipelining-fixes branch from ee002eb to 11098da Compare May 12, 2026 13:30