fix(gl): paginate gl-clone Arweave recovery and make /encrypted-blobs parsing schema-strict (#49)#70
Conversation
Deserialize the /encrypted-blobs 200 body into a typed EncryptedBlobsResponse instead of poking at a serde_json::Value and defaulting blobs to [] via unwrap_or_default(). A server schema regression now fails closed (a hard parse error) rather than silently yielding nothing to recover, mirroring the existing WithheldPathsResponse convention. The sole caller still does not abort the clone but now surfaces the error instead of swallowing it into an empty result. Part of #49.
…eight Discovery hard-coded a single first:100 GraphQL query with no pagination, so a repo with more than 100 encrypted-manifest anchors (one per push) silently lost recovery coverage for any blob whose most recent seal fell outside that window. Follow cursor pagination (after + pageInfo) until coverage is complete, deduping transaction ids and bounding the loop (1000 pages / 10000 ids, degenerate-cursor guard) so a hostile or buggy gateway cannot grow the downstream per-id fetch loop without limit. A full page returned without pageInfo, a cap hit, or a failure after discovery began now warns; a page-1 failure (unconfigured/unreachable gateway) stays silent as before. Pagination concatenates pages in gateway order, which made merge_manifests' equal-timestamp tie-break (previously insertion-order, i.e. accidentally newest on single-page HEIGHT_DESC) gateway-dependent. Rank candidates by the composite key (timestamp, Arweave block height, cid): timestamp primary keeps the existing distinct-timestamp latest-wins; block height breaks equal-timestamp ties by anchor recency (a not-yet-mined anchor counts as newest), preserving the pre-pagination behavior; cid is a final deterministic tiebreak for same-block anchors. Height is parsed defensively so a present-but-unreadable height (e.g. a gateway stringifying large ints) cannot masquerade as a pending/newest anchor. Part of #49.
…/dedup Code-review follow-ups on the #49 recovery change: - Cap edges parsed per GraphQL page at ARWEAVE_PAGE_SIZE so a hostile gateway returning one oversized page cannot defeat the MAX_TX_IDS budget or flood the downstream per-id manifest fetch loop (adversarial review). - Collapse the two identical gateway-failure arms in the discovery loop into a single warn site, and hoist PAGE_SIZE to a named const tied to the query's first:100 (maintainability). - Add tests: per-page edge bound, non-advancing-cursor termination (infinite-loop guard), and cross-page tx-id dedup fetched exactly once (testing). Part of #49.
…-size invariant Round-2 review follow-ups: - The non-advancing-cursor test bound its mock to a _-prefixed handle with expect_at_most(3) and never asserted it; mockito defaults assert_on_drop=false, so the test passed even if the guard were removed. Bind it, tighten to expect(2) (the guard breaks on the second page, so discovery makes exactly two POSTs), and assert_async. - Correct the parse_tx_page docstring: the per-page take() bounds the TxRef vec and fetch loop, not the response-body deserialization (an unbounded-body concern shared by every gateway read here). - Hoist the discovery query to a const and add query_first_matches_page_size so the query first:N literal and ARWEAVE_PAGE_SIZE are kept equal by a test rather than a comment. Part of #49.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthrough
Changesgl clone recovery hardening
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
gl clone's Arweave fallback recovery dropped private blobs whose most recent seal fell outside the first 100 manifest anchors, and treated an/encrypted-blobsschema regression as "nothing to recover". This paginates discovery and makes the parse fail closed.Motivation & context
Closes #49
Surfaced by CodeRabbit on #46; this is the B3 Arweave-recovery code from #36. Two defects in
crates/gl/src/clone.rs:first:100GraphQL query with no pagination. One manifest is anchored per push, so a repo with more than 100 pushes silently lost recovery coverage for any blob whose latest seal fell outside that window, even though the blob is live at HEAD./encrypted-blobsparsed the 200 body into aserde_json::Valueand defaultedblobsto[]viaunwrap_or_default(), so a missing or renamed field yielded a silent empty recovery instead of an error. The sibling/withheld-pathspath already deserializes a typed struct and errors on drift.Kind of change
What changed
Crate touched:
gl./encrypted-blobs200 body (EncryptedBlobsResponse) so schema drift is a hard error instead of a silent empty result, mirroring the existingWithheldPathsResponseconvention. The sole caller surfaces the error rather than swallowing it intounwrap_or_default().pageInfo+after) until coverage is complete, replacing the singlefirst:100query.How a reviewer can verify
cargo test -p gl --bin gl clone:: cargo clippy --workspace --all-targets -- -D warnings30 clone tests cover the strict-parse matrix, multi-page discovery, cross-page dedup (fetched once), non-advancing-cursor termination, the per-page bound, and the block-height tie-break cases (distinct height, pending anchor, same-block, empty timestamp).
Before you request review
cargo test --workspacepasses locallycargo fmt --allandcargo clippy --workspace --all-targets -- -D warningsare cleanfeat(...),fix(...),docs(...)).env.exampleupdated if behavior or config changed (or N/A)Notes for reviewers
/encrypted-blobsendpoint are unchanged; this is read-path only.clone.rswas already over 1000 lines before this change; extracting the discovery loop into a helper to bring it back down was raised in review and deferred to keep this PR focused.Summary by CodeRabbit
Release Notes
Bug Fixes
Tests