
Copycat L3 indexer and GraphQL #903

Draft
speeddragon wants to merge 61 commits into edge from feat/new-copycat

speeddragon (Collaborator) commented May 11, 2026

This PR continues #837. It extends the Arweave copycat L3 indexer with pending TX indexing, parent containment lookups, and Arweave-compatible GraphQL queries backed by locally indexed TX headers.

Summary

Copycat L3 indexer

  • Adds ~copycat@1.0/arweave block-range indexing for Arweave L1 TXs, L2 bundle items, and nested bundle items.
  • Records per-block index depth and per-depth item IDs so indexed blocks can be inspected without re-fetching block contents.
  • Adds mode=list and mode=inventory for local index inspection.
  • Adds explicit L1 TX indexing with id=..., optional query-l1-offset=true, and L1-level owner/tag filtering on that explicit id=... path.
  • Adds parallel block processing and per-block TX processing with configurable worker counts.
  • Adds a shared copycat memory budget for full L1 bundle reads.
  • Adds a parent containment index and ~arweave@2.9/parent=<id> for looking up whether an item is contained by a block or bundle.

Pending TX / mempool indexing

  • Adds mode=mempool to scan /tx/pending and index reachable unconfirmed TXs.
  • Adds optional mempool sender=<address> filtering.
  • Writes pending TX offsets as pending roots and pending bundle children as parent-relative offsets.
  • Discovers bundle children from raw pending bytes.
  • Normalizes chunk reads across confirmed offsets and pending relative offsets.
  • Adds retry/progress hooks for transient pending chunk failures.

Offset index format

  • Updates the Arweave offset index encoding to support:
    • confirmed entries with a global offset
    • pending TX roots
    • relative entries that point at a parent ID plus an offset inside that parent
  • Keeps the compact binary encoding for confirmed entries and reduces storage for child items that can be represented relative to a parent.
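
The three entry kinds can be pictured as a small tagged union. The following is only an illustrative sketch, not the binary encoding used by the PR: the class names, the `resolve` helper, and the assumption that a stored offset marks the start of an item's bytes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Confirmed:
    offset: int  # assumed: global start offset of the item's bytes
    length: int


@dataclass(frozen=True)
class PendingRoot:
    length: int  # unconfirmed TX, so no global offset exists yet


@dataclass(frozen=True)
class Relative:
    parent_id: str  # ID of another indexed item
    offset: int     # offset inside that parent's bytes
    length: int


Entry = Union[Confirmed, PendingRoot, Relative]


def resolve(index: Dict[str, Entry], item_id: str) -> Tuple[str, object, int]:
    """Follow relative entries up to a confirmed entry or a pending root.

    Returns ("confirmed", absolute_offset, length) or
    ("pending", (root_id, offset_in_root), length).
    """
    entry = index[item_id]
    length = entry.length
    accumulated = 0
    current = item_id
    seen = set()
    while isinstance(entry, Relative):
        if current in seen:
            raise ValueError("cycle in relative offset entries")
        seen.add(current)
        accumulated += entry.offset
        current = entry.parent_id
        entry = index[current]
    if isinstance(entry, Confirmed):
        return ("confirmed", entry.offset + accumulated, length)
    return ("pending", (current, accumulated), length)
```

Under these assumptions a nested item stores only its parent ID and a local offset, and its absolute position is recovered by walking the parent chain at read time.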

GraphQL

  • Extends ~query@1.0/graphql Arweave transaction support.
  • Adds fee { winston ar }.
  • Supports owner-based transaction filtering from local commitments.
  • Makes GraphQL tag-name matching case-insensitive after key normalization.
  • Isolates filter failures so one unsupported or failing filter does not abort the rest of the match pipeline.
  • Writes TX/data-item headers into the local store during copycat indexing so GraphQL filters can be served locally without a separate header-indexing pass.

Storage / infrastructure

  • Moves shared Arweave-index helpers into hb_store_arweave.
  • Adds parent-index read/write helpers and compact parent entry decoding.
  • Adds block marker, block item, and marker cutover helpers to the Arweave store layer.
  • Improves malformed item/tag handling so bad data is logged and skipped instead of crashing the whole scan path.
  • Stops latest_height from silently returning 0 on network errors.
  • Adds LMDB monitor shutdown handling and an overlay_count Prometheus gauge for pending LMDB overlay writes.
  • Fails index metadata writes fast instead of continuing after a failed block item or marker write.

Operator Notes

  • The offset index now writes relative and pending-root entries. Current code can read confirmed entries in the old compact shape, but rollback to older code after writing new relative/pending entries is unsafe.

How To Use

Index blocks

Index a range of blocks at depth 3:

curl "http://localhost:8005/~copycat@1.0/arweave?from=1890000&to=1889000&depth=3"

Depth 2 indexes L1 TXs plus direct L2 bundle items through the lightweight block path. Depth 3 adds nested bundle recursion. If depth is omitted, copycat uses depth=full, capped by copycat-depth-recursion-cap.
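
As a rough illustration of the depth rule described above, the sketch below computes the depth that would actually be used. The function name is hypothetical, the default cap of 6 is taken from the Configuration table, and rejecting depths below 1 is an assumption rather than documented behavior.

```python
from typing import Optional, Union


def effective_depth(requested: Optional[Union[int, str]], cap: int = 6) -> int:
    """Return the depth to index at, given the request and the recursion cap."""
    # An omitted depth behaves like depth=full, which is bounded by the cap.
    if requested is None or requested == "full":
        return cap
    depth = int(requested)
    if depth < 1:
        # Assumption: depths below 1 are treated as invalid input.
        raise ValueError("depth must be at least 1")
    # Oversized depths are clamped to the cap.
    return min(depth, cap)
```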

For long-running indexing, run it through cron so the HTTP request does not own the indexing job lifetime:

curl "http://localhost:8005/~cron@1.0/once?cron-path=~copycat@1.0/arweave&from=-1&to=1862995&depth=3"

Index one L1 TX

curl "http://localhost:8005/~copycat@1.0/arweave?id=<txid>&depth=3&query-l1-offset=true"

Optional owner filter:

curl "http://localhost:8005/~copycat@1.0/arweave?id=<txid>&depth=3&include-owner=<address>"

Optional tag filters:

curl "http://localhost:8005/~copycat@1.0/arweave?id=<txid>&depth=3&include-tag=Content-Type:text/html"
curl "http://localhost:8005/~copycat@1.0/arweave?id=<txid>&depth=3&exclude-tag=Bundle-Format:binary"
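
For scripted use, the explicit-TX URLs above can be assembled programmatically. This is a minimal sketch: the helper name and its parameters are hypothetical, and the PR does not state whether several filters can be combined in one request, so the helper only sets the parameters it is given.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8005/~copycat@1.0/arweave"


def index_tx_url(txid, depth=3, query_l1_offset=False,
                 include_owner=None, include_tag=None, exclude_tag=None):
    """Build a copycat URL for indexing one L1 TX by ID.

    include_tag and exclude_tag are optional (name, value) pairs.
    """
    params = [("id", txid), ("depth", str(depth))]
    if query_l1_offset:
        params.append(("query-l1-offset", "true"))
    if include_owner is not None:
        params.append(("include-owner", include_owner))
    if include_tag is not None:
        params.append(("include-tag", "%s:%s" % include_tag))
    if exclude_tag is not None:
        params.append(("exclude-tag", "%s:%s" % exclude_tag))
    # Keep ":" and "/" literal so tag filters read like the curl examples.
    return BASE + "?" + urlencode(params, quote_via=quote, safe=":/")
```

For example, `index_tx_url("<txid>", include_tag=("Content-Type", "text/html"))` yields the same query parameters as the first tag-filter command, with the angle brackets of the placeholder percent-encoded.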

Index the mempool

curl "http://localhost:8005/~copycat@1.0/arweave?mode=mempool"

Optional sender filter:

curl "http://localhost:8005/~copycat@1.0/arweave?mode=mempool&sender=<address>"

Inspect indexed blocks

Counts per depth:

curl "http://localhost:8005/~copycat@1.0/arweave?from=1890000&to=1889990&mode=list"

Full item IDs per depth:

curl "http://localhost:8005/~copycat@1.0/arweave?from=1890000&to=1889990&mode=inventory"

Example inventory response:

{
  "1890000": {
    "depth": 3,
    "items": {
      "1": ["txid1", "txid2"],
      "2": ["bundleitem1", "bundleitem2"],
      "3": ["nesteditem1"]
    }
  }
}
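
A client can reduce an inventory response of this shape to per-depth counts. The sketch below assumes the response has exactly the structure of the example above; the function name is hypothetical, and the exact shape of the mode=list response is not shown in this PR.

```python
import json


def counts_per_depth(inventory_json: str) -> dict:
    """Map each block height to {depth: number of item IDs at that depth}."""
    inventory = json.loads(inventory_json)
    return {
        int(height): {
            int(depth): len(ids) for depth, ids in block["items"].items()
        }
        for height, block in inventory.items()
    }
```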

Look up an item's parent

curl "http://localhost:8005/~arweave@2.9/parent=<item-id>"

Bundle-contained item:

{"parents":[{"type":"bundle","id":"<parent-txid>"}]}

L1 transaction contained by a block:

{"parents":[{"type":"block","height":1890000}]}
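
The two response shapes above can be handled with a small dispatch on the `type` field. This is a sketch based only on the two examples shown; the function name is hypothetical, and any other parent type is reported as unknown rather than guessed at.

```python
import json


def describe_parents(response_json: str) -> list:
    """Turn a parent lookup response into human-readable descriptions."""
    descriptions = []
    for parent in json.loads(response_json).get("parents", []):
        if parent["type"] == "bundle":
            descriptions.append("bundle %s" % parent["id"])
        elif parent["type"] == "block":
            descriptions.append("block at height %d" % parent["height"])
        else:
            # Types other than the two documented ones are passed through.
            descriptions.append("unknown parent type %r" % parent["type"])
    return descriptions
```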

Query via GraphQL

curl -X POST "http://localhost:8005/~query@1.0/graphql" \
  -H 'content-type: application/json' \
  -d '{"query":"{ transactions(owners:[\"<address>\"]) { edges { node { id fee { winston ar } tags { name value } } } } }"}'
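
The nested quoting in the curl command is easy to get wrong, so the same request can be built with a JSON encoder instead. The sketch below mirrors the query above and only constructs the request object; the helper name is hypothetical, and it assumes owner addresses are plain strings that need no GraphQL-specific escaping beyond JSON string rules.

```python
import json
from urllib.request import Request

ENDPOINT = "http://localhost:8005/~query@1.0/graphql"


def build_owner_query(owners) -> Request:
    """Build a POST request that queries transactions by owner address."""
    # A JSON array of strings is also a valid GraphQL list literal here.
    owners_literal = json.dumps(list(owners))
    query = (
        "{ transactions(owners:%s) { edges { node { id fee { winston ar } "
        "tags { name value } } } } }" % owners_literal
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a locally running node.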

What Gets Indexed

  • Offset entries: map <item-id> to a codec, offset, and length. Offsets may be global, pending-root, or relative to another indexed ID.
  • Block marker: block/<height>/depth records the achieved indexed depth.
  • Block item index: block/<height>/items/<depth> stores concatenated raw item IDs for each depth.
  • Parent index: parent/<item-id> stores compact entries decoded as either {block, height} or {bundle, parent-id}.
  • Header cache entries: indexed TX/data-item headers are written to the local store when index-headers is enabled.
  • Marker cutover: block/marker-cutover-height marks when block depth markers become authoritative over legacy per-TX fallback checks.
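
To illustrate how a block item index value made of concatenated raw IDs could be read back, the sketch below splits a byte string into fixed-width IDs and renders each as unpadded base64url. Arweave TX and data-item IDs are 32 bytes; treating the stored value as a plain concatenation of 32-byte IDs with no separators is an assumption, and the function name is hypothetical.

```python
import base64

ID_SIZE = 32  # Arweave IDs are 32 raw bytes (43 base64url characters)


def split_item_ids(blob: bytes) -> list:
    """Split concatenated raw IDs into unpadded base64url strings."""
    if len(blob) % ID_SIZE != 0:
        raise ValueError("value length is not a multiple of %d" % ID_SIZE)
    return [
        base64.urlsafe_b64encode(blob[i:i + ID_SIZE]).rstrip(b"=").decode("ascii")
        for i in range(0, len(blob), ID_SIZE)
    ]
```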

Configuration

| Key | Default | Description |
| --- | --- | --- |
| arweave-block-workers | 3 | Max concurrent blocks processed in a batch |
| arweave-index-workers | 1 | Max concurrent TXs processed within a block or mempool scan |
| copycat-memory-budget | 6 GiB | Shared byte budget for full L1 bundle reads |
| copycat-depth-recursion-cap | 6 | Max depth used when depth=full or an oversized depth is requested |
| copycat-scope | [offset,parent] | Which copycat indexes to write |
| index-headers | true | Write indexed TX/data-item headers to local store for GraphQL |
| arweave-mempool-progress | false | Emit verbose mempool/pending-chunk progress events |
| arweave-pending-chunk-poll-attempts | 0 | Retry count for pending chunk reads |
| arweave-pending-chunk-poll-ms | 500 | Delay between pending chunk retries |
| arweave-pending-chunk-poll-min-ms | 20000 | Minimum retry window when pending chunk polling is enabled |

Use the hyphenated config keys shown above in node config.

Pending chunk polling is disabled unless arweave-pending-chunk-poll-attempts is greater than 0; the delay and minimum-window keys only matter once polling is enabled.
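
The gating rule above can be expressed as a small helper that returns no polling settings at all unless the attempt count is positive. This sketch only models which keys take effect; the function name and the returned field names are hypothetical, and how the node combines the delay with the minimum window is not described here, so it is not modeled.

```python
DEFAULTS = {
    "arweave-pending-chunk-poll-attempts": 0,
    "arweave-pending-chunk-poll-ms": 500,
    "arweave-pending-chunk-poll-min-ms": 20000,
}


def pending_poll_settings(config: dict):
    """Return the effective polling settings, or None when polling is off."""
    merged = {**DEFAULTS, **config}
    attempts = int(merged["arweave-pending-chunk-poll-attempts"])
    if attempts <= 0:
        # Polling disabled: the delay and minimum-window keys are ignored.
        return None
    return {
        "attempts": attempts,
        "delay_ms": int(merged["arweave-pending-chunk-poll-ms"]),
        "min_window_ms": int(merged["arweave-pending-chunk-poll-min-ms"]),
    }
```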

Validation

  • rebar3 device test --device-roots dev_copycat passes all 65 tests.
  • Range/list/auto-stop tests pin setup writes to depth=2 because they validate block markers, listing, and stop behavior, not full nested recursion.

speeddragon and others added 28 commits May 20, 2026 01:11
- Return tagged tuples from latest_height and normalize_height
- Propagate errors through parse_range using maybe block
- Return {error, unavailable} (HTTP 503) on upstream failures
- Validate resolved heights are non-negative in parse_range
- Log original upstream error reason before collapsing to unavailable
- Add regression tests with mock server for both failure paths
charmful0x and others added 29 commits May 20, 2026 01:11
- dev_copycat_arweave: cap bundle_header reads at Size
- dev_copycat_arweave: route through lib_arweave_common
- hb_store: dedupe start/stop/scope via COMMON_POLICIES
- the previous code called into a module that no longer exists
- routes the tag lookup to the existing helper instead
- tag include and exclude filtering works again
- keep range/list/auto-stop setup writes at direct bundle-item depth
- avoid exercising full recursive chunk indexing in marker-focused tests