
Copycat L3 indexer with parallel processing and parent lookups #837

Closed
nikooo777 wants to merge 28 commits into edge from feat/ln-copycat

@nikooo777
Collaborator

Summary

  • Add L1 TX filtering with owner/tag support, offset loading, and block depth indexing (Rani)
  • Add tests, refactor internals, improve logging, and fix operational issues (James)
  • Add per-block item index with depth tracking and inventory mode
  • Add parallel block processing with shared memory budget (configurable workers + byte-level throttling)
  • Add parent containment index: track which block or bundle contains each item
  • Add ~arweave@2.9/parent=<id> endpoint for parent lookups

How to use

Index blocks

Index a range of blocks at depth 3 (L1 TXs → L2 bundle items → L3 nested items):

curl "http://localhost:8005/~copycat@1.0/arweave?from=1890000&to=1889000&depth=3"

For long-running indexing, use the cron wrapper so that an HTTP timeout does not kill the job:

curl "http://localhost:8005/~cron@1.0/once?cron-path=~copycat@1.0/arweave&from=-1&to=1862995&depth=3"
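The two request shapes above can also be built programmatically. The following is a minimal sketch that only reproduces the URLs shown in this description; the helper names `index_url` and `cron_index_url` are hypothetical and not part of the PR.

```python
from urllib.parse import urlencode

# Base address taken from the curl examples in this description.
BASE = "http://localhost:8005"


def index_url(start, end, depth, base=BASE):
    """Direct indexing request: from/to are block heights, depth is 1..3."""
    query = urlencode({"from": start, "to": end, "depth": depth})
    return f"{base}/~copycat@1.0/arweave?{query}"


def cron_index_url(start, end, depth, base=BASE):
    """Same request routed through the cron wrapper for long-running jobs."""
    query = urlencode(
        {
            "cron-path": "~copycat@1.0/arweave",
            "from": start,
            "to": end,
            "depth": depth,
        },
        # Keep '~', '@' and '/' unescaped so the URL matches the curl example.
        safe="~@/",
    )
    return f"{base}/~cron@1.0/once?{query}"


if __name__ == "__main__":
    print(index_url(1890000, 1889000, 3))
    print(cron_index_url(-1, 1862995, 3))
```

Whether the server also accepts a percent-encoded `cron-path` value is not stated here, so the sketch keeps the literal form used in the example.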

Query the inventory

See what was indexed per block, grouped by depth level:

curl "http://localhost:8005/~copycat@1.0/arweave?from=1890000&to=1889990&mode=inventory"

Example response:

{
  "1890000": {
    "depth": 3,
    "items": {
      "1": ["txid1", "txid2"],
      "2": ["bundleitem1", "bundleitem2"],
      "3": ["nesteditem1"]
    }
  }
}
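A client might condense an inventory response into per-depth counts. This is a rough sketch that assumes the response always has the shape of the example above; `summarize_inventory` is a hypothetical helper name.

```python
import json


def summarize_inventory(payload):
    """Map each block height to its indexed depth and item counts per level."""
    inventory = json.loads(payload)
    summary = {}
    for height, entry in inventory.items():
        # Depth levels arrive as JSON object keys, i.e. strings.
        counts = {int(level): len(ids) for level, ids in entry["items"].items()}
        summary[int(height)] = {"depth": entry["depth"], "counts": counts}
    return summary


EXAMPLE = """
{
  "1890000": {
    "depth": 3,
    "items": {
      "1": ["txid1", "txid2"],
      "2": ["bundleitem1", "bundleitem2"],
      "3": ["nesteditem1"]
    }
  }
}
"""

if __name__ == "__main__":
    print(summarize_inventory(EXAMPLE))
```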

Look up an item's parent

Find which block or bundle contains a given item:

curl "http://localhost:8005/~arweave@2.9/parent=CwxY--7bsqjtw2lneMUjkmYT9CWAYZvsmsO06dY232g"

Response:

{"parents": [{"type": "bundle", "id": "Rve4-grgOw8jXLVw3f6nUhQvlVn6AYm7wA4I5HeKW74"}]}
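Repeating the lookup on each returned bundle ID would walk a nested item up towards its containing block. The sketch below assumes that every response has a `parents` list as in the example, that only `bundle` parents carry an `id` worth following, and that any other type ends the walk. The `lookup` callable is a hypothetical stand-in for the HTTP request, so the example runs without a node.

```python
def ancestry(item_id, lookup, max_hops=16):
    """Follow parent references upward, returning the chain of parents found.

    `lookup(item_id)` must return a dict shaped like the parent endpoint's
    response. Only the first listed parent is followed at each step.
    """
    chain = []
    current = item_id
    for _ in range(max_hops):
        parents = lookup(current).get("parents", [])
        if not parents:
            break
        parent = parents[0]
        chain.append(parent)
        if parent.get("type") != "bundle":
            # Assumed: a non-bundle parent (such as a block) ends the walk.
            break
        current = parent["id"]
    return chain


if __name__ == "__main__":
    # Made-up IDs standing in for real responses.
    fake_index = {
        "nested": {"parents": [{"type": "bundle", "id": "inner"}]},
        "inner": {"parents": [{"type": "bundle", "id": "outer"}]},
        "outer": {"parents": [{"type": "block"}]},
    }
    print(ancestry("nested", lambda item: fake_index.get(item, {})))
```

The `max_hops` bound is a defensive choice for the sketch, so that a malformed index cannot cause an endless loop.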

What gets indexed

The copycat indexer writes these entries to the index store:

  • Offset index (<item-id> → codec + offset + length): maps each item to its location in the Arweave weave
  • Block marker (block/<height>/depth → integer): records that a block was indexed and to what depth
  • Block item index (block/<height>/items/<depth> → item IDs): lists items found at each depth level
  • Parent index (parent/<item-id> → type + parent ref): maps each item to its containing block (height) or bundle (ID)
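The key layout listed above can be illustrated with a few string helpers and a dict standing in for the index store. This is only a sketch: the helper names are hypothetical, and the "indexed deeply enough" comparison in `is_block_indexed` is an assumption about how the block marker might be used, not the PR's Erlang logic.

```python
def block_marker_key(height):
    return f"block/{height}/depth"


def block_items_key(height, depth):
    return f"block/{height}/items/{depth}"


def parent_key(item_id):
    return f"parent/{item_id}"


def is_block_indexed(store, height, target_depth):
    """Assumed check: a block counts as indexed if its marker >= target depth."""
    return store.get(block_marker_key(height), 0) >= target_depth


if __name__ == "__main__":
    store = {
        block_marker_key(1890000): 3,
        block_items_key(1890000, 1): ["txid1", "txid2"],
        parent_key("txid1"): {"type": "block", "height": 1890000},
    }
    print(is_block_indexed(store, 1890000, 3))
    print(is_block_indexed(store, 1889999, 1))
```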

Configuration

| Key | Default | Description |
| --- | --- | --- |
| arweave_block_workers | 3 | Max concurrent blocks being processed |
| arweave_index_workers | 1 | Max concurrent TXs within a block |
| copycat_memory_budget | 6 GB | Global memory pool for concurrent downloads |
| copycat_memory_cap | 6 GB | Per-TX hard ceiling (skip if larger) |
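To show how a global budget and a per-TX cap could interact, here is a small byte-level throttle built on a condition variable. It is a generic sketch of the technique, not the PR's Erlang implementation; the class `ByteBudget` and its methods are hypothetical names.

```python
import threading


class ByteBudget:
    """Shared byte pool: workers reserve bytes before downloading a TX."""

    def __init__(self, budget, cap):
        self.budget = budget  # analogous to copycat_memory_budget
        self.cap = cap        # analogous to copycat_memory_cap
        self.in_use = 0
        self._cond = threading.Condition()

    def acquire(self, size):
        """Reserve `size` bytes; return False if the item must be skipped."""
        # Anything above the cap (or above the whole budget) can never fit.
        if size > min(self.cap, self.budget):
            return False
        with self._cond:
            # Block until enough of the shared pool has been released.
            while self.in_use + size > self.budget:
                self._cond.wait()
            self.in_use += size
        return True

    def release(self, size):
        """Return `size` bytes to the pool and wake any waiting workers."""
        with self._cond:
            self.in_use -= size
            self._cond.notify_all()


if __name__ == "__main__":
    gib = 1024 ** 3
    pool = ByteBudget(budget=6 * gib, cap=6 * gib)
    print(pool.acquire(7 * gib))  # larger than the cap, so skipped
    if pool.acquire(2 * gib):
        try:
            pass  # download and process the TX here
        finally:
            pool.release(2 * gib)
```

In this sketch the worker counts bound how many blocks and TXs are in flight, while the budget bounds how many bytes those workers may hold at once.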

nikooo777 force-pushed the feat/ln-copycat branch 5 times, most recently from 9392f30 to ff6f8d4 (April 20, 2026 17:33)
nikooo777 changed the base branch from neo/edge to edge (April 20, 2026 18:40)
nikooo777 force-pushed the feat/ln-copycat branch 2 times, most recently from dbd6d82 to 6ef9669 (April 22, 2026 14:36)
Collaborator

@speeddragon speeddragon left a comment

Reviewed the PR; I will fix these later.

Comment thread src/hb_store_arweave.erl
Comment thread src/dev_copycat_arweave.erl Outdated
Comment thread src/hb_event.erl
Comment thread src/hb_opts.erl Outdated
Comment thread src/hb_store_lmdb_stress.erl Outdated
Comment thread src/dev_copycat_arweave.erl Outdated
Comment thread src/dev_copycat_arweave.erl Outdated
Comment on lines +776 to +782
case is_block_indexed(H, TargetDepth, Opts) of
    true -> ok;
    false ->
        observe_event(<<"block_indexed">>, fun() ->
            fetch_and_process_block(H, To, TargetDepth, Opts)
        end)
end

If we're indexing the most up-to-date block, we should keep indexing and ignore this is_block_indexed check.
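One possible reading of this suggestion, written as a language-neutral decision function rather than as a patch to the Erlang code: the function name and the `tip_height` parameter are hypothetical, and how the indexer would learn the current tip is not covered here.

```python
def should_process_block(height, tip_height, already_indexed):
    """Decide whether a block should be fetched and processed.

    Blocks at the current tip are always processed, regardless of the
    block marker; older blocks are skipped once they are marked indexed.
    """
    if height >= tip_height:
        return True
    return not already_indexed


if __name__ == "__main__":
    print(should_process_block(1890000, 1890000, already_indexed=True))
    print(should_process_block(1889000, 1890000, already_indexed=True))
    print(should_process_block(1889000, 1890000, already_indexed=False))
```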

Comment thread src/dev_copycat_arweave.erl Outdated
%% requested safe depth (defaults to full recursion till the set
%% copycat_depth_recursion_cap).
process_l1_request(TXID, Request, Opts) ->
    Depth = request_depth(Request, <<"safe_max">>, Opts),

Add a force option so that requests can be re-indexed in some edge cases.

Comment thread src/dev_copycat_arweave.erl Outdated
Comment thread src/dev_copycat_arweave.erl Outdated
speeddragon force-pushed the feat/ln-copycat branch 2 times, most recently from 0f5341d to 40f1de9 (May 4, 2026 19:36)
@speeddragon
Collaborator

speeddragon commented May 11, 2026

Closed in favor of #903, which contains this and other work, merged into a new branch.

4 participants