
feat/pdp shovel indexer #617

Open

parkan wants to merge 12 commits into main from feature/pdp-shovel-indexer

Conversation

parkan (Collaborator) commented Feb 19, 2026

No description provided.

Embed Shovel as a Go library to index PDPVerifier contract events into
Postgres tables, replacing the linear scan of all proof sets that made
4+ RPC calls per proof set per cycle.

The tracker now reads from Shovel's inbox tables (transient event rows),
materializes state into a new pdp_proof_sets table and existing Deal
records, then deletes processed rows. Remaining RPC calls are bounded:
one getDataSetListener per new proof set, one getActivePieces per
PiecesAdded/Removed event.
@parkan parkan force-pushed the feature/pdp-shovel-indexer branch from 0e68f4e to 60def83 on February 19, 2026 16:13
parkan and others added 5 commits February 19, 2026 18:07
also add ./service/pdptracker/... to CI integration test path
Shovel only defaults PollDuration to 1s during JSON unmarshal,
not when the struct is constructed directly. Zero value propagates
to jrpc2.Client and panics in time.NewTicker(0).
parkan (Collaborator, Author) commented Feb 19, 2026

NOT READY TO MERGE: there are P0s in here; review for general architecture only, not for specific landmines

clears shovel cursor and truncates integration tables so the
indexer restarts from the PDPVerifier deployment block. requires
an archival RPC endpoint since public nodes limit lookback depth.

also fixes Start block propagation to shovel tasks: the integration
source refs were only carrying the Name field, so WithRange always
got Start=0 and shovel fell through to "start at latest".

deploy blocks: mainnet 5441432, calibnet 3140755 (from filfox).
@parkan parkan force-pushed the feature/pdp-shovel-indexer branch from eb3a4c8 to c49adb9 on February 20, 2026 07:58
@parkan parkan marked this pull request as ready for review February 20, 2026 08:08
parkan (Collaborator, Author) commented Feb 20, 2026

ok, this is ready to review. tradeoffs:

  • shovel requires postgres or a postgres-compatible API; to be honest you should really be using that in prod anyway. the only feature this blocks is pdp tracking for mysql users, which I think is a fair tradeoff
  • the --full-sync option does require access to an archival node, since glif only allows ~17hrs of lookback; new deals are unaffected by this, and any downtime under ~16:40 is also unaffected (can catch up)

the upside is that this is... scalable and performant: we make only 1 bare RPC call per proofset creation and subsequently rely only on logs

in my mind this is a valid set of tradeoffs

@parkan parkan requested a review from anjor February 20, 2026 08:20
}

// failed rows are retained for retry on next cycle
func processInbox[R inboxRow](db *gorm.DB, query, table string, fn func(R) error) error {
This inbox pattern (SELECT rows, process, then broad DELETE) can race with concurrent inserts from Shovel and delete rows we never processed.
Can we switch to deleting only the exact claimed rows (by deterministic event key) in the same transaction, instead of DELETE FROM table / NOT IN failed?


func (r nextProvingPeriodRow) setID() uint64 { return r.SetID_ }

func processNextProvingPeriod(ctx context.Context, db *gorm.DB) error {

If pdp_proof_sets doesn’t contain set_id yet, this update affects 0 rows but still returns nil, and the inbox row gets deleted.
That can lose ordering-dependent events when DataSetCreated is delayed.
Can we retain this row by checking existence / rows affected and returning an error when proof set is missing?


func (r possessionProvenRow) setID() uint64 { return r.SetID_ }

func processPossessionProven(ctx context.Context, db *gorm.DB) error {

Same concern as NextProvingPeriod: missing proof set currently results in “successful no-op update” and row deletion.
Can we make missing proof set a retry condition (return error), so these events are retained until DataSetCreated materializes the set?


func processDataSetCreated(ctx context.Context, db *gorm.DB, rpcClient *ChainPDPClient) error {
return processInbox(db,
"SELECT set_id, storage_provider, block_num FROM pdp_dataset_created",

Can we add deterministic ordering (ORDER BY block_num, tx_idx, log_idx, abi_idx) to preserve on-chain event order?
Without ordering, state outcomes can depend on DB row return order when multiple events are pending.

(this applies to all the sql queries)

anjor (Collaborator) commented Feb 20, 2026

Strong architectural improvement; not approving yet due to the event-loss edge cases. Please fix the atomic dequeue and missing-proofset retention, and add deterministic ordering. Ready after that, I think.

parkan (Collaborator, Author) commented Feb 20, 2026

yep, all good points. I have been hunting these down (similar issues, but under slightly different scenarios), but there are a lot of them. will update shortly

parkan (Collaborator, Author) commented Feb 20, 2026

ok, I have the cleanup, but I'm working on 3 separate projects at once and my head is hurting, so I'm prone to making mistakes. it being friday afternoon, I am going to take the safe route and defer the fixes until monday

- delete by exact event key (block_num, tx_idx, log_idx) instead of
  broad DELETE to prevent racing with concurrent shovel inserts
- check RowsAffected on proofset updates; retain inbox rows when
  proofset not yet materialized
- add ORDER BY block_num, tx_idx, log_idx to all inbox queries
parkan (Collaborator, Author) commented Feb 21, 2026

@anjor well I lied and apparently I like working on weekends 🫠

feeling ok about current state
