consensus/bor: guard nil sprint header in CommitStates #2181

Open
LarryArnault45 wants to merge 1 commit into 0xPolygon:develop from LarryArnault45:fix/bor-commitstates-nil-sprint-header

@LarryArnault45
Contributor

Description

In the pre-Indore path of CommitStates, the state-sync upper timestamp was derived from the sprint-start header time. The previous code dereferenced GetHeaderByNumber(...).Time directly, without checking whether the header existed.

Under ancient pruning conditions, that sprint-start header may be unavailable. In that case, the old flow could hit a nil dereference and panic on the consensus finalization path.

This update adds only the required safety checks:

  1. compute sprintLength once;
  2. guard number < sprintLength and return a clear error to avoid underflow;
  3. load the sprint-start header, validate it is non-nil, and return an explicit error when missing before reading Time.

Behavior is unchanged when data is present; only crash-prone branches now fail with explicit errors instead of panicking.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

N/A

Nodes audience

No node-segmented rollout is required.

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
  • In case link the PR here:
  • This PR requires changes to matic-cli
  • In case link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

  1. gofmt -w consensus/bor/bor.go
  2. go test ./consensus/bor -run TestCommitStates -count=1
  3. go vet ./consensus/bor
  4. go test ./consensus/bor/... -count=1
  5. make lint

Additional comments

@claude claude bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@sonarqubecloud

sonarqubecloud bot commented Apr 2, 2026

Member

@manav2401 manav2401 left a comment

Have you encountered errors in pre-Indore blocks due to this?

I don't think this case will ever happen, for the following reasons.

  1. The first condition, number being less than sprintLength, will never occur, because the first state-sync occurs at the start of the first sprint. Hence, the difference number - sprintLength is never less than 0.
  2. CommitStates is called while your node is syncing and processing a block that contains a state-sync event. Fetching a slightly older block (16 blocks before the current block) shouldn't fail due to ancient pruning, because ancient pruning only applies to blocks more than 90k blocks behind the latest block. Hence, a block that is just 16 blocks older cannot have been pruned.

@LarryArnault45
Contributor Author

Hi @manav2401, thanks for the review.

I agree this is unlikely in normal pre-Indore sync, and I haven’t reproduced a real mainnet crash with default pruning. My goal here is just defensive hardening: if GetHeaderByNumber ever returns nil, the old code panics on .Time; this change turns that into a normal error path without affecting healthy flow.

If you want, I can remove the number < sprintLength check and keep only the nil-header guard.

@claude

claude bot commented Apr 3, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.
