Skip to content

Deterministic state syncs#572

Open
marcello33 wants to merge 33 commits intodevelopfrom
mardizzone/POS-3441_deterministic-ss
Open

Deterministic state syncs#572
marcello33 wants to merge 33 commits intodevelopfrom
mardizzone/POS-3441_deterministic-ss

Conversation

@marcello33
Copy link
Copy Markdown
Collaborator

@marcello33 marcello33 commented Mar 31, 2026

Description

Implements deterministic, height-pinned state sync querying for the clerk module.
Events get a one-block-delayed visibility_time and visibility_height assigned in PreBlocker, so that during Heimdall halts, pending events stay excluded and all validators derive identical query results. A combined gRPC/REST endpoint allows Bor/Erigon to fetch deterministic state syncs in a single call.

Changes

  • Visibility system: Events stored in block H get visibility_time = blockTime(H+1) via ProcessPendingVisibilityEvents in PreBlocker. During halts, no new blocks arrive, so pending events stay invisible.
  • New endpoints:
    - GetBlockHeightByTime (/clerk/block-height-by-time): resolves cutoff_time → greatest committed Heimdall height via binary search on reverse index.
    - GetRecordListVisibleAtHeight (/clerk/state-syncs-at-height): fetches events visible at a specific height using immutable visibility_height indexes
    - GetStateSyncsByTime (/clerk/state-syncs-by-time): combined endpoint used by Bor/Erigon — resolves height from to_time internally and returns events in a single call. Includes a stability gate that returns empty if latestBlockTime <= cutoff, ensuring the resolved height is frozen before returning results
    • Stability gate: prevents non-deterministic results during rapid Heimdall catch-up (e.g. after a partition heals). Once latestBlockTime > cutoff, the resolved height is frozen — no new blocks can have time <= cutoff (monotonically increasing), and no new events can have visibility_height <= resolved_height
    • Immutable indexes: VisibilityTimeByID, VisibilityHeightByID, BlockTimeReverseIndex
    • Genesis export/import: All new collections included in GenesisState
    • Metrics: All new handlers instrumented with recordClerkQueryMetric
    • Feature gate: visibilityTimeHeight in helper/config.go (0 = disabled for all networks, set per-network when HF is scheduled)

Activation

  • visibilityTimeHeight: 0 (disabled) on all networks
  • Will be set to the agreed-upon block height when the HF is scheduled
  • Bor/Erigon (with EL HF) changes must be deployed after this is live and after heimdall HF triggered.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

HF needed.

Checklist

  • I have added at least two reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments — if any — by pushing each fix in a separate commit and linking the commit hash in the comment reply

Cross-repository changes

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on the local environment
  • I have tested this code manually on a remote devnet using express-cli
  • I have tested this code manually on amoy/mumbai
  • I have created new e2e tests into express-cli

Manual tests

Tested on kurtosis devnets with chaos testing (2-vs-2 network partition) and self-heal recovery. Many rounds of chaos, zero errors across all 7 nodes (4 bor vals + 3 erigon rpc).

Copilot AI review requested due to automatic review settings March 31, 2026 19:00
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 61.45455% with 106 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.9%. Comparing base (2166896) to head (c6a0d5e).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
x/clerk/keeper/genesis.go 48.4% 17 Missing and 17 partials ⚠️
x/clerk/keeper/grpc_query.go 73.1% 17 Missing and 9 partials ⚠️
x/clerk/keeper/keeper.go 66.2% 13 Missing and 13 partials ⚠️
x/clerk/keeper/side_msg_server.go 50.0% 6 Missing and 3 partials ⚠️
app/abci.go 0.0% 6 Missing and 1 partial ⚠️
helper/config.go 60.0% 4 Missing ⚠️

❌ Your patch check has failed because the patch coverage (61.4%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #572     +/-   ##
=========================================
+ Coverage     41.6%   41.9%   +0.2%     
=========================================
  Files          172     172             
  Lines        17499   17760    +261     
=========================================
+ Hits          7296    7458    +162     
- Misses        9124    9181     +57     
- Partials      1079    1121     +42     
Files with missing lines Coverage Δ
helper/config.go 20.2% <60.0%> (+0.6%) ⬆️
app/abci.go 57.4% <0.0%> (-0.8%) ⬇️
x/clerk/keeper/side_msg_server.go 65.3% <50.0%> (-0.1%) ⬇️
x/clerk/keeper/grpc_query.go 51.2% <73.1%> (+14.6%) ⬆️
x/clerk/keeper/keeper.go 64.1% <66.2%> (-0.7%) ⬇️
x/clerk/keeper/genesis.go 52.6% <48.4%> (-29.2%) ⬇️

... and 3 files with indirect coverage changes

Files with missing lines Coverage Δ
helper/config.go 20.2% <60.0%> (+0.6%) ⬆️
app/abci.go 57.4% <0.0%> (-0.8%) ⬇️
x/clerk/keeper/side_msg_server.go 65.3% <50.0%> (-0.1%) ⬇️
x/clerk/keeper/grpc_query.go 51.2% <73.1%> (+14.6%) ⬆️
x/clerk/keeper/keeper.go 64.1% <66.2%> (-0.7%) ⬇️
x/clerk/keeper/genesis.go 52.6% <48.4%> (-29.2%) ⬇️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces deterministic, height-pinned state sync querying for the clerk module by adding visibility-based indexing (time/height) and new query endpoints that Bor can use to derive deterministic results from the latest Heimdall state.

Changes:

  • Add visibility-time/visibility-height tracking with a one-block delay (pending list processed in PreBlocker) to make state-sync queries deterministic across halts.
  • Add new gRPC/REST query methods: GetBlockHeightByTime and GetRecordListVisibleAtHeight, plus protobuf + swagger/gateway wiring.
  • Add extensive keeper/query tests covering legacy/hybrid/post-upgrade behavior, pending-event exclusion, and block-time cutoff lookup.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
x/clerk/types/query.pb.gw.go Adds grpc-gateway HTTP handlers for the two new query endpoints.
x/clerk/types/query.pb.go Adds generated proto types + client/server interfaces for new queries.
x/clerk/types/keys.go Introduces new KV prefixes for visibility and block-time indexes.
x/clerk/keeper/visibility_time_test.go Adds comprehensive tests for visibility-time/height lifecycle and deterministic querying.
x/clerk/keeper/side_msg_server.go Adds post-handler logic to enqueue events for delayed visibility assignment and sets upgrade boundary.
x/clerk/keeper/keeper.go Implements pending processing, visibility indexes, block-time indexing, and cutoff height lookup.
x/clerk/keeper/grpc_query.go Updates GetRecordListWithTime hybrid filtering and adds two new query RPC implementations.
proto/heimdallv2/clerk/query.proto Defines new RPCs and request/response messages.
helper/config.go Adds an activation-height gate for the visibility-time feature.
client/docs/swagger-ui/swagger.yaml Documents the new REST endpoints and response types.
app/abci.go Runs pending visibility processing + block-time indexing in PreBlocker when enabled.
api/heimdallv2/clerk/query_grpc.pb.go Updates generated gRPC bindings with the new methods.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread x/clerk/keeper/keeper.go Outdated
Comment thread x/clerk/keeper/keeper.go
Comment thread x/clerk/keeper/keeper.go Outdated
Comment thread x/clerk/keeper/keeper.go Outdated
Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/grpc_query.go
Comment thread x/clerk/keeper/side_msg_server.go Outdated
Comment thread x/clerk/keeper/grpc_query.go
@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Mar 31, 2026

Claude finished @marcello33's task in 2m 27s —— View job


Code Review: Deterministic State Syncs

  • Gather context and understand PR changes
  • Review core keeper logic (keeper.go)
  • Review gRPC query handlers (grpc_query.go)
  • Review side message server (side_msg_server.go)
  • Review ABCI integration (abci.go)
  • Review config and keys changes
  • Review test coverage
  • Post final review

Summary

This PR introduces deterministic, height-pinned state sync querying for the clerk module. The core idea is sound: events get a one-block-delayed visibility_time and visibility_height assigned in PreBlocker, so that during halts, pending events stay excluded and all validators derive identical query results. Two new gRPC endpoints (GetBlockHeightByTime, GetRecordListVisibleAtHeight) allow Bor to pin queries to a specific Heimdall height.

Overall the architecture is well-designed. Below are the issues I found, categorized by severity.


CRITICAL / HIGH Issues

1. PreBlocker returns error on visibility processing failure — can halt the chain

File: app/abci.go:472-481

if helper.IsVisibilityTimeEnabled(req.Height) {
    if err := app.ClerkKeeper.ProcessPendingVisibilityEvents(ctx); err != nil {
        logger.Error("Error processing pending visibility events", "error", err)
        return nil, err // ← triggers CometBFT panic
    }
    if err := app.ClerkKeeper.StoreBlockTime(ctx); err != nil {
        logger.Error("Error storing block time", "error", err)
        return nil, err // ← triggers CometBFT panic
    }
}

Per the consensus-critical rules: "Every error returned by an ABCI method triggers a panic in CometBFT, which can halt the chain." If any store write fails (e.g., transient I/O), this halts the chain for all validators. Since this is non-critical bookkeeping (visibility assignment can be retried), consider logging and continuing rather than returning the error, or at minimum ensure these store operations cannot fail under normal conditions.

Fix this →

2. Upgrade boundary can be overwritten on store errors

File: x/clerk/keeper/side_msg_server.go:226-229

if _, err := srv.GetVisibilityTimeUpgradeID(ctx); err != nil {
    if setErr := srv.SetVisibilityTimeUpgradeID(ctx, record.Id); setErr != nil {

GetVisibilityTimeUpgradeID wraps collections.Item.Get() which returns an error both when the key doesn't exist and on store corruption/transient failures. If the upgrade ID is already set but Get fails for another reason, this silently overwrites the boundary with a later record ID, changing query semantics for the entire chain. Distinguish "not found" from other errors:

upgradeID, err := srv.GetVisibilityTimeUpgradeID(ctx)
if err != nil {
    if !errors.Is(err, collections.ErrNotFound) {
        logger.Error("Unable to get visibility time upgrade ID", "error", err)
        return err
    }
    // Not found — set it for the first time
    if setErr := srv.SetVisibilityTimeUpgradeID(ctx, record.Id); setErr != nil {
        ...
    }
}

Fix this →

3. Same error-handling issue in both query handlers

File: x/clerk/keeper/grpc_query.go:106-109 and grpc_query.go:335-338

upgradeId, err := q.k.GetVisibilityTimeUpgradeID(ctx)
if err != nil {
    upgradeId = ^uint64(0) // treats ALL errors as "not set"
}

A store failure silently switches all events to the legacy path, which could return non-deterministic results for post-upgrade events (filtered by record_time instead of visibility_time). This defeats the purpose of the feature. Should distinguish ErrNotFound from real errors.


MEDIUM Issues

4. fmt.Printf with %w verb in keeper code

File: x/clerk/keeper/keeper.go:416

fmt.Printf("failed to close iterator for BlockTimeReverseIndex: %w", err)

Two problems: %w is only valid in fmt.Errorf, not fmt.Printf (produces %!w garbage). Also, keeper code should use structured logging, not stdout. Use k.Logger(ctx) but note that ctx is not in scope in the deferred closure — you'd need to capture it or use sdkCtx.Logger().

Fix this →

5. Negative cutoffUnix silently overflows to huge uint64

File: x/clerk/keeper/keeper.go:402-403

func (k *Keeper) GetBlockHeightByTime(ctx context.Context, cutoffUnix int64) (int64, error) {
    cutoff := uint64(cutoffUnix)

While the gRPC handler validates cutoffTime > 0, the keeper method is public and could be called from other paths. A negative value wraps to ^uint64(0) - |value|, matching all stored entries and returning the latest height silently. Add a defensive check.

6. Iterator close error ignored in ProcessPendingVisibilityEvents

File: x/clerk/keeper/keeper.go:345

defer iterator.Close()

Other iterators in this file log close errors. This one silently ignores the return value. Should be consistent:

defer func() {
    if err := iterator.Close(); err != nil {
        sdkCtx.Logger().Error("failed to close pending visibility events iterator", "err", err)
    }
}()

7. GetRecordListVisibleAtHeight doesn't validate to_time

File: x/clerk/keeper/grpc_query.go:316-407

The handler uses request.ToTime to filter legacy (pre-upgrade) events (line 369: !value.RecordTime.Before(request.ToTime)), but never validates that to_time is set. If omitted/zero, the legacy iteration breaks immediately on the first event and returns empty results silently. Consider returning codes.InvalidArgument when request.ToTime.IsZero().

8. Unused RecordsWithVisibilityHeight index at query time

File: x/clerk/keeper/keeper.go:363-364

The RecordsWithVisibilityHeight secondary index is written for every processed event but never used in any read path — both GetRecordListWithTime and GetRecordListVisibleAtHeight iterate RecordsWithID and do point lookups on VisibilityHeightByID. This adds storage overhead and per-block write cost without benefit. Either use this index in the query path (would be more efficient) or remove the writes.


LOW Issues

9. TODO comments left in production config

File: helper/config.go:490,510

visibilityTimeHeight = 0 // TODO marcello decide block height

The activation heights for mainnet and amoy are set to 0 (disabled) with TODO comments. This is fine for now but should be tracked before merging to develop to ensure these get set before the feature is intended to go live. The default (local devnet) case has it set to 1000, which is good for testing.

10. visibilityTimeHeight is a mutable global variable

File: helper/config.go:241

var visibilityTimeHeight int64 = 0

This follows the existing pattern for other height gates (rioHeight, producerDowntimeHeight, etc.), so it's consistent with the codebase. However, it's worth noting that SetVisibilityTimeHeight is only used in tests, and the global is only set during InitHeimdallConfig. The Set function should probably only be exported for testing.

11. Test coverage is low (9.3% patch coverage)

Most of the new keeper and query code is covered in the dedicated visibility_time_test.go (1373 lines of tests, which is excellent), but CI reports 9.3% patch coverage. This suggests the test infrastructure may not be running or counting these tests properly. The tests themselves look thorough and cover the key scenarios (legacy, hybrid, halt simulation, pending exclusion, block-time lookup).


Design Observations

The one-block delay design is elegant. Events stored in block H get visibility_time = blockTime(H+1) because ProcessPendingVisibilityEvents runs in PreBlocker of the next block. During halts, no new blocks arrive, so pending events stay invisible. This is a clean solution to the determinism problem.

The hybrid query path is well-thought-out. Pre-upgrade events use record_time, post-upgrade events use visibility_time/visibility_height, with a clear boundary marker (VisibilityTimeUpgradeID). The iteration stops at the first non-visible event, preserving contiguous ID semantics that Bor depends on.

Proto changes look clean. New messages with new field numbers, no modifications to existing messages.


Action Items (prioritized)

  1. Fix error handling in side_msg_server.go:226 — distinguish "not found" from store errors to prevent upgrade boundary overwrite (Critical)
  2. Fix error handling in grpc_query.go:106-109 and 335-338 — same "not found" vs real error distinction (High)
  3. Evaluate returning errors from PreBlocker — consider log-and-continue for visibility processing failures to avoid chain halts (High)
  4. Fix fmt.Printf with %w in keeper.go:416 (Medium)
  5. Add defensive check for negative cutoffUnix in GetBlockHeightByTime (Medium)
  6. Validate to_time in GetRecordListVisibleAtHeight (Medium)
  7. Remove or use RecordsWithVisibilityHeight index (Low — storage optimization)

|

@claude
Copy link
Copy Markdown

claude Bot commented Mar 31, 2026

Code Review

Found 2 issues. Checked for bugs and CLAUDE.md compliance.

Issue 1: fmt.Printf with %w verb in GetBlockHeightByTime
File: x/clerk/keeper/keeper.go line 416
fmt.Printf does not support the %w verb (exclusive to fmt.Errorf). Produces garbled output. Also bypasses the structured logging framework.
Suggested fix: Use %v instead of %w, or use structured logging.

Issue 2: GetVisibilityTimeUpgradeID error conflates not-found with storage errors
File: x/clerk/keeper/side_msg_server.go lines 226-231
collections.Item.Get() returns an error both for not-found and for actual storage errors. The code treats any error as not-set-yet and overwrites the upgrade boundary. This is a post-handler running in PreBlocker which MUST be deterministic. A storage error on one validator but not another causes different VisibilityTimeUpgradeID values - a consensus-breaking state divergence. The rest of the codebase uses Has() for this pattern.
Suggested fix: Use Has() to check existence before calling Get().

Copilot AI review requested due to automatic review settings April 1, 2026 09:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread helper/config.go
Comment thread proto/heimdallv2/clerk/query.proto
Comment thread helper/config.go
@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude addressed your comments. Here's a recap:

  • Upgrade boundary overwrite: switched to Has() before Get(), propagated errors to prevent state divergence.
  • error conflation: now distinguishes collections.ErrNotFound from real errors and returns codes.Internal
  • fmt.Printf with %w: fixed both instances
  • negative cutoffUnix overflow: added a check.
  • Iterator close error ignored: fixed to log close errors.
  • to_time not validated in GetRecordListVisibleAtHeight: added request.ToTime.IsZero() validation.
  • Unused RecordsWithVisibilityHeight index: removed entirely

Not addressing the following:

  • PreBlocker returning errors: keeping return nil, err. These are consensus-critical state writes, hence log-and-continue would cause silent state divergence. The underlying KV writes should not fail under normal conditions; if they do, halting is correct. Consistent with all other state write errors in PreBlocker.
  • TODO comments for activation heights: intentionally disabled for the time being. Will be set when HF is scheduled.
  • visibilityTimeHeight mutable global: follows existing pattern for all other height gates, and only used in tests.

Re-review the changes.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 1, 2026

Claude finished @marcello33's task in 2m 37s —— View job


Re-review: Deterministic State Syncs

  • Verify upgrade boundary fix (side_msg_server.go)
  • Verify error conflation fix (grpc_query.go)
  • Verify fmt.Printf fix (keeper.go)
  • Verify negative cutoffUnix check (keeper.go)
  • Verify iterator close fix (keeper.go)
  • Verify to_time validation (grpc_query.go)
  • Verify RecordsWithVisibilityHeight removal
  • Check for any new/remaining issues
  • Post final review

Verification of Fixes

All six addressed items look correct:

  1. Upgrade boundary (side_msg_server.go:226-236) — Properly uses Has() before Get() now. Store errors from Has() are propagated. Only sets on !hasUpgradeID. Clean fix.

  2. Error conflation (grpc_query.go:108-113, 343-348) — Both GetVisibilityTimeUpgradeID call sites now distinguish collections.ErrNotFound from real errors and return codes.Internal for the latter. Correct.

  3. fmt.Printf with %w (keeper.go:416-418) — Fixed to use structured logging via k.Logger(ctx).Error(...). Good.

  4. Negative cutoffUnix (keeper.go:402-404) — Defensive check added at the keeper level. Both gRPC handler and keeper now validate.

  5. Iterator close (keeper.go:343-347) — Now logs close errors consistently with the rest of the file.

  6. to_time validation (grpc_query.go:337-339) — Returns codes.InvalidArgument when request.ToTime.IsZero().

  7. RecordsWithVisibilityHeight removal — Fully removed from keeper, keys, and all references. No orphaned writes.

Items not addressed (acknowledged, acceptable)

  • PreBlocker returning errors: Agreed. These are consensus-critical state writes; halting is the correct behavior if the KV store fails. Consistent with the rest of PreBlocker.
  • TODO activation heights: Fine for now, tracked for HF scheduling.
  • Mutable global: Follows existing codebase pattern.

Remaining Issues

1. new(types.NewMsgEventRecord(...)) in tests — compilation error

File: x/clerk/keeper/visibility_time_test.go — lines 385, 420, 452, 461, 656

new() in Go takes a type, not a function call. NewMsgEventRecord(...) returns a MsgEventRecord value, so new(types.NewMsgEventRecord(...)) is invalid Go and will not compile. The existing test pattern in side_msg_server_test.go:288 uses:

msg := types.NewMsgEventRecord(...)
postHandler(ctx, &msg, sidetxs.Vote_VOTE_YES)

All 5 occurrences need fixing. This means these tests have never been compiled or run.

Fix this →

2. GetVisibilityTimeForEvent / GetVisibilityHeightForEvent error handling in query paths

Files: x/clerk/keeper/grpc_query.go:149-152 and grpc_query.go:385-388

The GetVisibilityTimeUpgradeID error handling was fixed (distinguishes ErrNotFound from real errors), but the same pattern exists for visibility lookups in the iteration loop:

visTime, err := q.k.GetVisibilityTimeForEvent(ctx, value.Id)
if err != nil {
    // Event exists but has no visibility_time yet → still pending.
    break
}

A store corruption error here silently truncates the result set. Since collections.Item.Get() returns ErrNotFound for missing keys, this should distinguish:

visTime, err := q.k.GetVisibilityTimeForEvent(ctx, value.Id)
if err != nil {
    if errors.Is(err, collections.ErrNotFound) {
        break // genuinely pending
    }
    return nil, status.Errorf(codes.Internal, "failed to get visibility time for event %d: %v", value.Id, err)
}

Same applies to GetVisibilityHeightForEvent at line 385-388.

This is lower severity than the upgrade ID issue (it's a read-only query, not a state write), but for consistency with the fix already applied to GetVisibilityTimeUpgradeID, it's worth addressing.

Fix this →


Summary

The original critical/high issues are all properly fixed. Two remaining items: the test compilation error (prevents all new tests from running) and a minor error-handling inconsistency in the query iteration paths.

Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread app/abci.go
Comment thread x/clerk/keeper/keeper.go Outdated
Copilot AI review requested due to automatic review settings April 1, 2026 10:07
Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread x/clerk/keeper/keeper.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/grpc_query.go Outdated
Comment thread x/clerk/keeper/visibility_time_test.go Outdated
@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude addressed your comments. Please re-review and solve the related comments if they are addressed

@claude
Copy link
Copy Markdown

claude Bot commented Apr 1, 2026

Claude finished @marcello33's task in 2m 0s —— View job


Re-review: Deterministic State Syncs

  • Read current code and verify all fixes
  • Verify upgrade boundary fix (side_msg_server.go)
  • Verify error conflation fix (grpc_query.go)
  • Verify fmt.Printf fix (keeper.go)
  • Verify negative cutoffUnix check (keeper.go)
  • Verify iterator close fix (keeper.go)
  • Verify to_time validation (grpc_query.go)
  • Verify RecordsWithVisibilityTime removal
  • Verify BlockTimeIndex status
  • Verify GetBlockHeightByTime error differentiation
  • Verify GetVisibilityTimeForEvent/GetVisibilityHeightForEvent error handling
  • Verify test compilation (new() usage)
  • Check for any remaining issues
  • Post final review

Verification of Previously Flagged Fixes

All previously flagged issues have been properly addressed:

  1. Upgrade boundary (side_msg_server.go:226-236) — Now uses Has() before Get(), propagates errors. Clean. ✅
  2. Error conflation in GetVisibilityTimeUpgradeID (grpc_query.go:108-113, 348-354) — Both call sites now distinguish collections.ErrNotFound from real errors, returning codes.Internal for the latter. ✅
  3. fmt.Printf with %w (keeper.go) — Fixed to use structured logging via k.Logger(ctx).Error(...). ✅
  4. Negative cutoffUnix (keeper.go:399-401) — Defensive check added at keeper level. ✅
  5. Iterator close error (keeper.go:340-344) — Now logs close errors consistently. ✅
  6. to_time validation (grpc_query.go:343-345) — Returns codes.InvalidArgument when request.ToTime.IsZero(). ✅
  7. RecordsWithVisibilityTime dead index — Fully removed from keeper, keys, and all references. ✅
  8. GetBlockHeightByTime error differentiation (grpc_query.go:314-317) — Properly uses sentinel error ErrNoBlockFound with errors.Is() and returns codes.NotFound. ✅
  9. GetVisibilityTimeForEvent / GetVisibilityHeightForEvent error handling (grpc_query.go:151-156, 393-398) — Now distinguishes collections.ErrNotFound (pending → break) from real errors (returns codes.Internal). ✅

Items Not Addressed (Acknowledged, Acceptable)

  • PreBlocker returning errors: Keeping return nil, err. These are consensus-critical state writes; halting is correct if the KV store fails. Consistent with other state write errors in PreBlocker. ✅
  • TODO activation heights: Intentionally disabled, will be set when HF is scheduled. ✅
  • Mutable global visibilityTimeHeight: Follows existing codebase pattern. ✅
  • new(types.NewMsgEventRecord(...)) in tests: Author says this is valid in Go 1.26.1 (the project's Go version). ✅

Remaining Observations

1. BlockTimeIndex (forward index) is written but never read in production

File: x/clerk/keeper/keeper.go:388-390

StoreBlockTime writes to both BlockTimeIndex (height → time) and BlockTimeReverseIndex ((time, height) → height). Only BlockTimeReverseIndex is used in production (GetBlockHeightByTime). BlockTimeIndex is only used in test assertions. This adds 1 extra IAVL entry per block from activation onward with no production reader.

This is low-severity (storage overhead only, no correctness impact). If a GetBlockTimeByHeight query is planned for a future PR, this is fine to keep. Otherwise worth removing for cleanliness.

2. Minor: hasPending error ignored in test

File: x/clerk/keeper/visibility_time_test.go:51

hasPending, _ := ck.PendingVisibilityEvents.Has(ctx, 1)

The error from Has() is discarded. Nitpick — other lines in the same test file (e.g., lines 382-383) correctly use require.NoError(err). This is test-only and extremely minor.


Summary

All critical and high-severity issues from the previous reviews have been properly fixed. The code now correctly distinguishes ErrNotFound from real errors in all query paths, uses Has() for the upgrade boundary check, defines a proper sentinel error for the not-found case in GetBlockHeightByTime, and removes the dead RecordsWithVisibilityTime index. The remaining items are low-severity observations. The PR looks good.


Comment thread x/clerk/keeper/grpc_query.go
Comment thread x/clerk/keeper/visibility_time_test.go Outdated
Comment thread x/clerk/keeper/side_msg_server.go
Copilot AI review requested due to automatic review settings April 4, 2026 20:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 23 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread x/clerk/keeper/side_msg_server.go
Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread x/clerk/keeper/visibility_time_test.go Outdated
Comment thread x/clerk/keeper/visibility_time_test.go Outdated
Comment thread proto/heimdallv2/clerk/genesis.proto Outdated
Comment thread x/clerk/keeper/grpc_query.go
@claude
Copy link
Copy Markdown

claude Bot commented Apr 4, 2026

Code Review

1 issue found. Checked for bugs and CLAUDE.md compliance.


Issue: Personal TODO comments in production code

File: helper/config.go lines 490 and 510

The comment // TODO marcello decide block height appears on both MainnetChain (line 490) and AmoyChain (line 510). Per CLAUDE.md comment guidelines, comments must pass the 6-month test: Would someone reading just the current code find this helpful? This references a specific person and an undecided design decision. Every other height variable in this function is set to a concrete value or 0 with no comment.

Suggested fix: Remove the TODO comments or replace with: // activation height TBD - feature disabled until hard fork is scheduled

Copilot AI review requested due to automatic review settings April 4, 2026 21:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread x/clerk/keeper/grpc_query.go
Comment thread x/clerk/keeper/grpc_query.go
Comment thread x/clerk/keeper/visibility_time_test.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Apr 5, 2026

Code Review

Found 4 issues. Checked for bugs and CLAUDE.md compliance.

Issue 1 (compile error): visibility_time_test.gonew(types.NewMsgEventRecord(...)) won't compile. new() takes a type, not a value. 5 occurrences (lines 370, 405, 437, 446, 627).

Issue 2 (consensus-critical): app/abci.go:473-480 — PreBlocker returns errors from ProcessPendingVisibilityEvents and StoreBlockTime, triggering CometBFT panics. These are internal state ops, not provably-invalid proposals. Should log and continue per .claude/rules/consensus-critical.md.

Issue 3 (state loss): side_msg_server.go:222-241 — Visibility errors in PostHandleMsgEventRecord return an error, which rolls back the entire cache-wrapped post-handler including SetEventRecord. An approved clerk event is silently lost.

Issue 4 (DoS): grpc_query.go:402-403continue should be break in recordListVisibleAtHeight. Record times are monotonic, so once the filter fails all subsequent events will also fail, but the loop scans every remaining event. Same issue at lines 416 and 419. Existing GetRecordListWithTime correctly uses break.

See inline comments for details.

@0xPolygon 0xPolygon deleted a comment from claude Bot Apr 5, 2026
@0xPolygon 0xPolygon deleted a comment from claude Bot Apr 5, 2026
@0xPolygon 0xPolygon deleted a comment from claude Bot Apr 5, 2026
@0xPolygon 0xPolygon deleted a comment from claude Bot Apr 5, 2026
@marcello33
Copy link
Copy Markdown
Collaborator Author

@claude checked your comments here

Issue 1 (compile error): visibility_time_test.go — new(types.NewMsgEventRecord(...)) won't compile. new() takes a type, not a value. 5 occurrences (lines 370, 405, 437, 446, 627).
This is not valid in newer go version. Code compiles fine.
Issue 2 (consensus-critical): app/abci.go:473-480 — PreBlocker returns errors from ProcessPendingVisibilityEvents and StoreBlockTime, triggering CometBFT panics. These are internal state ops, not provably-invalid proposals. Should log and continue per .claude/rules/consensus-critical.md.
We discussed this many times. This is not an issue here.
Issue 3 (state loss): side_msg_server.go:222-241 — Visibility errors in PostHandleMsgEventRecord return an error, which rolls back the entire cache-wrapped post-handler including SetEventRecord. An approved clerk event is silently lost.
Won't address it here
Issue 4 (DoS): grpc_query.go:402-403 — continue should be break in recordListVisibleAtHeight. Record times are monotonic, so once the filter fails all subsequent events will also fail, but the loop scans every remaining event. Same issue at lines 416 and 419. Existing GetRecordListWithTime correctly uses break.
Not valid

@claude
Copy link
Copy Markdown

claude Bot commented Apr 5, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copilot AI review requested due to automatic review settings April 6, 2026 09:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread x/clerk/keeper/visibility_time_test.go
Comment thread x/clerk/keeper/grpc_query_test.go
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 6, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants