HYPERFLEET-706 - fix: lastUpdateTime for Ready#81
rh-amarin wants to merge 1 commit into openshift-hyperfleet:main
Conversation
Skipping CI for Draft Pull Request.

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing
Walkthrough

This PR changes how `ResourceCondition.LastUpdatedTime` is computed and propagated. It adds helpers (`findAdapterStatus`, `adapterConditionsHasAvailableTrue`) and `computeReadyLastUpdated`, which returns the evaluation time when Ready is false, or the minimum `LastReportTime` among required adapters that report Available=True at the current generation when Ready is true. `applyConditionHistory` centralizes copying CreatedTime/LastTransitionTime and accepts an explicit `lastUpdatedTime` to set `LastUpdatedTime`. `BuildSyntheticConditions` now supplies a computed `lastUpdatedTime` for Available (evaluation time) and Ready (computed). Tests and documentation updated accordingly. Sequence diagram(s): omitted.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Force-pushed: 0f274fc to 0465f91 (Compare)
🧹 Nitpick comments (3)

pkg/services/cluster_test.go (1)

790-795: Assert against the actual evaluation window. These matchers only prove `LastUpdatedTime` moved forward from `fixedNow`; they do not prove it was refreshed during `Create()` or `UpdateClusterStatusFromAdapters()`. Capture before/after timestamps around each call and assert the synthesized value falls inside that window so this test locks down the new freshness contract. Also applies to: 817-823.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `pkg/services/cluster_test.go` around lines 790-795, the test currently only asserts LastUpdatedTime > fixedNow; change it to capture timestamps immediately before and after the call that should refresh the field (e.g., record `beforeCreate := time.Now()`, then call `Create()` or `UpdateClusterStatusFromAdapters()`, then `afterCreate := time.Now()`) and assert `createdAvailable.LastUpdatedTime` and `createdReady.LastUpdatedTime` lie between those bounds (using `BeTemporally(">=", beforeCreate)` and `BeTemporally("<=", afterCreate)`); apply the same pattern to the other affected assertions (the block around lines 817-823) so the test verifies the synthesized timestamps fall inside the actual evaluation window.

pkg/services/node_pool_test.go (1)

650-655: Assert against the actual evaluation window. As written, these checks only show that `LastUpdatedTime` is later than `fixedNow`. Capturing before/after timestamps around `Create()` and `UpdateNodePoolStatusFromAdapters()` would make the test verify the intended evaluation-time refresh behavior directly. Also applies to: 677-683.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `pkg/services/node_pool_test.go` around lines 650-655, capture timestamps immediately before and after calling `Create()` and `UpdateNodePoolStatusFromAdapters()` (e.g., store `beforeEval := time.Now()` before the call and `afterEval := time.Now()` after it returns) and replace the `BeTemporally(">", fixedNow)` checks with assertions that LastUpdatedTime is within [beforeEval, afterEval] (use `BeTemporally(">=", beforeEval)` and `BeTemporally("<=", afterEval)` or equivalent); apply this pattern to the checks for `createdAvailable.LastUpdatedTime` and `createdReady.LastUpdatedTime` (and repeat the same change for the similar assertions at lines 677-683).

pkg/services/status_aggregation_test.go (1)

35-155: Add one synthesized-condition test for the Ready=True path. These cases validate `computeReadyLastUpdated()` directly, but they will not fail if `BuildSyntheticConditions()` stops threading that timestamp through `applyConditionHistory()`. One end-to-end assertion on the final `api.ResourceCondition` would cover the actual wiring.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `pkg/services/status_aggregation_test.go` around lines 35-155, add an end-to-end unit test that synthesizes conditions and verifies the ready timestamp is threaded to the final `api.ResourceCondition`: call `BuildSyntheticConditions(...)` and assert that the Ready condition's timestamp field on the resulting `api.ResourceCondition` equals the value returned by `computeReadyLastUpdated(...)` for the same inputs; place this alongside the existing TestComputeReadyLastUpdated_* tests so the test fails if the timestamp is not propagated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c5b7aae1-530a-4b11-847e-5ae5f4e614d4
📒 Files selected for processing (6)
docs/api-resources.md, pkg/services/CLAUDE.md, pkg/services/cluster_test.go, pkg/services/node_pool_test.go, pkg/services/status_aggregation.go, pkg/services/status_aggregation_test.go
Force-pushed: 0465f91 to b7f6f99 (Compare)
xueli181114 left a comment:
Review: HYPERFLEET-706 — lastUpdateTime for Ready
The core logic in computeReadyLastUpdated is correct and well-tested. Six observations below — #1 is the main concern, the rest are quality improvements.
| # | Issue | Severity |
|---|---|---|
| 1 | Available's LastUpdatedTime semantics changed (not in ticket scope) | Major |
| 2 | Double JSON unmarshal per adapter | Minor |
| 3 | Redundant guards when isReady=true | Minor |
| 4 | MixedGenerations test is an impossible scenario | Minor |
| 5 | Integration test uses same function for expected value | Minor |
| 6 | findAdapterStatus vs map-overwrite on duplicates | Nitpick |
pkg/services/status_aggregation.go (outdated)

```diff
 		LastUpdatedTime: now,
 	}
-	preserveSyntheticCondition(&availableCondition, existingAvailable, now)
+	applyConditionHistory(&availableCondition, existingAvailable, now, now)
```
Major: Unintended behavioral change to Available's LastUpdatedTime
The ticket only asks to change Ready's LastUpdatedTime. But the refactor from preserveSyntheticCondition → applyConditionHistory also changes Available's behavior.
Before (main): when Available status is unchanged, `preserveSyntheticCondition` preserves the existing `LastUpdatedTime`:

```go
if !existing.LastUpdatedTime.IsZero() {
	target.LastUpdatedTime = existing.LastUpdatedTime
}
```

After (PR): `applyConditionHistory` always overwrites `LastUpdatedTime` with the caller-provided value (`now` for Available):

```go
target.LastUpdatedTime = lastUpdatedTime // always overwritten
```

This means Available's `LastUpdatedTime` now refreshes on every evaluation cycle (~10s), even when nothing changed. Previously it was stable. Any consumer relying on "when did Available actually change?" will now see constant updates.
The updated tests (BeTemporally windows instead of exact match) confirm this is a behavioral change, not just a refactor.
Is this intentional? If not, consider preserving the old behavior for Available by passing existing.LastUpdatedTime when status is unchanged.
Even though I don't think lastUpdatedTime has very useful purpose... if preserving the old value can completely avoid an UPDATE to the database, then it would be a good thing.
I will change this
Avoiding the DB action will require a separate ticket
The fix does not save a database write.
Looking at cluster.go:232-233, the caller unconditionally calls s.clusterDao.Replace(ctx, cluster) after building conditions
— there's no comparison between the old and new StatusConditions JSON before deciding whether to persist. Every adapter
status report triggers a DB write regardless of whether the computed conditions changed.
The fix only corrects what gets written: instead of writing a new LastUpdatedTime = now every ~10s, it writes back the
preserved original timestamp. The DB update still happens either way.
To actually skip the write when nothing changed, you'd need an equality check before the Replace call — but that's a separate optimization, not part of this ticket.
```go
	Status string `json:"status"`
}
if err := json.Unmarshal(conditions, &conds); err != nil {
	return false
```
Minor: Double JSON unmarshal per adapter
BuildSyntheticConditions calls ComputeReadyCondition() which unmarshals each adapter's conditions JSON to check Available=True. Then computeReadyLastUpdated() unmarshals the same JSON again via adapterConditionsHasAvailableTrue().
Each adapter's conditions are parsed twice per evaluation cycle. Not a problem at current scale, but ComputeReadyCondition already builds an adapterMap with parsed results — computeReadyLastUpdated could accept that map instead of re-parsing.
This is a micro-optimization, building a map of 2 entries
I explicitly changed it to make the code more readable
Priority: Inconsistency
This also silently returns false on
json.Unmarshal error, but this same PR adds explicit
logger.WithError(...).Warn(...) for the identical unmarshal failure in
ComputeAvailableCondition (line 128) and ComputeReadyCondition (line 202).
Corrupt adapter conditions reaching computeReadyLastUpdated would produce no
log entry, making the silent skip invisible to operators.
Consider adding logging here too, or restructuring so computeReadyLastUpdated
reuses the already-parsed results from ComputeReadyCondition (which would also
address @xueli181114 's double-unmarshal comment).
```go
}
if status.LastReportTime == nil {
	return now // safety: no timestamp
}
```
Minor: Guards are unreachable when isReady=true
When isReady=true, ComputeReadyCondition has already confirmed ALL required adapters have Available=True at the current generation. So:

- The `ObservedGeneration != resourceGeneration` → `continue` branch can't fire
- The `!adapterConditionsHasAvailableTrue()` → `continue` branch can't fire
- `minTime` can never be nil after the loop

These aren't harmful (defense-in-depth), but a comment like `// Redundant when isReady=true; provides safety if called independently` would clarify intent and prevent someone from thinking these are load-bearing checks.
```go
result := computeReadyLastUpdated(statuses, []string{"validator", "dns"}, 3, now, true)
if !result.Equal(newGenTime) {
	t.Errorf("expected %v (only current-gen adapter), got %v", newGenTime, result)
}
```
Minor: This test exercises an impossible scenario
This passes `isReady=true` with dns at gen 2 and the resource at gen 3. But `ComputeReadyCondition` would return false in this case (dns is at the wrong gen), so `isReady` would never be true in production. The test provides false confidence.

Consider either:

- Documenting it as a defense-in-depth test: `// Tests function in isolation; this combination can't occur via BuildSyntheticConditions`
- Or removing it and relying on the integration test for the realistic path
I will remove the test
```go
	adapterStatuses, requiredAdapters, resourceGeneration, now, isReady,
)

if !readyCondition.LastUpdatedTime.Equal(expectedLastUpdated) {
```
Minor: Tautological assertion
This test computes the expected value by calling computeReadyLastUpdated() — the same function under test. It only proves BuildSyntheticConditions passes the result through, not that the result is correct.
A stronger assertion would hardcode the expected value:

```go
if !readyCondition.LastUpdatedTime.Equal(reportTime) {
	t.Errorf("expected reportTime %v, got %v", reportTime, readyCondition.LastUpdatedTime)
}
```

This independently verifies the full chain: BuildSyntheticConditions → computeReadyLastUpdated → correct timestamp.
Force-pushed: b7f6f99 to b8e4bd4 (Compare)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/services/status_aggregation.go`:
- Around line 304-305: The generation check that currently skips adapters when
status.ObservedGeneration != resourceGeneration should be relaxed to match
ComputeReadyCondition's behavior: only enforce generation equality when
resourceGeneration > 0; i.e., if resourceGeneration == 0 allow adapters
regardless of ObservedGeneration. Update the condition inside the function
containing that check (look for references to status.ObservedGeneration,
resourceGeneration, and computeReadyLastUpdated) accordingly, and add a
regression unit test that calls BuildSyntheticConditions (or the public entry
that uses it) with resourceGeneration == 0 to assert Ready remains computed and
the staleness signal is produced rather than being skipped.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 48eb9e9d-2f4a-4da0-b1b9-4a9db369492b
📒 Files selected for processing (6)
docs/api-resources.md, pkg/services/CLAUDE.md, pkg/services/cluster_test.go, pkg/services/node_pool_test.go, pkg/services/status_aggregation.go, pkg/services/status_aggregation_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- docs/api-resources.md
- pkg/services/node_pool_test.go
- pkg/services/CLAUDE.md
- pkg/services/cluster_test.go
```go
if status.ObservedGeneration != resourceGeneration {
	continue // not at current gen, skip
```
Keep the generation filter aligned with ComputeReadyCondition.
Line 304 is stricter than ComputeReadyCondition, which only enforces generation equality when resourceGeneration > 0. If a caller ever passes 0 here, BuildSyntheticConditions can return Ready=True while computeReadyLastUpdated() skips every adapter and falls back to now, so the new staleness signal disappears on that path. Please mirror the same guard here and add a regression test for the zero-generation case.
Suggested fix:

```diff
-	if status.ObservedGeneration != resourceGeneration {
+	if resourceGeneration > 0 && status.ObservedGeneration != resourceGeneration {
 		continue // not at current gen, skip
 	}
```
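The relaxed guard can be isolated as a tiny predicate for clarity. This is a sketch of the suggested semantics only; `atCurrentGeneration` is a hypothetical helper name, not a function in the repo:

```go
package main

import "fmt"

// atCurrentGeneration sketches the relaxed guard: generation equality is
// only enforced when resourceGeneration > 0, matching ComputeReadyCondition.
func atCurrentGeneration(observed, resource int64) bool {
	if resource > 0 && observed != resource {
		return false // stale: not at current generation
	}
	return true // matching generation, or zero generation (no check)
}

func main() {
	fmt.Println(atCurrentGeneration(2, 3)) // stale generation, skipped
	fmt.Println(atCurrentGeneration(2, 0)) // zero resource generation, allowed
}
```

With this shape, the zero-generation regression test reduces to asserting the predicate (and therefore `computeReadyLastUpdated`) does not skip every adapter when `resourceGeneration == 0`.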
```diff
 - `observed_generation` - Generation this condition reflects
 - `created_time` - When condition was first created (API-managed)
-- `last_updated_time` - When adapter last reported (API-managed, from AdapterStatus.last_report_time)
+- `last_updated_time` - When this condition was last refreshed (API-managed). For **Available**, always the evaluation time. For **Ready**: when Ready=True, the minimum of `last_report_time` across all required adapters that report Available=True at the current generation; when Ready=False, the evaluation time (so consumers can detect staleness).
```
Priority: Inconsistency
The doc on line 459 says Available's last_updated_time is "always the
evaluation time", but the code in BuildSyntheticConditions preserves the
existing value when status and ObservedGeneration are unchanged (lines 359-365
of status_aggregation.go). The test
TestBuildSyntheticConditions_AvailableLastUpdatedTime_Stable confirms the
preservation behavior.
Whichever direction the fix goes for the Available behavior (per Xue Li's
comment), please update the doc to match. If preservation is kept, something
like: "For Available, the evaluation time when the status or
observed_generation changes; otherwise preserved from the previous
evaluation."
```go
)

// adapterConditionStatusTrue is the string value for a True adapter condition status.
const adapterConditionStatusTrue = "True"
```
Priority: Pattern
adapterConditionStatusTrue = "True" (line 26) duplicates
api.AdapterConditionTrue (AdapterConditionStatus("True") in
status_types.go:19). This creates a second source of truth.
Consider either:

- Using `string(api.AdapterConditionTrue)` at the comparison sites (lines 161, 239, 273), or
- Deriving the constant explicitly: `const adapterConditionStatusTrue = string(api.AdapterConditionTrue)`
Both options keep a single source of truth while satisfying the string comparison.
```go
}
break
if err := json.Unmarshal(adapterStatus.Conditions, &conditions); err != nil {
	logger.WithError(context.Background(), err).Warn(
```
Priority: Pattern
context.Background() at lines 128 and 202 means these warn logs won't carry
request-scoped metadata (cluster_id, trace_id, request_id) that callers like
UpdateClusterStatusFromAdapters have available. If a corrupt conditions blob
causes unexpected Available/Ready results in production, correlating these
logs to a specific cluster would require timestamp matching.
Not blocking, but consider threading context.Context through
ComputeAvailableCondition and ComputeReadyCondition so the logs inherit the
caller's context. Alternatively, a // TODO noting the limitation would help
future maintainers.
```go
now := time.Now()
statuses := api.AdapterStatusList{
	makeAdapterStatus("validator", 1, ptr(now.Add(-5*time.Second)), makeConditionsJSON(t, []struct{ Type, Status string }{
		{api.ConditionTypeAvailable, "True"},
```
Priority: Pattern
The PR extracts `adapterConditionStatusTrue` to eliminate the "True" magic string in production code, but the new tests (lines 50, 64, 79, 94, 109, 125, 128, 145, 170, …) reintroduce bare "True" and "False" literals across ~10 call sites.

Since the test helper uses `struct{ Type, Status string }` (untyped), consider using `string(api.AdapterConditionTrue)` and `string(api.AdapterConditionFalse)` to stay consistent with the PR's own motivation:

```go
{api.ConditionTypeAvailable, string(api.AdapterConditionTrue)}
```
Priority: Improvement

The `availableLastUpdated` logic has an untested code path: generation changed. Consider adding a test like:

```go
func TestBuildSyntheticConditions_AvailableLastUpdatedTime_UpdatesOnGenerationChange(t *testing.T) {
	// existing Available=True at gen 1, adapters now at gen 2 still True
	// assert Available.LastUpdatedTime == now (refreshed due to gen change)
}
```
What
Why
Previously, Ready’s LastUpdatedTime did not reflect when adapters last reported or when the condition was re-evaluated. That made it hard for Sentinel to treat “no recent update” as “condition may be stale” and to reason about readiness correctly. This change aligns the API’s semantics with how consumers use the field.
How

- `pkg/services/status_aggregation.go`
- `status_aggregation_test.go`: New unit tests for computeReadyLastUpdated (not ready, missing adapter, nil LastReportTime, wrong generation, Available=False, single qualifying adapter, multiple adapters minimum, mixed generations).
- `cluster_test.go` and `node_pool_test.go`: Updated TestClusterSyntheticTimestampsStableWithoutAdapterStatus and TestNodePoolSyntheticTimestampsStableWithoutAdapterStatus so they expect LastUpdatedTime to be after the fixed "now" (i.e. refreshed at evaluation time), with brief comments explaining the behavior.

Docs

- `docs/api-resources.md`: Clarified last_updated_time for ResourceCondition: Available always uses evaluation time; Ready uses minimum adapter last_report_time when Ready=True and evaluation time when Ready=False.
- `pkg/services/CLAUDE.md`: Documented that Ready's LastUpdatedTime comes from computeReadyLastUpdated and when it is evaluation time vs minimum adapter time.

Testing

- `make test` (unit tests).
- `make test-integration` (optional, for full confidence).
Summary by CodeRabbit
Documentation
Tests
Refactor
Bug Fixes