HYPERFLEET-537 - docs: configurable conditions-based decision making#102
HYPERFLEET-537 - docs: configurable conditions-based decision making#102
Conversation
…ns-based decision making Replace hardcoded max_age_not_ready/max_age_ready references with configurable conditions using CEL expressions (reference_time + rules with name/expression/max_age), aligning with the implementation in hyperfleet-sentinel PR #72.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
WalkthroughThis change replaces fixed max-age constants with a CEL-driven conditions system for reconciliation. Configs add a conditions block (reference_time plus ordered named CEL rules mapping expressions to max_age), resource_selector, hyperfleet_api, and message_data mappings. At runtime the controller compiles CEL expressions at startup (fail-fast). Reconciliation still publishes immediately when resource.generation > resource.status.observed_generation; otherwise it evaluates CEL rules to select max_age, computes nextEventTime = reference_time + max_age, and publishes when now >= nextEventTime. Per-deployment configs may override rules/max_age. Sequence Diagram(s)sequenceDiagram
participant Config as Configuration
participant CELEngine as CEL Engine
participant Monitor as Resource Monitor
participant Evaluator as Rule Evaluator
participant Broker as Event Broker
Config->>CELEngine: Load & compile conditions (fail-fast)
CELEngine-->>Config: Compilation result
Monitor->>Evaluator: Poll resource / read status
Evaluator->>Evaluator: Compare resource.generation vs observed_generation
alt generation > observed_generation
Evaluator->>Broker: Publish CreateEvent (immediate)
else generation == observed_generation
Evaluator->>CELEngine: Evaluate ordered CEL rules
CELEngine-->>Evaluator: Matched rule + max_age, reference_time
Evaluator->>Evaluator: Compute nextEventTime = reference_time + max_age
alt now >= nextEventTime
Evaluator->>Broker: Publish Reconciliation Event
end
end
Broker-->>Broker: Queue/forward event
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan for PR comments
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
hyperfleet/components/sentinel/sentinel.md (1)
608-638:⚠️ Potential issue | 🟠 MajorUse one
observed_generationpath throughout the spec.This section compares against
resource.status.conditions.Ready.observed_generation, but the rest of the page and the test scenarios useresource.status.observed_generation. Please normalize on a single contract here; otherwise the Decision Engine description is ambiguous.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/sentinel/sentinel.md` around lines 608 - 638, The document inconsistently references two observed_generation paths; normalize to a single contract by replacing the use of resource.status.conditions.Ready.observed_generation with the canonical resource.status.observed_generation (or vice versa if you prefer that canonical name) everywhere in the Decision Logic and Implementation Requirements so the generation check in Evaluate(resource, now) consistently compares resource.generation against the same observed_generation path; update all references, examples, and test scenario mentions and ensure the text and logging (e.g., the generation mismatch reason string) reflect the chosen single field name.hyperfleet/components/sentinel/sentinel-deployment.md (1)
43-61:⚠️ Potential issue | 🟠 MajorThe pod spec still doesn't consume the documented broker ConfigMap.
This file now introduces
hyperfleet-sentinel-broker, but the Deployment only pullsGCP_PROJECT_IDfromsentinel-configand never imports anyBROKER_*keys. As written, the RabbitMQ example never reaches the container environment, and Pub/Sub has two competing config sources.Also applies to: 121-177
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 43 - 61, The Deployment pod spec is missing env entries that consume the new hyperfleet-sentinel-broker ConfigMap so BROKER_* values never reach the container; update the env block to add the required BROKER_* env vars (e.g., BROKER_TYPE, BROKER_URL, BROKER_USER, BROKER_PASSWORD or whatever keys your broker consumer expects) with valueFrom: configMapKeyRef name: hyperfleet-sentinel-broker and the appropriate key names, ensure BROKER_CREDENTIALS (if still needed) remains sourced from sentinel-secrets (secretKeyRef) only when required, and remove or reconcile any duplicate BROKER sources so Pub/Sub continues to use GCP_PROJECT_ID from sentinel-config while RabbitMQ/other brokers use hyperfleet-sentinel-broker.
🧹 Nitpick comments (1)
hyperfleet/components/sentinel/sentinel-deployment.md (1)
126-134: Expose rule-oriented labels in the metrics spec.Once skip/publish decisions come from arbitrary CEL rules, a fixed
ready_state={ready,not_ready}label no longer tells operators which rule matched and cannot represent the no-match fallback. Amatched_ruleordecision_reasonlabel would fit the new configuration model much better.Also applies to: 209-225
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 126 - 134, The metrics spec currently emits a fixed ready_state label which can't convey which CEL rule matched or represent a no-match fallback; update the metrics emission to include a rule-oriented label (e.g., matched_rule or decision_reason) populated with the matching rule name (use the rule "name" values such as isReady and isNotReady) and ensure a deterministic fallback value (e.g., "no_match") when no rule matches; apply the same change to the other rules block referenced (lines ~209-225) so all rule-based decisions expose the matching rule name and a clear no-match reason in metrics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 268-279: Update the later sequence diagrams so their decision
boxes match the documented Decision Algorithm: change the checks to first
compare resource.generation vs resource.status.observed_generation
(generation-first reconciliation), then evaluate CEL condition rules to select
matched_rule.max_age (or smallest max_age if none), compute reference_time from
the rule expression, and use now >= reference_time + max_age to trigger
PublishEvent(CreateEvent(resource)); replace any hardcoded phase != "Ready" or
last_updated_time + 10s logic with these same symbols (FetchResources,
resource.generation, resource.status.observed_generation, matched_rule.max_age,
reference_time, PublishEvent, CreateEvent, CEL) so the diagrams and prose are
consistent.
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 238-255: Update the doc to specify the timestamp fallback used
when no rule matches: state that reference_time (used by nextEventTime) will use
conditionTime(resource, "<conditionName>") if available, otherwise fall back to
resource.metadata.creationTimestamp (and if that is missing, use the resource's
observedGeneration/now as a last resort); mention this behavior wherever
reference_time, nextEventTime, or conditionTime are described (including the
default-rule/no-match section) so Test 9’s scenario is fully specified. Ensure
the prose notes that rules still pick the smallest max_age when none match but
that the timestamp for ROI calculation falls back as described.
---
Outside diff comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 43-61: The Deployment pod spec is missing env entries that consume
the new hyperfleet-sentinel-broker ConfigMap so BROKER_* values never reach the
container; update the env block to add the required BROKER_* env vars (e.g.,
BROKER_TYPE, BROKER_URL, BROKER_USER, BROKER_PASSWORD or whatever keys your
broker consumer expects) with valueFrom: configMapKeyRef name:
hyperfleet-sentinel-broker and the appropriate key names, ensure
BROKER_CREDENTIALS (if still needed) remains sourced from sentinel-secrets
(secretKeyRef) only when required, and remove or reconcile any duplicate BROKER
sources so Pub/Sub continues to use GCP_PROJECT_ID from sentinel-config while
RabbitMQ/other brokers use hyperfleet-sentinel-broker.
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 608-638: The document inconsistently references two
observed_generation paths; normalize to a single contract by replacing the use
of resource.status.conditions.Ready.observed_generation with the canonical
resource.status.observed_generation (or vice versa if you prefer that canonical
name) everywhere in the Decision Logic and Implementation Requirements so the
generation check in Evaluate(resource, now) consistently compares
resource.generation against the same observed_generation path; update all
references, examples, and test scenario mentions and ensure the text and logging
(e.g., the generation mismatch reason string) reflect the chosen single field
name.
---
Nitpick comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 126-134: The metrics spec currently emits a fixed ready_state
label which can't convey which CEL rule matched or represent a no-match
fallback; update the metrics emission to include a rule-oriented label (e.g.,
matched_rule or decision_reason) populated with the matching rule name (use the
rule "name" values such as isReady and isNotReady) and ensure a deterministic
fallback value (e.g., "no_match") when no rule matches; apply the same change to
the other rules block referenced (lines ~209-225) so all rule-based decisions
expose the matching rule name and a clear no-match reason in metrics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d3a74e28-76b6-4274-99be-601c3ab31699
📒 Files selected for processing (4)
hyperfleet/architecture/architecture-summary.mdhyperfleet/components/sentinel/sentinel-config.yamlhyperfleet/components/sentinel/sentinel-deployment.mdhyperfleet/components/sentinel/sentinel.md
- Fix sequence diagrams in architecture-summary.md to use generation-first + condition rules decision logic - Document reference_time fallback to created_time in sentinel.md - Add envFrom for hyperfleet-sentinel-broker ConfigMap in deployment - Normalize observed_generation path to resource.status.observed_generation - Update metrics label from ready_state to reason (matches implementation)
There was a problem hiding this comment.
🧹 Nitpick comments (3)
hyperfleet/components/sentinel/sentinel.md (1)
905-916: Test 9 could be clearer about reference_time fallback behavior.Test 9 documents the scenario where no condition rule matches, correctly stating it falls back to the smallest max_age. However, the test states
reference_time = now() - 15swithout explaining how this value is determined when the resource "has no Ready condition (or unexpected status)."According to line 249 in this file, when the reference_time expression evaluation fails, it should fall back to
created_time. Test 9 should clarify this to fully document the edge case behavior mentioned in the past review.🧪 Suggested test clarification
**Test 9: No condition rule matches → use smallest max_age as fallback**Given:
- Resource has no Ready condition (or unexpected status)
- resource.generation = 1
- resource.status.observed_generation = 1
- reference_time = now() - 15s
- reference_time = resource.metadata.created_time = now() - 15s # Falls back to created_time when conditionTime() fails
- No rule matches → fallback to smallest max_age (10s)
Then:- Decision: PUBLISH
- Reason: "max age expired (fallback: no rule matched)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/components/sentinel/sentinel.md` around lines 905 - 916, Update Test 9 in sentinel.md to explicitly state that when the resource "has no Ready condition (or unexpected status)" the reference_time expression evaluation fails and therefore reference_time falls back to resource.metadata.created_time (e.g., "reference_time = resource.metadata.created_time = now() - 15s # Falls back to created_time when conditionTime() fails"); keep the rest of the scenario and expected Decision/Reason unchanged so readers understand the fallback behavior described around line 249.hyperfleet/architecture/architecture-summary.md (2)
268-279: Clarify reference_time behavior when no rule matches.The decision algorithm mentions using "smallest max_age if no match" (line 275) but doesn't specify what
reference_timevalue is used in that scenario. Sincereference_timeis typically derived from a CEL expression likeconditionTime(resource, "Ready"), readers may wonder what timestamp applies when no rule matches (e.g., when the resource lacks the expected condition).Based on the past review comment indicating this was addressed in sentinel.md, consider adding a brief note here for completeness.
📝 Suggested addition
Evaluate condition rules (CEL expressions) to find matching rule max_age = matched_rule.max_age (or smallest max_age if no match) - reference_time = evaluate reference_time expression + reference_time = evaluate reference_time expression (falls back to created_time if evaluation fails) IF now >= reference_time + max_age: PublishEvent(broker, CreateEvent(resource))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/architecture/architecture-summary.md` around lines 268 - 279, The Decision Algorithm text is unclear about which reference_time to use when no rule matches; update the description for reference_time (the symbol reference_time) to state the fallback behavior: when matched_rule is absent, evaluate reference_time using the same default expression rules (e.g., evaluate the CEL expression if provided, otherwise use resource.status.observed_generation timestamp or resource.metadata.creationTimestamp as a final fallback) and document that max_age uses the smallest configured value when no match; mention the relevant symbols FetchResources, matched_rule, max_age, reference_time, PublishEvent and CreateEvent so readers can find and confirm the logic.
232-240: Document CEL custom functions or cross-reference their definitions.The configuration introduces CEL expressions using custom functions like
conditionTime(resource, "Ready")andstatus(resource, "Ready"), but this file doesn't explain what these functions do or their return types. Readers encountering this architecture summary may struggle to understand the decision logic without knowing the function semantics.📚 Suggested clarification
Add a brief note after the configuration example explaining the custom CEL functions:
conditions: reference_time: 'conditionTime(resource, "Ready")' rules: - name: isReady expression: 'status(resource, "Ready") == "True"' max_age: 30m - name: isNotReady expression: 'status(resource, "Ready") == "False"' max_age: 10s + +# CEL Custom Functions: +# - conditionTime(resource, "conditionName"): Returns the last_updated_time of the specified condition +# - status(resource, "conditionName"): Returns the status value ("True", "False", "Unknown") of the specified condition +# See sentinel.md for complete CEL function documentation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hyperfleet/architecture/architecture-summary.md` around lines 232 - 240, Add a short explanatory note after the conditions example that documents the custom CEL functions conditionTime(resource, "Ready") and status(resource, "Ready"): state what each function returns (e.g., conditionTime returns an RFC3339 timestamp or duration since the named condition was set; status returns the string value of the named condition such as "True"/"False"/"Unknown"), describe expected parameter types (resource object and condition name string), and either link or cross-reference the function implementations/definitions so readers can find the exact semantics and return types used by the CEL expressions in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 268-279: The Decision Algorithm text is unclear about which
reference_time to use when no rule matches; update the description for
reference_time (the symbol reference_time) to state the fallback behavior: when
matched_rule is absent, evaluate reference_time using the same default
expression rules (e.g., evaluate the CEL expression if provided, otherwise use
resource.status.observed_generation timestamp or
resource.metadata.creationTimestamp as a final fallback) and document that
max_age uses the smallest configured value when no match; mention the relevant
symbols FetchResources, matched_rule, max_age, reference_time, PublishEvent and
CreateEvent so readers can find and confirm the logic.
- Around line 232-240: Add a short explanatory note after the conditions example
that documents the custom CEL functions conditionTime(resource, "Ready") and
status(resource, "Ready"): state what each function returns (e.g., conditionTime
returns an RFC3339 timestamp or duration since the named condition was set;
status returns the string value of the named condition such as
"True"/"False"/"Unknown"), describe expected parameter types (resource object
and condition name string), and either link or cross-reference the function
implementations/definitions so readers can find the exact semantics and return
types used by the CEL expressions in this file.
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 905-916: Update Test 9 in sentinel.md to explicitly state that
when the resource "has no Ready condition (or unexpected status)" the
reference_time expression evaluation fails and therefore reference_time falls
back to resource.metadata.created_time (e.g., "reference_time =
resource.metadata.created_time = now() - 15s # Falls back to created_time when
conditionTime() fails"); keep the rest of the scenario and expected
Decision/Reason unchanged so readers understand the fallback behavior described
around line 249.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6df6ec8f-eafa-48ee-a059-4c1608fcea0d
📒 Files selected for processing (3)
hyperfleet/architecture/architecture-summary.mdhyperfleet/components/sentinel/sentinel-deployment.mdhyperfleet/components/sentinel/sentinel.md
🚧 Files skipped from review as they are similar to previous changes (1)
- hyperfleet/components/sentinel/sentinel-deployment.md
| PublishEvent(broker, CreateEvent(resource)) // immediate reconciliation | ||
| ELSE: | ||
| max_age = max_age_ready (30m) | ||
| Evaluate condition rules (CEL expressions) to find matching rule |
There was a problem hiding this comment.
With custom rules, is it not possible to avoid the generation based strategy?
I don't know which could be the reasons for that, but imagine that they come up with some rules to not trigger with a spec.generation change
There was a problem hiding this comment.
The generation check is currently hardcoded in the implementation as the highest priority — it always triggers before CEL rules and can't be disabled.
This PR is documenting the existing behavior.
| poll_interval: 5s # How often to check the API | ||
| max_age_not_ready: 10s # Wait time for resources still provisioning | ||
| max_age_ready: 30m # Wait time for stable/ready resources | ||
|
|
There was a problem hiding this comment.
IMO we should remove a config file example from the architecture repository.
This only makes harder to keep changes in sync
If someone needs this level of detail, they should go to the sentinel repository
wdyt?
There was a problem hiding this comment.
Bringing this to discussion if we should move this and sentinel-deployment.md to the sentinel repo? I mentioned you in the slack thread about this.
| - --config=/etc/sentinel/config.yaml # Path to YAML config file | ||
| - --metrics-bind-address=:9090 | ||
| - --health-probe-bind-address=:8080 | ||
| envFrom: |
There was a problem hiding this comment.
Same for this file, IMO we have the operational guide in the sentinel repository
This is a risk of drift
…sponse structure The API returns observed_generation at the condition level (status.conditions[].observed_generation), not at resource.status level. Updated all references to use condition(resource, "Available").observed_generation which represents min(observed_generation) across all required adapters.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 274-279: Update the summary algorithm to specify the same fallback
used in sentinel.md for computing reference_time: when evaluating
conditionTime(resource, "Ready") returns no timestamp or no rule matches (so
matched_rule is absent), fall back to resource.created_time (or the documented
alternative in sentinel.md) before computing reference_time + max_age;
explicitly mention reference_time, matched_rule, max_age, and
conditionTime(resource, "Ready") in the text so the no-match and no-timestamp
paths produce a well-defined reference_time used in the IF now >= reference_time
+ max_age check.
- Around line 218-222: Update the responsibilities list to reflect the new
generation-first decision model: replace references to the old trigger-based
checks (status.phase and status.last_updated_time) with the new reconciliation
decision flow that uses generation-first reconciliation plus CEL-selected
max_age and reference_time; specifically, modify the Decision Logic bullet to
state that reconciliation is triggered by new generation detection and/or
CEL-evaluated max_age/reference_time rules (using CEL expressions) and remove or
deprecate the prior reliance on status.phase/status.last_updated_time so the
summary uses a single, consistent decision model.
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 215-218: State in the main algorithm that when evaluating
condition(resource, "Available").observed_generation it must default to 0 if the
Available condition is missing (to avoid CEL/runtime errors), and treat
resource.generation > observed_generation using that default; update the rule
that references resource.generation and condition(resource,
"Available").observed_generation to explicitly say "missing Available condition
= 0 (observed_generation = 0)" so new resources are handled deterministically
(this mirrors the behavior described later in Test 8).
- Around line 24-26: Clarify ownership of timing/generation fields by explicitly
stating whether adapters write per-condition fields
(status.conditions[].last_updated_time and
status.conditions[].observed_generation) or only the top-level observed_time /
observed_generation and the API derives the per-condition values; update the
sentinel.md text to pick one model (recommended: adapters POST top-level
observed_time/observed_generation and the API populates status.conditions[]
last_updated_time and observed_generation) and change the lines describing "Uses
adapter status updates" and any subsequent contract wording to reflect that
single, unambiguous flow, referencing the symbols
status.conditions[].last_updated_time, status.conditions[].observed_generation,
observed_time, and observed_generation so readers know which side is
authoritative.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cc00851b-dcd1-4c20-952a-ecaf7ebcb1d1
📒 Files selected for processing (2)
hyperfleet/architecture/architecture-summary.mdhyperfleet/components/sentinel/sentinel.md
| 2. **Decision Logic**: Determine if resource needs reconciliation based on: | ||
| - `status.phase` (Ready vs Not Ready) | ||
| - `status.last_updated_time` (time since last adapter check) | ||
| - Configured max age intervals (10s for not-ready, 30m for ready) | ||
| - Configurable condition rules (CEL expressions) to determine max age intervals | ||
| 3. **Event Creation**: Create reconciliation event with resource context |
There was a problem hiding this comment.
This responsibilities list still describes the old trigger model.
Lines 218-222 still say Sentinel decisions are based on status.phase and status.last_updated_time, while the rest of the document now defines generation-first reconciliation plus CEL-selected max_age / reference_time. Please update this list so the architecture summary has a single decision model.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/architecture/architecture-summary.md` around lines 218 - 222,
Update the responsibilities list to reflect the new generation-first decision
model: replace references to the old trigger-based checks (status.phase and
status.last_updated_time) with the new reconciliation decision flow that uses
generation-first reconciliation plus CEL-selected max_age and reference_time;
specifically, modify the Decision Logic bullet to state that reconciliation is
triggered by new generation detection and/or CEL-evaluated
max_age/reference_time rules (using CEL expressions) and remove or deprecate the
prior reliance on status.phase/status.last_updated_time so the summary uses a
single, consistent decision model.
| Evaluate condition rules (CEL expressions) to find matching rule | ||
| max_age = matched_rule.max_age (or smallest max_age if no match) | ||
| reference_time = evaluate reference_time expression | ||
|
|
||
| IF now >= resource.status.last_updated_time + max_age: | ||
| event = CreateEvent(resource) | ||
| PublishEvent(broker, event) | ||
| IF now >= reference_time + max_age: | ||
| PublishEvent(broker, CreateEvent(resource)) |
There was a problem hiding this comment.
Carry the reference_time fallback into the summary algorithm.
If no rule matches, or conditionTime(resource, "Ready") cannot yield a timestamp, reference_time + max_age is undefined here. Please document the same fallback used in sentinel.md (for example, fall back to created_time) so the no-match path is fully specified.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/architecture/architecture-summary.md` around lines 274 - 279,
Update the summary algorithm to specify the same fallback used in sentinel.md
for computing reference_time: when evaluating conditionTime(resource, "Ready")
returns no timestamp or no rule matches (so matched_rule is absent), fall back
to resource.created_time (or the documented alternative in sentinel.md) before
computing reference_time + max_age; explicitly mention reference_time,
matched_rule, max_age, and conditionTime(resource, "Ready") in the text so the
no-match and no-timestamp paths produce a well-defined reference_time used in
the IF now >= reference_time + max_age check.
| - **Uses adapter status updates**: Reads `status.conditions[].last_updated_time` and `status.conditions[].observed_generation` (updated by adapters on every check) to determine when to create next event | ||
| - **Smart triggering**: Two-tier decision logic prioritizes spec changes (generation mismatch) over periodic health checks (conditions-based max age) | ||
| - **Configurable conditions**: CEL expressions define which conditions to evaluate and what max age intervals to apply (e.g., 10s for not-ready, 30m for ready) |
There was a problem hiding this comment.
Clarify who owns the condition-level timing/generation fields.
Lines 24-26 say adapters update status.conditions[].last_updated_time and status.conditions[].observed_generation directly, but the later contract still shows adapters POSTing top-level observed_time / observed_generation and the API deriving the resource status fields. Please make one side explicitly derived from the other; right now the write contract is ambiguous.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/components/sentinel/sentinel.md` around lines 24 - 26, Clarify
ownership of timing/generation fields by explicitly stating whether adapters
write per-condition fields (status.conditions[].last_updated_time and
status.conditions[].observed_generation) or only the top-level observed_time /
observed_generation and the API derives the per-condition values; update the
sentinel.md text to pick one model (recommended: adapters POST top-level
observed_time/observed_generation and the API populates status.conditions[]
last_updated_time and observed_generation) and change the lines describing "Uses
adapter status updates" and any subsequent contract wording to reflect that
single, unambiguous flow, referencing the symbols
status.conditions[].last_updated_time, status.conditions[].observed_generation,
observed_time, and observed_generation so readers know which side is
authoritative.
| 1. **Generation Mismatch** (HIGHEST PRIORITY - immediate reconciliation): | ||
| - Resource generation > resource.status.observed_generation | ||
| - `resource.generation` > `condition(resource, "Available").observed_generation` | ||
| - Reason: User changed the spec (e.g., scaled nodes), requires immediate reconciliation | ||
| - Example: Cluster generation=2, observed_generation=1 → publish immediately | ||
| - Example: Cluster generation=2, Available.observed_generation=1 → publish immediately |
There was a problem hiding this comment.
Define the missing-Available behavior in the main algorithm.
The primary decision rule assumes condition(resource, "Available").observed_generation exists, but the null/initial-state behavior is only described later in Test 8. Please state here that a missing Available condition is treated as 0 (or equivalent) instead of a CEL/runtime error, otherwise brand-new resources are underspecified.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/components/sentinel/sentinel.md` around lines 215 - 218, State in
the main algorithm that when evaluating condition(resource,
"Available").observed_generation it must default to 0 if the Available condition
is missing (to avoid CEL/runtime errors), and treat resource.generation >
observed_generation using that default; update the rule that references
resource.generation and condition(resource, "Available").observed_generation to
explicitly say "missing Available condition = 0 (observed_generation = 0)" so
new resources are handled deterministically (this mirrors the behavior described
later in Test 8).
Summary
max_age_not_ready/max_age_readywith configurableconditionssection using CEL expressionsarchitecture-summary.mdfor consistencyFiles Changed
sentinel-config.yamlconditionsblock replacesmax_age_not_ready/max_age_readysentinel.mdsentinel-deployment.mdarchitecture-summary.mdRelated
Test plan
max_age_not_ready/max_age_readyin the repositoryconditionsformatSummary by CodeRabbit
Documentation
New Features
Chores