Skip to content

HYPERFLEET-537 - docs: configurable conditions-based decision making#102

Draft
rafabene wants to merge 3 commits intomainfrom
HYPERFLEET-537
Draft

HYPERFLEET-537 - docs: configurable conditions-based decision making#102
rafabene wants to merge 3 commits intomainfrom
HYPERFLEET-537

Conversation

@rafabene
Copy link
Contributor

@rafabene rafabene commented Mar 11, 2026

Summary

  • Replace hardcoded max_age_not_ready/max_age_ready with configurable conditions section using CEL expressions
  • Update sentinel flowchart to include CEL compilation at startup and conditions-based evaluation flow
  • Update Decision Engine description, test scenarios, and config examples across all sentinel docs
  • Update architecture-summary.md for consistency

Files Changed

File Change
sentinel-config.yaml New conditions block replaces max_age_not_ready/max_age_ready
sentinel.md Flowchart, Decision Engine, Decision Logic, Config Loader, Resource Filtering examples, Test Scenarios
sentinel-deployment.md ConfigMap updated with new config format
architecture-summary.md Config examples and Decision Algorithm updated

Related

Test plan

  • No remaining references to max_age_not_ready/max_age_ready in the repository
  • All YAML config examples use consistent conditions format
  • Flowchart renders correctly with new CEL compilation step
  • Test scenarios include new edge cases (no rule match fallback, CEL compilation failure)

Summary by CodeRabbit

  • Documentation

    • Updated Sentinel docs and diagrams to describe condition-based reconciliation, reference_time semantics, generation-priority behavior, and expression validation.
  • New Features

    • Condition-driven reconciliation rules using CEL expressions with per-rule max_age, ordered evaluation, and immediate reconciliation on generation change.
    • Per-config/regional variations, label-based resource selection, and configurable CloudEvent composition.
    • Externalized broker credentials/configuration and HyperFleet API integration options.
  • Chores

    • Metrics adjusted to report rule-based reasons and added config-loads metric.

…ns-based decision making

Replace hardcoded max_age_not_ready/max_age_ready references with
configurable conditions using CEL expressions (reference_time + rules
with name/expression/max_age), aligning with the implementation in
hyperfleet-sentinel PR #72.
@openshift-ci openshift-ci bot requested review from rh-amarin and xueli181114 March 11, 2026 19:32
@openshift-ci
Copy link

openshift-ci bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tirthct for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 11, 2026

Walkthrough

This change replaces fixed max-age constants with a CEL-driven conditions system for reconciliation. Configs add a conditions block (reference_time plus ordered named CEL rules mapping expressions to max_age), resource_selector, hyperfleet_api, and message_data mappings. At runtime the controller compiles CEL expressions at startup (fail-fast). Reconciliation still publishes immediately when resource.generation > resource.status.observed_generation; otherwise it evaluates CEL rules to select max_age, computes nextEventTime = reference_time + max_age, and publishes when now >= nextEventTime. Per-deployment configs may override rules/max_age.

Sequence Diagram(s)

sequenceDiagram
    participant Config as Configuration
    participant CELEngine as CEL Engine
    participant Monitor as Resource Monitor
    participant Evaluator as Rule Evaluator
    participant Broker as Event Broker

    Config->>CELEngine: Load & compile conditions (fail-fast)
    CELEngine-->>Config: Compilation result

    Monitor->>Evaluator: Poll resource / read status
    Evaluator->>Evaluator: Compare resource.generation vs observed_generation

    alt generation > observed_generation
        Evaluator->>Broker: Publish CreateEvent (immediate)
    else generation == observed_generation
        Evaluator->>CELEngine: Evaluate ordered CEL rules
        CELEngine-->>Evaluator: Matched rule + max_age, reference_time
        Evaluator->>Evaluator: Compute nextEventTime = reference_time + max_age
        alt now >= nextEventTime
            Evaluator->>Broker: Publish Reconciliation Event
        end
    end

    Broker-->>Broker: Queue/forward event
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: introducing configurable conditions-based decision making to replace fixed max-age constants, which is reflected across all documentation and configuration files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch HYPERFLEET-537
📝 Coding Plan for PR comments
  • Generate coding plan

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
hyperfleet/components/sentinel/sentinel.md (1)

608-638: ⚠️ Potential issue | 🟠 Major

Use one observed_generation path throughout the spec.

This section compares against resource.status.conditions.Ready.observed_generation, but the rest of the page and the test scenarios use resource.status.observed_generation. Please normalize on a single contract here; otherwise the Decision Engine description is ambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 608 - 638, The
document inconsistently references two observed_generation paths; normalize to a
single contract by replacing the use of
resource.status.conditions.Ready.observed_generation with the canonical
resource.status.observed_generation (or vice versa if you prefer that canonical
name) everywhere in the Decision Logic and Implementation Requirements so the
generation check in Evaluate(resource, now) consistently compares
resource.generation against the same observed_generation path; update all
references, examples, and test scenario mentions and ensure the text and logging
(e.g., the generation mismatch reason string) reflect the chosen single field
name.
hyperfleet/components/sentinel/sentinel-deployment.md (1)

43-61: ⚠️ Potential issue | 🟠 Major

The pod spec still doesn't consume the documented broker ConfigMap.

This file now introduces hyperfleet-sentinel-broker, but the Deployment only pulls GCP_PROJECT_ID from sentinel-config and never imports any BROKER_* keys. As written, the RabbitMQ example never reaches the container environment, and Pub/Sub has two competing config sources.

Also applies to: 121-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 43 - 61,
The Deployment pod spec is missing env entries that consume the new
hyperfleet-sentinel-broker ConfigMap so BROKER_* values never reach the
container; update the env block to add the required BROKER_* env vars (e.g.,
BROKER_TYPE, BROKER_URL, BROKER_USER, BROKER_PASSWORD or whatever keys your
broker consumer expects) with valueFrom: configMapKeyRef name:
hyperfleet-sentinel-broker and the appropriate key names, ensure
BROKER_CREDENTIALS (if still needed) remains sourced from sentinel-secrets
(secretKeyRef) only when required, and remove or reconcile any duplicate BROKER
sources so Pub/Sub continues to use GCP_PROJECT_ID from sentinel-config while
RabbitMQ/other brokers use hyperfleet-sentinel-broker.
🧹 Nitpick comments (1)
hyperfleet/components/sentinel/sentinel-deployment.md (1)

126-134: Expose rule-oriented labels in the metrics spec.

Once skip/publish decisions come from arbitrary CEL rules, a fixed ready_state={ready,not_ready} label no longer tells operators which rule matched and cannot represent the no-match fallback. A matched_rule or decision_reason label would fit the new configuration model much better.

Also applies to: 209-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 126 -
134, The metrics spec currently emits a fixed ready_state label which can't
convey which CEL rule matched or represent a no-match fallback; update the
metrics emission to include a rule-oriented label (e.g., matched_rule or
decision_reason) populated with the matching rule name (use the rule "name"
values such as isReady and isNotReady) and ensure a deterministic fallback value
(e.g., "no_match") when no rule matches; apply the same change to the other
rules block referenced (lines ~209-225) so all rule-based decisions expose the
matching rule name and a clear no-match reason in metrics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 268-279: Update the later sequence diagrams so their decision
boxes match the documented Decision Algorithm: change the checks to first
compare resource.generation vs resource.status.observed_generation
(generation-first reconciliation), then evaluate CEL condition rules to select
matched_rule.max_age (or smallest max_age if none), compute reference_time from
the rule expression, and use now >= reference_time + max_age to trigger
PublishEvent(CreateEvent(resource)); replace any hardcoded phase != "Ready" or
last_updated_time + 10s logic with these same symbols (FetchResources,
resource.generation, resource.status.observed_generation, matched_rule.max_age,
reference_time, PublishEvent, CreateEvent, CEL) so the diagrams and prose are
consistent.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 238-255: Update the doc to specify the timestamp fallback used
when no rule matches: state that reference_time (used by nextEventTime) will use
conditionTime(resource, "<conditionName>") if available, otherwise fall back to
resource.metadata.creationTimestamp (and if that is missing, use the resource's
observedGeneration/now as a last resort); mention this behavior wherever
reference_time, nextEventTime, or conditionTime are described (including the
default-rule/no-match section) so Test 9’s scenario is fully specified. Ensure
the prose notes that rules still pick the smallest max_age when none match but
that the timestamp for ROI calculation falls back as described.

---

Outside diff comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 43-61: The Deployment pod spec is missing env entries that consume
the new hyperfleet-sentinel-broker ConfigMap so BROKER_* values never reach the
container; update the env block to add the required BROKER_* env vars (e.g.,
BROKER_TYPE, BROKER_URL, BROKER_USER, BROKER_PASSWORD or whatever keys your
broker consumer expects) with valueFrom: configMapKeyRef name:
hyperfleet-sentinel-broker and the appropriate key names, ensure
BROKER_CREDENTIALS (if still needed) remains sourced from sentinel-secrets
(secretKeyRef) only when required, and remove or reconcile any duplicate BROKER
sources so Pub/Sub continues to use GCP_PROJECT_ID from sentinel-config while
RabbitMQ/other brokers use hyperfleet-sentinel-broker.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 608-638: The document inconsistently references two
observed_generation paths; normalize to a single contract by replacing the use
of resource.status.conditions.Ready.observed_generation with the canonical
resource.status.observed_generation (or vice versa if you prefer that canonical
name) everywhere in the Decision Logic and Implementation Requirements so the
generation check in Evaluate(resource, now) consistently compares
resource.generation against the same observed_generation path; update all
references, examples, and test scenario mentions and ensure the text and logging
(e.g., the generation mismatch reason string) reflect the chosen single field
name.

---

Nitpick comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 126-134: The metrics spec currently emits a fixed ready_state
label which can't convey which CEL rule matched or represent a no-match
fallback; update the metrics emission to include a rule-oriented label (e.g.,
matched_rule or decision_reason) populated with the matching rule name (use the
rule "name" values such as isReady and isNotReady) and ensure a deterministic
fallback value (e.g., "no_match") when no rule matches; apply the same change to
the other rules block referenced (lines ~209-225) so all rule-based decisions
expose the matching rule name and a clear no-match reason in metrics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d3a74e28-76b6-4274-99be-601c3ab31699

📥 Commits

Reviewing files that changed from the base of the PR and between e40d5bc and a8fbe6b.

📒 Files selected for processing (4)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/sentinel/sentinel-config.yaml
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/components/sentinel/sentinel.md

- Fix sequence diagrams in architecture-summary.md to use
  generation-first + condition rules decision logic
- Document reference_time fallback to created_time in sentinel.md
- Add envFrom for hyperfleet-sentinel-broker ConfigMap in deployment
- Normalize observed_generation path to resource.status.observed_generation
- Update metrics label from ready_state to reason (matches implementation)
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
hyperfleet/components/sentinel/sentinel.md (1)

905-916: Test 9 could be clearer about reference_time fallback behavior.

Test 9 documents the scenario where no condition rule matches, correctly stating it falls back to the smallest max_age. However, the test states reference_time = now() - 15s without explaining how this value is determined when the resource "has no Ready condition (or unexpected status)."

According to line 249 in this file, when the reference_time expression evaluation fails, it should fall back to created_time. Test 9 should clarify this to fully document the edge case behavior mentioned in the past review.

🧪 Suggested test clarification
 **Test 9: No condition rule matches → use smallest max_age as fallback**

Given:

  • Resource has no Ready condition (or unexpected status)
  • resource.generation = 1
  • resource.status.observed_generation = 1
    • reference_time = now() - 15s
    • reference_time = resource.metadata.created_time = now() - 15s # Falls back to created_time when conditionTime() fails
    • No rule matches → fallback to smallest max_age (10s)
      Then:
    • Decision: PUBLISH
    • Reason: "max age expired (fallback: no rule matched)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 905 - 916, Update
Test 9 in sentinel.md to explicitly state that when the resource "has no Ready
condition (or unexpected status)" the reference_time expression evaluation fails
and therefore reference_time falls back to resource.metadata.created_time (e.g.,
"reference_time = resource.metadata.created_time = now() - 15s  # Falls back to
created_time when conditionTime() fails"); keep the rest of the scenario and
expected Decision/Reason unchanged so readers understand the fallback behavior
described around line 249.
hyperfleet/architecture/architecture-summary.md (2)

268-279: Clarify reference_time behavior when no rule matches.

The decision algorithm mentions using "smallest max_age if no match" (line 275) but doesn't specify what reference_time value is used in that scenario. Since reference_time is typically derived from a CEL expression like conditionTime(resource, "Ready"), readers may wonder what timestamp applies when no rule matches (e.g., when the resource lacks the expected condition).

Based on the past review comment indicating this was addressed in sentinel.md, consider adding a brief note here for completeness.

📝 Suggested addition
   Evaluate condition rules (CEL expressions) to find matching rule
   max_age = matched_rule.max_age (or smallest max_age if no match)
-  reference_time = evaluate reference_time expression
+  reference_time = evaluate reference_time expression (falls back to created_time if evaluation fails)

   IF now >= reference_time + max_age:
     PublishEvent(broker, CreateEvent(resource))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 268 - 279, The
Decision Algorithm text is unclear about which reference_time to use when no
rule matches; update the description for reference_time (the symbol
reference_time) to state the fallback behavior: when matched_rule is absent,
evaluate reference_time using the same default expression rules (e.g., evaluate
the CEL expression if provided, otherwise use
resource.status.observed_generation timestamp or
resource.metadata.creationTimestamp as a final fallback) and document that
max_age uses the smallest configured value when no match; mention the relevant
symbols FetchResources, matched_rule, max_age, reference_time, PublishEvent and
CreateEvent so readers can find and confirm the logic.

232-240: Document CEL custom functions or cross-reference their definitions.

The configuration introduces CEL expressions using custom functions like conditionTime(resource, "Ready") and status(resource, "Ready"), but this file doesn't explain what these functions do or their return types. Readers encountering this architecture summary may struggle to understand the decision logic without knowing the function semantics.

📚 Suggested clarification

Add a brief note after the configuration example explaining the custom CEL functions:

 conditions:
   reference_time: 'conditionTime(resource, "Ready")'
   rules:
     - name: isReady
       expression: 'status(resource, "Ready") == "True"'
       max_age: 30m
     - name: isNotReady
       expression: 'status(resource, "Ready") == "False"'
       max_age: 10s
+
+# CEL Custom Functions:
+# - conditionTime(resource, "conditionName"): Returns the last_updated_time of the specified condition
+# - status(resource, "conditionName"): Returns the status value ("True", "False", "Unknown") of the specified condition
+# See sentinel.md for complete CEL function documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 232 - 240, Add
a short explanatory note after the conditions example that documents the custom
CEL functions conditionTime(resource, "Ready") and status(resource, "Ready"):
state what each function returns (e.g., conditionTime returns an RFC3339
timestamp or duration since the named condition was set; status returns the
string value of the named condition such as "True"/"False"/"Unknown"), describe
expected parameter types (resource object and condition name string), and either
link or cross-reference the function implementations/definitions so readers can
find the exact semantics and return types used by the CEL expressions in this
file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 268-279: The Decision Algorithm text is unclear about which
reference_time to use when no rule matches; update the description for
reference_time (the symbol reference_time) to state the fallback behavior: when
matched_rule is absent, evaluate reference_time using the same default
expression rules (e.g., evaluate the CEL expression if provided, otherwise use
resource.status.observed_generation timestamp or
resource.metadata.creationTimestamp as a final fallback) and document that
max_age uses the smallest configured value when no match; mention the relevant
symbols FetchResources, matched_rule, max_age, reference_time, PublishEvent and
CreateEvent so readers can find and confirm the logic.
- Around line 232-240: Add a short explanatory note after the conditions example
that documents the custom CEL functions conditionTime(resource, "Ready") and
status(resource, "Ready"): state what each function returns (e.g., conditionTime
returns an RFC3339 timestamp or duration since the named condition was set;
status returns the string value of the named condition such as
"True"/"False"/"Unknown"), describe expected parameter types (resource object
and condition name string), and either link or cross-reference the function
implementations/definitions so readers can find the exact semantics and return
types used by the CEL expressions in this file.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 905-916: Update Test 9 in sentinel.md to explicitly state that
when the resource "has no Ready condition (or unexpected status)" the
reference_time expression evaluation fails and therefore reference_time falls
back to resource.metadata.created_time (e.g., "reference_time =
resource.metadata.created_time = now() - 15s  # Falls back to created_time when
conditionTime() fails"); keep the rest of the scenario and expected
Decision/Reason unchanged so readers understand the fallback behavior described
around line 249.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6df6ec8f-eafa-48ee-a059-4c1608fcea0d

📥 Commits

Reviewing files that changed from the base of the PR and between a8fbe6b and d2f2665.

📒 Files selected for processing (3)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/components/sentinel/sentinel.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/components/sentinel/sentinel-deployment.md

PublishEvent(broker, CreateEvent(resource)) // immediate reconciliation
ELSE:
max_age = max_age_ready (30m)
Evaluate condition rules (CEL expressions) to find matching rule
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With custom rules, is it not possible to avoid the generation based strategy?
I don't know which could be the reasons for that, but imagine that they come up with some rules to not trigger with a spec.generation change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generation check is currently hardcoded in the implementation as the highest priority — it always triggers before CEL rules and can't be disabled.
This PR is documenting the existing behavior.

poll_interval: 5s # How often to check the API
max_age_not_ready: 10s # Wait time for resources still provisioning
max_age_ready: 30m # Wait time for stable/ready resources

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should remove a config file example from the architecture repository.
This only makes harder to keep changes in sync
If someone needs this level of detail, they should go to the sentinel repository
wdyt?

Copy link
Contributor Author

@rafabene rafabene Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bringing this to discussion if we should move this and sentinel-deployment.md to the sentinel repo? I mentioned you in the slack thread about this.

- --config=/etc/sentinel/config.yaml # Path to YAML config file
- --metrics-bind-address=:9090
- --health-probe-bind-address=:8080
envFrom:
Copy link
Collaborator

@rh-amarin rh-amarin Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this file, IMO we have the operational guide in the sentinel repository
This is a risk of drift

…sponse structure

The API returns observed_generation at the condition level
(status.conditions[].observed_generation), not at resource.status level.
Updated all references to use condition(resource, "Available").observed_generation
which represents min(observed_generation) across all required adapters.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 274-279: Update the summary algorithm to specify the same fallback
used in sentinel.md for computing reference_time: when evaluating
conditionTime(resource, "Ready") returns no timestamp or no rule matches (so
matched_rule is absent), fall back to resource.created_time (or the documented
alternative in sentinel.md) before computing reference_time + max_age;
explicitly mention reference_time, matched_rule, max_age, and
conditionTime(resource, "Ready") in the text so the no-match and no-timestamp
paths produce a well-defined reference_time used in the IF now >= reference_time
+ max_age check.
- Around line 218-222: Update the responsibilities list to reflect the new
generation-first decision model: replace references to the old trigger-based
checks (status.phase and status.last_updated_time) with the new reconciliation
decision flow that uses generation-first reconciliation plus CEL-selected
max_age and reference_time; specifically, modify the Decision Logic bullet to
state that reconciliation is triggered by new generation detection and/or
CEL-evaluated max_age/reference_time rules (using CEL expressions) and remove or
deprecate the prior reliance on status.phase/status.last_updated_time so the
summary uses a single, consistent decision model.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 215-218: State in the main algorithm that when evaluating
condition(resource, "Available").observed_generation it must default to 0 if the
Available condition is missing (to avoid CEL/runtime errors), and treat
resource.generation > observed_generation using that default; update the rule
that references resource.generation and condition(resource,
"Available").observed_generation to explicitly say "missing Available condition
= 0 (observed_generation = 0)" so new resources are handled deterministically
(this mirrors the behavior described later in Test 8).
- Around line 24-26: Clarify ownership of timing/generation fields by explicitly
stating whether adapters write per-condition fields
(status.conditions[].last_updated_time and
status.conditions[].observed_generation) or only the top-level observed_time /
observed_generation and the API derives the per-condition values; update the
sentinel.md text to pick one model (recommended: adapters POST top-level
observed_time/observed_generation and the API populates status.conditions[]
last_updated_time and observed_generation) and change the lines describing "Uses
adapter status updates" and any subsequent contract wording to reflect that
single, unambiguous flow, referencing the symbols
status.conditions[].last_updated_time, status.conditions[].observed_generation,
observed_time, and observed_generation so readers know which side is
authoritative.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cc00851b-dcd1-4c20-952a-ecaf7ebcb1d1

📥 Commits

Reviewing files that changed from the base of the PR and between d2f2665 and c4a18ec.

📒 Files selected for processing (2)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/sentinel/sentinel.md

Comment on lines 218 to 222
2. **Decision Logic**: Determine if resource needs reconciliation based on:
- `status.phase` (Ready vs Not Ready)
- `status.last_updated_time` (time since last adapter check)
- Configured max age intervals (10s for not-ready, 30m for ready)
- Configurable condition rules (CEL expressions) to determine max age intervals
3. **Event Creation**: Create reconciliation event with resource context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This responsibilities list still describes the old trigger model.

Lines 218-222 still say Sentinel decisions are based on status.phase and status.last_updated_time, while the rest of the document now defines generation-first reconciliation plus CEL-selected max_age / reference_time. Please update this list so the architecture summary has a single decision model.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 218 - 222,
Update the responsibilities list to reflect the new generation-first decision
model: replace references to the old trigger-based checks (status.phase and
status.last_updated_time) with the new reconciliation decision flow that uses
generation-first reconciliation plus CEL-selected max_age and reference_time;
specifically, modify the Decision Logic bullet to state that reconciliation is
triggered by new generation detection and/or CEL-evaluated
max_age/reference_time rules (using CEL expressions) and remove or deprecate the
prior reliance on status.phase/status.last_updated_time so the summary uses a
single, consistent decision model.

Comment on lines +274 to +279
Evaluate condition rules (CEL expressions) to find matching rule
max_age = matched_rule.max_age (or smallest max_age if no match)
reference_time = evaluate reference_time expression

IF now >= resource.status.last_updated_time + max_age:
event = CreateEvent(resource)
PublishEvent(broker, event)
IF now >= reference_time + max_age:
PublishEvent(broker, CreateEvent(resource))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Carry the reference_time fallback into the summary algorithm.

If no rule matches, or conditionTime(resource, "Ready") cannot yield a timestamp, reference_time + max_age is undefined here. Please document the same fallback used in sentinel.md (for example, fall back to created_time) so the no-match path is fully specified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 274 - 279,
Update the summary algorithm to specify the same fallback used in sentinel.md
for computing reference_time: when evaluating conditionTime(resource, "Ready")
returns no timestamp or no rule matches (so matched_rule is absent), fall back
to resource.created_time (or the documented alternative in sentinel.md) before
computing reference_time + max_age; explicitly mention reference_time,
matched_rule, max_age, and conditionTime(resource, "Ready") in the text so the
no-match and no-timestamp paths produce a well-defined reference_time used in
the IF now >= reference_time + max_age check.

Comment on lines +24 to +26
- **Uses adapter status updates**: Reads `status.conditions[].last_updated_time` and `status.conditions[].observed_generation` (updated by adapters on every check) to determine when to create next event
- **Smart triggering**: Two-tier decision logic prioritizes spec changes (generation mismatch) over periodic health checks (conditions-based max age)
- **Configurable conditions**: CEL expressions define which conditions to evaluate and what max age intervals to apply (e.g., 10s for not-ready, 30m for ready)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clarify who owns the condition-level timing/generation fields.

Lines 24-26 say adapters update status.conditions[].last_updated_time and status.conditions[].observed_generation directly, but the later contract still shows adapters POSTing top-level observed_time / observed_generation and the API deriving the resource status fields. Please make one side explicitly derived from the other; right now the write contract is ambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 24 - 26, Clarify
ownership of timing/generation fields by explicitly stating whether adapters
write per-condition fields (status.conditions[].last_updated_time and
status.conditions[].observed_generation) or only the top-level observed_time /
observed_generation and the API derives the per-condition values; update the
sentinel.md text to pick one model (recommended: adapters POST top-level
observed_time/observed_generation and the API populates status.conditions[]
last_updated_time and observed_generation) and change the lines describing "Uses
adapter status updates" and any subsequent contract wording to reflect that
single, unambiguous flow, referencing the symbols
status.conditions[].last_updated_time, status.conditions[].observed_generation,
observed_time, and observed_generation so readers know which side is
authoritative.

Comment on lines 215 to +218
1. **Generation Mismatch** (HIGHEST PRIORITY - immediate reconciliation):
- Resource generation > resource.status.observed_generation
- `resource.generation` > `condition(resource, "Available").observed_generation`
- Reason: User changed the spec (e.g., scaled nodes), requires immediate reconciliation
- Example: Cluster generation=2, observed_generation=1 → publish immediately
- Example: Cluster generation=2, Available.observed_generation=1 → publish immediately
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Define the missing-Available behavior in the main algorithm.

The primary decision rule assumes condition(resource, "Available").observed_generation exists, but the null/initial-state behavior is only described later in Test 8. Please state here that a missing Available condition is treated as 0 (or equivalent) instead of a CEL/runtime error, otherwise brand-new resources are underspecified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 215 - 218, State in
the main algorithm that when evaluating condition(resource,
"Available").observed_generation it must default to 0 if the Available condition
is missing (to avoid CEL/runtime errors), and treat resource.generation >
observed_generation using that default; update the rule that references
resource.generation and condition(resource, "Available").observed_generation to
explicitly say "missing Available condition = 0 (observed_generation = 0)" so
new resources are handled deterministically (this mirrors the behavior described
later in Test 8).

@rafabene rafabene marked this pull request as draft March 12, 2026 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants