Skip to content

HYPERFLEET-707 - doc: update adapter configs to add time-based stability precondition#20

Open
86254860 wants to merge 2 commits intoopenshift-hyperfleet:mainfrom
86254860:HYPERFLEET-707
Open

HYPERFLEET-707 - doc: update adapter configs to add time-based stability precondition#20
86254860 wants to merge 2 commits intoopenshift-hyperfleet:mainfrom
86254860:HYPERFLEET-707

Conversation

@86254860
Copy link
Contributor

@86254860 86254860 commented Mar 13, 2026

Summary by CodeRabbit

  • Configuration Updates

    • Added a 300s TTL-based expiry threshold for readiness-based self-healing (configurable).
  • Improvements

    • Switched readiness checks to boolean-style evaluation for clearer not-ready detection.
    • Validation now triggers based on either sustained not-ready state or TTL expiry, improving automated recovery and lifecycle handling.

@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

Walkthrough

Two Helm adapter task configuration files were modified to replace string-based ready status checks with boolean and TTL-based preconditions. Each file renames the readyConditionStatus precondition (to clusterNotReady and nodepoolNotReady), removes the previous structured conditions block, adds a TTL precondition (clusterReadyTTL / nodepoolReadyTTL) that computes seconds since the Ready condition's lastTransitionTime (defaulting to now()), and updates validationCheck to require the not-ready boolean OR the TTL expiry (300s). Ready evaluation now treats non-True as not-ready and defaults to true when data is insufficient.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: updating adapter configuration files to add a time-based stability precondition (TTL-based self-healing).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@helm/adapter3/adapter-task-config.yaml`:
- Around line 64-67: Update the inline comment above the precondition expression
to reflect the actual comparison used in the expression: change the text to
indicate ">= TTL_EXPIRY_THRESHOLD seconds" (i.e., passing when stable for
greater than or equal to the threshold) so the comment matches the logic using
readyCondition, readyCondition.last_transition_time and TTL_EXPIRY_THRESHOLD in
the expression.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8fa61e23-328c-4483-9b6c-bb7dd38ec389

📥 Commits

Reviewing files that changed from the base of the PR and between 78a4f67 and 8d502f8.

📒 Files selected for processing (2)
  • helm/adapter1/adapter-task-config.yaml
  • helm/adapter3/adapter-task-config.yaml

Comment on lines +64 to +67
# Precondition passes if cluster is NOT Ready OR if cluster is Ready and stable for >TTL_EXPIRY_THRESHOLD seconds since last transition (enables self-healing)
expression: |
readyConditionStatus == "False"
readyCondition == null || readyCondition.status != "True" ||
(timestamp(currentTime) - timestamp(readyCondition.last_transition_time)).getSeconds() >= TTL_EXPIRY_THRESHOLD
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor comment/code mismatch.

The comment states >TTL_EXPIRY_THRESHOLD seconds but the expression uses >=. The code behavior (passing at exactly the threshold) is likely the intended behavior, so consider updating the comment to >=TTL_EXPIRY_THRESHOLD for accuracy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helm/adapter3/adapter-task-config.yaml` around lines 64 - 67, Update the
inline comment above the precondition expression to reflect the actual
comparison used in the expression: change the text to indicate ">=
TTL_EXPIRY_THRESHOLD seconds" (i.e., passing when stable for greater than or
equal to the threshold) so the comment matches the logic using readyCondition,
readyCondition.last_transition_time and TTL_EXPIRY_THRESHOLD in the expression.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@helm/adapter1/adapter-task-config.yaml`:
- Around line 42-53: clusterNotReady and clusterReadyTTL assume
status.conditions and last_transition_time exist; make the CEL expressions
null-safe by first checking status.conditions != null and that the filtered list
has >0 elements before indexing, and for clusterReadyTTL also check the
existence of last_transition_time and fall back to now() if missing. Update the
clusterNotReady expression used in the "clusterNotReady" rule to:
status.conditions != null && status.conditions.filter(c, c.type ==
"Ready").size() > 0 ? status.conditions.filter(c, c.type == "Ready")[0].status
!= "True" : true; and update "clusterReadyTTL" to compute the timestamp using a
guarded last_transition_time: (timestamp(now()) - timestamp(status.conditions !=
null && status.conditions.filter(c, c.type == "Ready").size() > 0 &&
status.conditions.filter(c, c.type == "Ready")[0].last_transition_time != null ?
status.conditions.filter(c, c.type == "Ready")[0].last_transition_time :
now())).getSeconds() >= 300 so accesses in clusterReadyTTL and clusterNotReady
(status.conditions and last_transition_time) are protected.

In `@helm/adapter3/adapter-task-config.yaml`:
- Around line 47-58: The CEL expressions for nodepoolNotReady and
nodepoolReadyTTL assume status.conditions and last_transition_time exist; make
them defensive by first checking status and conditions are present and non-empty
and that last_transition_time is present before indexing or calling timestamp.
Update the nodepoolNotReady expression to use a guard like status != null &&
status.conditions != null && status.conditions.size() > 0 ?
status.conditions.filter(c, c.type == "Ready")[0].status != "True" : true, and
update nodepoolReadyTTL to compute the reference time only when
status/conditions and last_transition_time exist (fallback to now() otherwise),
e.g. check status != null && status.conditions != null &&
status.conditions.size() > 0 && status.conditions.filter(c, c.type ==
"Ready")[0].last_transition_time != null ? timestamp(status.conditions.filter(c,
c.type == "Ready")[0].last_transition_time) : timestamp(now()), then compare
getSeconds() >= 300; apply these guards in the nodepoolNotReady and
nodepoolReadyTTL expressions to avoid runtime errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c6898181-f2c9-48b0-9b31-0822173bc160

📥 Commits

Reviewing files that changed from the base of the PR and between 8d502f8 and 04cff6b.

📒 Files selected for processing (2)
  • helm/adapter1/adapter-task-config.yaml
  • helm/adapter3/adapter-task-config.yaml

Comment on lines +42 to +53
- name: "clusterNotReady"
expression: |
status.conditions.filter(c, c.type == "Ready").size() > 0
? status.conditions.filter(c, c.type == "Ready")[0].status
: "False"
# Structured conditions with valid operators
conditions:
- field: "readyConditionStatus"
operator: "equals"
value: "False"
? status.conditions.filter(c, c.type == "Ready")[0].status != "True"
: true
- name: "clusterReadyTTL"
expression: |
(timestamp(now()) - timestamp(
status.conditions.filter(c, c.type == "Ready").size() > 0
? status.conditions.filter(c, c.type == "Ready")[0].last_transition_time
: now()
)).getSeconds() >= 300
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard status.conditions/last_transition_time access to avoid brittle preconditions.

Line 44 and Line 50 assume status.conditions is always present, and Line 51 assumes last_transition_time exists. Partial API responses can make these expressions fail at runtime instead of cleanly falling back.

Suggested hardening
         - name: "clusterNotReady"
           expression: |
-            status.conditions.filter(c, c.type == "Ready").size() > 0
-              ? status.conditions.filter(c, c.type == "Ready")[0].status != "True"
+            status.?conditions.orValue([]).filter(c, c.type == "Ready").size() > 0
+              ? status.?conditions.orValue([]).filter(c, c.type == "Ready")[0].status != "True"
               : true
         - name: "clusterReadyTTL"
           expression: |
             (timestamp(now()) - timestamp(
-              status.conditions.filter(c, c.type == "Ready").size() > 0
-                ? status.conditions.filter(c, c.type == "Ready")[0].last_transition_time
+              status.?conditions.orValue([]).filter(c, c.type == "Ready").size() > 0 &&
+              has(status.?conditions.orValue([]).filter(c, c.type == "Ready")[0].last_transition_time)
+                ? status.?conditions.orValue([]).filter(c, c.type == "Ready")[0].last_transition_time
                 : now()
             )).getSeconds() >= 300

As per coding guidelines, **: -Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helm/adapter1/adapter-task-config.yaml` around lines 42 - 53, clusterNotReady
and clusterReadyTTL assume status.conditions and last_transition_time exist;
make the CEL expressions null-safe by first checking status.conditions != null
and that the filtered list has >0 elements before indexing, and for
clusterReadyTTL also check the existence of last_transition_time and fall back
to now() if missing. Update the clusterNotReady expression used in the
"clusterNotReady" rule to: status.conditions != null &&
status.conditions.filter(c, c.type == "Ready").size() > 0 ?
status.conditions.filter(c, c.type == "Ready")[0].status != "True" : true; and
update "clusterReadyTTL" to compute the timestamp using a guarded
last_transition_time: (timestamp(now()) - timestamp(status.conditions != null &&
status.conditions.filter(c, c.type == "Ready").size() > 0 &&
status.conditions.filter(c, c.type == "Ready")[0].last_transition_time != null ?
status.conditions.filter(c, c.type == "Ready")[0].last_transition_time :
now())).getSeconds() >= 300 so accesses in clusterReadyTTL and clusterNotReady
(status.conditions and last_transition_time) are protected.

Comment on lines +47 to +58
- name: "nodepoolNotReady"
expression: |
status.conditions.filter(c, c.type == "Ready").size() > 0
? status.conditions.filter(c, c.type == "Ready")[0].status
: "False"
# Structured conditions with valid operators
conditions:
- field: "readyConditionStatus"
operator: "equals"
value: "False"
? status.conditions.filter(c, c.type == "Ready")[0].status != "True"
: true
- name: "nodepoolReadyTTL"
expression: |
(timestamp(now()) - timestamp(
status.conditions.filter(c, c.type == "Ready").size() > 0
? status.conditions.filter(c, c.type == "Ready")[0].last_transition_time
: now()
)).getSeconds() >= 300
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle missing readiness fields defensively to prevent CEL runtime failures.

Line 49 and Line 55 assume status.conditions exists, and Line 56 assumes last_transition_time exists. If either is missing in API payloads, precondition evaluation can error and block execution instead of falling back.

Suggested hardening
         - name: "nodepoolNotReady"
           expression: |
-            status.conditions.filter(c, c.type == "Ready").size() > 0
-              ? status.conditions.filter(c, c.type == "Ready")[0].status != "True"
+            status.?conditions.orValue([]).filter(c, c.type == "Ready").size() > 0
+              ? status.?conditions.orValue([]).filter(c, c.type == "Ready")[0].status != "True"
               : true
         - name: "nodepoolReadyTTL"
           expression: |
             (timestamp(now()) - timestamp(
-              status.conditions.filter(c, c.type == "Ready").size() > 0
-                ? status.conditions.filter(c, c.type == "Ready")[0].last_transition_time
+              status.?conditions.orValue([]).filter(c, c.type == "Ready").size() > 0 &&
+              has(status.?conditions.orValue([]).filter(c, c.type == "Ready")[0].last_transition_time)
+                ? status.?conditions.orValue([]).filter(c, c.type == "Ready")[0].last_transition_time
                 : now()
             )).getSeconds() >= 300

As per coding guidelines, **: -Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helm/adapter3/adapter-task-config.yaml` around lines 47 - 58, The CEL
expressions for nodepoolNotReady and nodepoolReadyTTL assume status.conditions
and last_transition_time exist; make them defensive by first checking status and
conditions are present and non-empty and that last_transition_time is present
before indexing or calling timestamp. Update the nodepoolNotReady expression to
use a guard like status != null && status.conditions != null &&
status.conditions.size() > 0 ? status.conditions.filter(c, c.type ==
"Ready")[0].status != "True" : true, and update nodepoolReadyTTL to compute the
reference time only when status/conditions and last_transition_time exist
(fallback to now() otherwise), e.g. check status != null && status.conditions !=
null && status.conditions.size() > 0 && status.conditions.filter(c, c.type ==
"Ready")[0].last_transition_time != null ? timestamp(status.conditions.filter(c,
c.type == "Ready")[0].last_transition_time) : timestamp(now()), then compare
getSeconds() >= 300; apply these guards in the nodepoolNotReady and
nodepoolReadyTTL expressions to avoid runtime errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant