Skip to content

HYPERFLEET-751 - doc: align adapter docs with operational docs standard#76

Open
xueli181114 wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
xueli181114:HYPERFLEET-751
Open

HYPERFLEET-751 - doc: align adapter docs with operational docs standard#76
xueli181114 wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
xueli181114:HYPERFLEET-751

Conversation

@xueli181114
Copy link
Contributor

@xueli181114 xueli181114 commented Mar 12, 2026

Summary

  • Rename observability.mdmetrics.md (metric definitions are the canonical content)
  • Rename metrics.mdalerts.md (file contained alert rules, not metric definitions)
  • Add audience headers to all doc files (metrics, alerts, runbook, configuration, authoring guide)
  • Consolidate cross-references in alerts.md (single paragraph instead of scattered sections)
  • Remove duplicate subtitle in adapter-authoring-guide.md

Test plan

  • Verify all internal doc cross-references resolve correctly
  • Verify doc file names match the operational docs standard

Relates to: HYPERFLEET-751
Depends on: openshift-hyperfleet/architecture PR (same ticket) for the standard definition

Summary by CodeRabbit

  • Documentation
    • Removed observability doc and replaced it with a metrics reference and a new alerts page containing recommended alert rules and PromQL examples
    • Reorganized metrics into a table-driven, metric-first reference and consolidated broker metrics
    • Added audience annotations to Configuration, Runbook, and Adapter Authoring Guide; updated guide links to metrics.md
    • Updated README to surface the new documentation sections and links

@openshift-ci
Copy link

openshift-ci bot commented Mar 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yasun1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Mar 12, 2026

Walkthrough

This pull request restructures observability documentation for the HyperFleet adapter: removes docs/observability.md; adds docs/alerts.md with Prometheus alert rules; rewrites docs/metrics.md into a metric-first, table-driven reference including broker metrics and histogram guidance; and adds audience annotations to docs/configuration.md, docs/runbook.md, and docs/adapter-authoring-guide.md. README.md is updated to surface the new documentation links and section structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: aligning adapter documentation with an operational documentation standard through reorganization and standardization of doc files and headers.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/adapter-authoring-guide.md (1)

1158-1158: ⚠️ Potential issue | 🟡 Minor

Update the renamed internal docs link.

Line 1158 still points to observability.md, but this PR replaces that doc with the new metrics/alerts split. That leaves the deployment checklist with a broken cross-reference. Point this at metrics.md (or alerts.md if alert rules are the intended target).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/adapter-authoring-guide.md` at line 1158, Update the internal
cross-reference in docs/adapter-authoring-guide.md: replace the link target
"observability.md" in the sentence that references broker metrics on the
/metrics endpoint (port 9090) with "metrics.md" so the deployment checklist
points to the new metrics doc (or swap to "alerts.md" if the intent was to
reference alert rules).
🧹 Nitpick comments (1)
docs/alerts.md (1)

51-60: The missing-counter branch will be filtered out by label matching in the and on() operator.

absent(hyperfleet_adapter_events_processed_total) returns a series with no labels (empty label set {}), so the subsequent and on(component, version) hyperfleet_adapter_up == 1 has no matching labels to join on. The result is an empty vector, meaning the "counter has never been incremented" case will never trigger.

Proposed PromQL fix
 alert: HyperFleetAdapterNoEventsProcessed
 expr: |
-  (
-    sum by (component, version) (rate(hyperfleet_adapter_events_processed_total[15m])) == 0
-    or
-    absent(hyperfleet_adapter_events_processed_total) == 1
-  )
-  and on(component, version)
-  hyperfleet_adapter_up == 1
+  (
+    sum by (component, version) (rate(hyperfleet_adapter_events_processed_total[15m])) == 0
+    and on(component, version) hyperfleet_adapter_up == 1
+  )
+  or
+  (
+    hyperfleet_adapter_up == 1
+    unless on(component, version) hyperfleet_adapter_events_processed_total
+  )
 for: 5m
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/alerts.md` around lines 51 - 60, The absent(...) branch is dropped by
the label-restricted "and on(component, version)" join; replace the absent check
with a label-aware test such as count by(component,
version)(hyperfleet_adapter_events_processed_total) == 0 so the "never
incremented" case produces series with component and version labels and can join
with hyperfleet_adapter_up; update the alert expression to use count
by(component, version)(hyperfleet_adapter_events_processed_total) == 0 in place
of absent(hyperfleet_adapter_events_processed_total) == 1 while keeping the
existing sum by(component, version)(rate(...)) == 0 and the and on(component,
version) hyperfleet_adapter_up == 1 parts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/metrics.md`:
- Around line 67-72: The example uses histogram_quantile(0.95,
rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])) which
computes p95 per replica; change it to aggregate the _bucket series across
replicas first (use sum by (component, version, le) over rate(..._bucket[5m]))
and then pass that aggregated series into histogram_quantile so the p95 is a
cluster-wide value; update the example to use histogram_quantile(0.95, sum by
(component, version, le)
(rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])))
referencing histogram_quantile, rate, and
hyperfleet_adapter_event_processing_duration_seconds_bucket.

---

Outside diff comments:
In `@docs/adapter-authoring-guide.md`:
- Line 1158: Update the internal cross-reference in
docs/adapter-authoring-guide.md: replace the link target "observability.md" in
the sentence that references broker metrics on the /metrics endpoint (port 9090)
with "metrics.md" so the deployment checklist points to the new metrics doc (or
swap to "alerts.md" if the intent was to reference alert rules).

---

Nitpick comments:
In `@docs/alerts.md`:
- Around line 51-60: The absent(...) branch is dropped by the label-restricted
"and on(component, version)" join; replace the absent check with a label-aware
test such as count by(component,
version)(hyperfleet_adapter_events_processed_total) == 0 so the "never
incremented" case produces series with component and version labels and can join
with hyperfleet_adapter_up; update the alert expression to use count
by(component, version)(hyperfleet_adapter_events_processed_total) == 0 in place
of absent(hyperfleet_adapter_events_processed_total) == 1 while keeping the
existing sum by(component, version)(rate(...)) == 0 and the and on(component,
version) hyperfleet_adapter_up == 1 parts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b9e488df-c8c9-4d71-978a-a3dca71caf6a

📥 Commits

Reviewing files that changed from the base of the PR and between 012513e and 0af74cb.

📒 Files selected for processing (6)
  • docs/adapter-authoring-guide.md
  • docs/alerts.md
  • docs/configuration.md
  • docs/metrics.md
  • docs/observability.md
  • docs/runbook.md
💤 Files with no reviewable changes (1)
  • docs/observability.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
docs/metrics.md (1)

7-7: Consider breaking up the ServiceMonitor description for readability.

The sentence explaining ServiceMonitor behavior is quite long (85+ words) and covers multiple concepts: enabled-by-default status, scrape configuration, CRD availability check, and values.yaml reference. Breaking it into 2-3 shorter sentences would improve scannability for operators.

♻️ Suggested refactor
-The Helm chart includes a **ServiceMonitor** template for automatic discovery by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). It is enabled by default (`serviceMonitor.enabled: true`) and scrapes the `/metrics` endpoint every 30s with `honorLabels: true` to preserve the adapter's `component` and `version` labels. The template is only rendered when the Prometheus Operator CRDs (`monitoring.coreos.com/v1/ServiceMonitor`) are available on the cluster; otherwise it is silently skipped. See the Helm `values.yaml` for configuration options (interval, scrapeTimeout, labels, namespaceSelector).
+The Helm chart includes a **ServiceMonitor** template for automatic discovery by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). It is enabled by default (`serviceMonitor.enabled: true`) and scrapes the `/metrics` endpoint every 30s with `honorLabels: true` to preserve the adapter's `component` and `version` labels.
+
+The template is only rendered when the Prometheus Operator CRDs (`monitoring.coreos.com/v1/ServiceMonitor`) are available on the cluster; otherwise it is silently skipped. See the Helm `values.yaml` for configuration options (interval, scrapeTimeout, labels, namespaceSelector).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/metrics.md` at line 7, The long single sentence describing the Helm
chart's ServiceMonitor should be split into 2–3 shorter sentences to improve
readability: reference the ServiceMonitor template and default flag
(serviceMonitor.enabled: true) in one sentence, describe the scrape behaviour
(/metrics, interval 30s, honorLabels: true) in a second sentence, and note the
CRD availability check (monitoring.coreos.com/v1/ServiceMonitor) and values.yaml
configuration options (interval, scrapeTimeout, labels, namespaceSelector) in a
third. Ensure the terms ServiceMonitor, serviceMonitor.enabled, /metrics,
honorLabels, and monitoring.coreos.com/v1/ServiceMonitor are preserved verbatim
so operators can locate related config and docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/metrics.md`:
- Line 7: The long single sentence describing the Helm chart's ServiceMonitor
should be split into 2–3 shorter sentences to improve readability: reference the
ServiceMonitor template and default flag (serviceMonitor.enabled: true) in one
sentence, describe the scrape behaviour (/metrics, interval 30s, honorLabels:
true) in a second sentence, and note the CRD availability check
(monitoring.coreos.com/v1/ServiceMonitor) and values.yaml configuration options
(interval, scrapeTimeout, labels, namespaceSelector) in a third. Ensure the
terms ServiceMonitor, serviceMonitor.enabled, /metrics, honorLabels, and
monitoring.coreos.com/v1/ServiceMonitor are preserved verbatim so operators can
locate related config and docs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9832ed23-0258-403f-a2d0-12026321d964

📥 Commits

Reviewing files that changed from the base of the PR and between bcebaae and c557049.

📒 Files selected for processing (7)
  • README.md
  • docs/adapter-authoring-guide.md
  • docs/alerts.md
  • docs/configuration.md
  • docs/metrics.md
  • docs/observability.md
  • docs/runbook.md
💤 Files with no reviewable changes (1)
  • docs/observability.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/adapter-authoring-guide.md
  • README.md
  • docs/alerts.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants