HYPERFLEET-751 - doc: align adapter docs with operational docs standard by xueli181114 · Pull Request #76 · openshift-hyperfleet/hyperfleet-adapter

xueli181114 · 2026-03-12T03:07:06Z

Summary

Rename observability.md → metrics.md (metric definitions are the canonical content)
Rename metrics.md → alerts.md (file contained alert rules, not metric definitions)
Add audience headers to all doc files (metrics, alerts, runbook, configuration, authoring guide)
Consolidate cross-references in alerts.md (single paragraph instead of scattered sections)
Remove duplicate subtitle in adapter-authoring-guide.md

Test plan

Verify all internal doc cross-references resolve correctly
Verify doc file names match the operational docs standard

Relates to: HYPERFLEET-751
Depends on: openshift-hyperfleet/architecture PR (same ticket) for the standard definition

Summary by CodeRabbit

Documentation
- Removed observability doc and replaced it with a metrics reference and a new alerts page containing recommended alert rules and PromQL examples
- Reorganized metrics into a table-driven, metric-first reference and consolidated broker metrics
- Added audience annotations to Configuration, Runbook, and Adapter Authoring Guide; updated guide links to metrics.md
- Updated README to surface the new documentation sections and links

openshift-ci · 2026-03-12T03:07:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yasun1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-03-12T03:07:24Z

Walkthrough

This pull request restructures observability documentation for the HyperFleet adapter: removes docs/observability.md; adds docs/alerts.md with Prometheus alert rules; rewrites docs/metrics.md into a metric-first, table-driven reference including broker metrics and histogram guidance; and adds audience annotations to docs/configuration.md, docs/runbook.md, and docs/adapter-authoring-guide.md. README.md is updated to surface the new documentation links and section structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: aligning adapter documentation with an operational documentation standard through reorganization and standardization of doc files and headers.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/adapter-authoring-guide.md (1)
1158-1158: ⚠️ Potential issue | 🟡 Minor

Update the renamed internal docs link.

Line 1158 still points to observability.md, but this PR replaces that doc with the new metrics/alerts split. That leaves the deployment checklist with a broken cross-reference. Point this at metrics.md (or alerts.md if alert rules are the intended target).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/adapter-authoring-guide.md` at line 1158, Update the internal
cross-reference in docs/adapter-authoring-guide.md: replace the link target
"observability.md" in the sentence that references broker metrics on the
/metrics endpoint (port 9090) with "metrics.md" so the deployment checklist
points to the new metrics doc (or swap to "alerts.md" if the intent was to
reference alert rules).

🧹 Nitpick comments (1)

docs/alerts.md (1)

51-60: The missing-counter branch will be filtered out by label matching in the and on() operator.

absent(hyperfleet_adapter_events_processed_total) returns a series with no labels (empty label set {}), so the subsequent and on(component, version) hyperfleet_adapter_up == 1 has no matching labels to join on. The result is an empty vector, meaning the "counter has never been incremented" case will never trigger.

Proposed PromQL fix

 alert: HyperFleetAdapterNoEventsProcessed
 expr: |
-  (
-    sum by (component, version) (rate(hyperfleet_adapter_events_processed_total[15m])) == 0
-    or
-    absent(hyperfleet_adapter_events_processed_total) == 1
-  )
-  and on(component, version)
-  hyperfleet_adapter_up == 1
+  (
+    sum by (component, version) (rate(hyperfleet_adapter_events_processed_total[15m])) == 0
+    and on(component, version) hyperfleet_adapter_up == 1
+  )
+  or
+  (
+    hyperfleet_adapter_up == 1
+    unless on(component, version) hyperfleet_adapter_events_processed_total
+  )
 for: 5m

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/alerts.md` around lines 51 - 60, The absent(...) branch is dropped by
the label-restricted "and on(component, version)" join; replace the absent check
with a label-aware test such as count by(component,
version)(hyperfleet_adapter_events_processed_total) == 0 so the "never
incremented" case produces series with component and version labels and can join
with hyperfleet_adapter_up; update the alert expression to use count
by(component, version)(hyperfleet_adapter_events_processed_total) == 0 in place
of absent(hyperfleet_adapter_events_processed_total) == 1 while keeping the
existing sum by(component, version)(rate(...)) == 0 and the and on(component,
version) hyperfleet_adapter_up == 1 parts.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/metrics.md`:
- Around line 67-72: The example uses histogram_quantile(0.95,
rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])) which
computes p95 per replica; change it to aggregate the _bucket series across
replicas first (use sum by (component, version, le) over rate(..._bucket[5m]))
and then pass that aggregated series into histogram_quantile so the p95 is a
cluster-wide value; update the example to use histogram_quantile(0.95, sum by
(component, version, le)
(rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])))
referencing histogram_quantile, rate, and
hyperfleet_adapter_event_processing_duration_seconds_bucket.

---

Outside diff comments:
In `@docs/adapter-authoring-guide.md`:
- Line 1158: Update the internal cross-reference in
docs/adapter-authoring-guide.md: replace the link target "observability.md" in
the sentence that references broker metrics on the /metrics endpoint (port 9090)
with "metrics.md" so the deployment checklist points to the new metrics doc (or
swap to "alerts.md" if the intent was to reference alert rules).

---

Nitpick comments:
In `@docs/alerts.md`:
- Around line 51-60: The absent(...) branch is dropped by the label-restricted
"and on(component, version)" join; replace the absent check with a label-aware
test such as count by(component,
version)(hyperfleet_adapter_events_processed_total) == 0 so the "never
incremented" case produces series with component and version labels and can join
with hyperfleet_adapter_up; update the alert expression to use count
by(component, version)(hyperfleet_adapter_events_processed_total) == 0 in place
of absent(hyperfleet_adapter_events_processed_total) == 1 while keeping the
existing sum by(component, version)(rate(...)) == 0 and the and on(component,
version) hyperfleet_adapter_up == 1 parts.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b9e488df-c8c9-4d71-978a-a3dca71caf6a

📥 Commits

Reviewing files that changed from the base of the PR and between 012513e and 0af74cb.

📒 Files selected for processing (6)

docs/adapter-authoring-guide.md
docs/alerts.md
docs/configuration.md
docs/metrics.md
docs/observability.md
docs/runbook.md

💤 Files with no reviewable changes (1)

docs/observability.md

docs/metrics.md

coderabbitai

🧹 Nitpick comments (1)

docs/metrics.md (1)

7-7: Consider breaking up the ServiceMonitor description for readability.

The sentence explaining ServiceMonitor behavior is quite long (85+ words) and covers multiple concepts: enabled-by-default status, scrape configuration, CRD availability check, and values.yaml reference. Breaking it into 2-3 shorter sentences would improve scannability for operators.

♻️ Suggested refactor

-The Helm chart includes a **ServiceMonitor** template for automatic discovery by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). It is enabled by default (`serviceMonitor.enabled: true`) and scrapes the `/metrics` endpoint every 30s with `honorLabels: true` to preserve the adapter's `component` and `version` labels. The template is only rendered when the Prometheus Operator CRDs (`monitoring.coreos.com/v1/ServiceMonitor`) are available on the cluster; otherwise it is silently skipped. See the Helm `values.yaml` for configuration options (interval, scrapeTimeout, labels, namespaceSelector).
+The Helm chart includes a **ServiceMonitor** template for automatic discovery by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). It is enabled by default (`serviceMonitor.enabled: true`) and scrapes the `/metrics` endpoint every 30s with `honorLabels: true` to preserve the adapter's `component` and `version` labels.
+
+The template is only rendered when the Prometheus Operator CRDs (`monitoring.coreos.com/v1/ServiceMonitor`) are available on the cluster; otherwise it is silently skipped. See the Helm `values.yaml` for configuration options (interval, scrapeTimeout, labels, namespaceSelector).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/metrics.md` at line 7, The long single sentence describing the Helm
chart's ServiceMonitor should be split into 2–3 shorter sentences to improve
readability: reference the ServiceMonitor template and default flag
(serviceMonitor.enabled: true) in one sentence, describe the scrape behaviour
(/metrics, interval 30s, honorLabels: true) in a second sentence, and note the
CRD availability check (monitoring.coreos.com/v1/ServiceMonitor) and values.yaml
configuration options (interval, scrapeTimeout, labels, namespaceSelector) in a
third. Ensure the terms ServiceMonitor, serviceMonitor.enabled, /metrics,
honorLabels, and monitoring.coreos.com/v1/ServiceMonitor are preserved verbatim
so operators can locate related config and docs.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/metrics.md`:
- Line 7: The long single sentence describing the Helm chart's ServiceMonitor
should be split into 2–3 shorter sentences to improve readability: reference the
ServiceMonitor template and default flag (serviceMonitor.enabled: true) in one
sentence, describe the scrape behaviour (/metrics, interval 30s, honorLabels:
true) in a second sentence, and note the CRD availability check
(monitoring.coreos.com/v1/ServiceMonitor) and values.yaml configuration options
(interval, scrapeTimeout, labels, namespaceSelector) in a third. Ensure the
terms ServiceMonitor, serviceMonitor.enabled, /metrics, honorLabels, and
monitoring.coreos.com/v1/ServiceMonitor are preserved verbatim so operators can
locate related config and docs.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9832ed23-0258-403f-a2d0-12026321d964

📥 Commits

Reviewing files that changed from the base of the PR and between bcebaae and c557049.

📒 Files selected for processing (7)

README.md
docs/adapter-authoring-guide.md
docs/alerts.md
docs/configuration.md
docs/metrics.md
docs/observability.md
docs/runbook.md

💤 Files with no reviewable changes (1)

docs/observability.md

🚧 Files skipped from review as they are similar to previous changes (3)

docs/adapter-authoring-guide.md
README.md
docs/alerts.md

openshift-ci bot requested review from ciaranRoche and yingzhanredhat March 12, 2026 03:07

coderabbitai bot reviewed Mar 12, 2026

View reviewed changes

docs/metrics.md Show resolved Hide resolved

xueli181114 closed this Mar 12, 2026

xueli181114 force-pushed the HYPERFLEET-751 branch from 0af74cb to f1ff182 Compare March 12, 2026 03:26

xueli181114 reopened this Mar 12, 2026

xueli181114 force-pushed the HYPERFLEET-751 branch from f1ff182 to bcebaae Compare March 12, 2026 12:24

openshift-merge-robot added the needs-rebase label Mar 12, 2026

HYPERFLEET-751 - doc: align adapter docs with operational docs standard

c557049

xueli181114 force-pushed the HYPERFLEET-751 branch from bcebaae to c557049 Compare March 12, 2026 12:32

openshift-merge-robot removed the needs-rebase label Mar 12, 2026

coderabbitai bot reviewed Mar 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HYPERFLEET-751 - doc: align adapter docs with operational docs standard#76

HYPERFLEET-751 - doc: align adapter docs with operational docs standard#76
xueli181114 wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
xueli181114:HYPERFLEET-751

xueli181114 commented Mar 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

openshift-ci bot commented Mar 12, 2026

Uh oh!

coderabbitai bot commented Mar 12, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xueli181114 commented Mar 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

openshift-ci bot commented Mar 12, 2026

Uh oh!

coderabbitai bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xueli181114 commented Mar 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 12, 2026 •

edited

Loading