HYPERFLEET-751 - doc: align adapter docs with operational docs standard#76
HYPERFLEET-751 - doc: align adapter docs with operational docs standard#76xueli181114 wants to merge 1 commit intoopenshift-hyperfleet:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
WalkthroughThis pull request restructures observability documentation for the HyperFleet adapter: removes docs/observability.md; adds docs/alerts.md with Prometheus alert rules; rewrites docs/metrics.md into a metric-first, table-driven reference including broker metrics and histogram guidance; and adds audience annotations to docs/configuration.md, docs/runbook.md, and docs/adapter-authoring-guide.md. README.md is updated to surface the new documentation links and section structure. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/adapter-authoring-guide.md (1)
1158-1158:⚠️ Potential issue | 🟡 MinorUpdate the renamed internal docs link.
Line 1158 still points to
observability.md, but this PR replaces that doc with the new metrics/alerts split. That leaves the deployment checklist with a broken cross-reference. Point this atmetrics.md(oralerts.mdif alert rules are the intended target).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/adapter-authoring-guide.md` at line 1158, Update the internal cross-reference in docs/adapter-authoring-guide.md: replace the link target "observability.md" in the sentence that references broker metrics on the /metrics endpoint (port 9090) with "metrics.md" so the deployment checklist points to the new metrics doc (or swap to "alerts.md" if the intent was to reference alert rules).
🧹 Nitpick comments (1)
docs/alerts.md (1)
51-60: The missing-counter branch will be filtered out by label matching in theand on()operator.
absent(hyperfleet_adapter_events_processed_total)returns a series with no labels (empty label set{}), so the subsequentand on(component, version) hyperfleet_adapter_up == 1has no matching labels to join on. The result is an empty vector, meaning the "counter has never been incremented" case will never trigger.Proposed PromQL fix
alert: HyperFleetAdapterNoEventsProcessed expr: | - ( - sum by (component, version) (rate(hyperfleet_adapter_events_processed_total[15m])) == 0 - or - absent(hyperfleet_adapter_events_processed_total) == 1 - ) - and on(component, version) - hyperfleet_adapter_up == 1 + ( + sum by (component, version) (rate(hyperfleet_adapter_events_processed_total[15m])) == 0 + and on(component, version) hyperfleet_adapter_up == 1 + ) + or + ( + hyperfleet_adapter_up == 1 + unless on(component, version) hyperfleet_adapter_events_processed_total + ) for: 5m🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/alerts.md` around lines 51 - 60, The absent(...) branch is dropped by the label-restricted "and on(component, version)" join; replace the absent check with a label-aware test such as count by(component, version)(hyperfleet_adapter_events_processed_total) == 0 so the "never incremented" case produces series with component and version labels and can join with hyperfleet_adapter_up; update the alert expression to use count by(component, version)(hyperfleet_adapter_events_processed_total) == 0 in place of absent(hyperfleet_adapter_events_processed_total) == 1 while keeping the existing sum by(component, version)(rate(...)) == 0 and the and on(component, version) hyperfleet_adapter_up == 1 parts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/metrics.md`:
- Around line 67-72: The example uses histogram_quantile(0.95,
rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])) which
computes p95 per replica; change it to aggregate the _bucket series across
replicas first (use sum by (component, version, le) over rate(..._bucket[5m]))
and then pass that aggregated series into histogram_quantile so the p95 is a
cluster-wide value; update the example to use histogram_quantile(0.95, sum by
(component, version, le)
(rate(hyperfleet_adapter_event_processing_duration_seconds_bucket[5m])))
referencing histogram_quantile, rate, and
hyperfleet_adapter_event_processing_duration_seconds_bucket.
---
Outside diff comments:
In `@docs/adapter-authoring-guide.md`:
- Line 1158: Update the internal cross-reference in
docs/adapter-authoring-guide.md: replace the link target "observability.md" in
the sentence that references broker metrics on the /metrics endpoint (port 9090)
with "metrics.md" so the deployment checklist points to the new metrics doc (or
swap to "alerts.md" if the intent was to reference alert rules).
---
Nitpick comments:
In `@docs/alerts.md`:
- Around line 51-60: The absent(...) branch is dropped by the label-restricted
"and on(component, version)" join; replace the absent check with a label-aware
test such as count by(component,
version)(hyperfleet_adapter_events_processed_total) == 0 so the "never
incremented" case produces series with component and version labels and can join
with hyperfleet_adapter_up; update the alert expression to use count
by(component, version)(hyperfleet_adapter_events_processed_total) == 0 in place
of absent(hyperfleet_adapter_events_processed_total) == 1 while keeping the
existing sum by(component, version)(rate(...)) == 0 and the and on(component,
version) hyperfleet_adapter_up == 1 parts.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b9e488df-c8c9-4d71-978a-a3dca71caf6a
📒 Files selected for processing (6)
docs/adapter-authoring-guide.mddocs/alerts.mddocs/configuration.mddocs/metrics.mddocs/observability.mddocs/runbook.md
💤 Files with no reviewable changes (1)
- docs/observability.md
0af74cb to
f1ff182
Compare
f1ff182 to
bcebaae
Compare
bcebaae to
c557049
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docs/metrics.md (1)
7-7: Consider breaking up the ServiceMonitor description for readability.The sentence explaining ServiceMonitor behavior is quite long (85+ words) and covers multiple concepts: enabled-by-default status, scrape configuration, CRD availability check, and values.yaml reference. Breaking it into 2-3 shorter sentences would improve scannability for operators.
♻️ Suggested refactor
-The Helm chart includes a **ServiceMonitor** template for automatic discovery by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). It is enabled by default (`serviceMonitor.enabled: true`) and scrapes the `/metrics` endpoint every 30s with `honorLabels: true` to preserve the adapter's `component` and `version` labels. The template is only rendered when the Prometheus Operator CRDs (`monitoring.coreos.com/v1/ServiceMonitor`) are available on the cluster; otherwise it is silently skipped. See the Helm `values.yaml` for configuration options (interval, scrapeTimeout, labels, namespaceSelector). +The Helm chart includes a **ServiceMonitor** template for automatic discovery by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). It is enabled by default (`serviceMonitor.enabled: true`) and scrapes the `/metrics` endpoint every 30s with `honorLabels: true` to preserve the adapter's `component` and `version` labels. + +The template is only rendered when the Prometheus Operator CRDs (`monitoring.coreos.com/v1/ServiceMonitor`) are available on the cluster; otherwise it is silently skipped. See the Helm `values.yaml` for configuration options (interval, scrapeTimeout, labels, namespaceSelector).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/metrics.md` at line 7, The long single sentence describing the Helm chart's ServiceMonitor should be split into 2–3 shorter sentences to improve readability: reference the ServiceMonitor template and default flag (serviceMonitor.enabled: true) in one sentence, describe the scrape behaviour (/metrics, interval 30s, honorLabels: true) in a second sentence, and note the CRD availability check (monitoring.coreos.com/v1/ServiceMonitor) and values.yaml configuration options (interval, scrapeTimeout, labels, namespaceSelector) in a third. Ensure the terms ServiceMonitor, serviceMonitor.enabled, /metrics, honorLabels, and monitoring.coreos.com/v1/ServiceMonitor are preserved verbatim so operators can locate related config and docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docs/metrics.md`:
- Line 7: The long single sentence describing the Helm chart's ServiceMonitor
should be split into 2–3 shorter sentences to improve readability: reference the
ServiceMonitor template and default flag (serviceMonitor.enabled: true) in one
sentence, describe the scrape behaviour (/metrics, interval 30s, honorLabels:
true) in a second sentence, and note the CRD availability check
(monitoring.coreos.com/v1/ServiceMonitor) and values.yaml configuration options
(interval, scrapeTimeout, labels, namespaceSelector) in a third. Ensure the
terms ServiceMonitor, serviceMonitor.enabled, /metrics, honorLabels, and
monitoring.coreos.com/v1/ServiceMonitor are preserved verbatim so operators can
locate related config and docs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9832ed23-0258-403f-a2d0-12026321d964
📒 Files selected for processing (7)
README.mddocs/adapter-authoring-guide.mddocs/alerts.mddocs/configuration.mddocs/metrics.mddocs/observability.mddocs/runbook.md
💤 Files with no reviewable changes (1)
- docs/observability.md
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/adapter-authoring-guide.md
- README.md
- docs/alerts.md
Summary
observability.md→metrics.md(metric definitions are the canonical content)metrics.md→alerts.md(file contained alert rules, not metric definitions)Test plan
Relates to: HYPERFLEET-751
Depends on: openshift-hyperfleet/architecture PR (same ticket) for the standard definition
Summary by CodeRabbit