
feat(common-services): upgrade observability/search stack to current targets + hands-free migration #62

Draft
gnanirahulnutakki wants to merge 2 commits into master from feature/common-services-stack-upgrades

gnanirahulnutakki (Member) commented Jun 2, 2026

Summary

Upgrades the seven observability/search subcharts in common-services to current, live-tested targets, adds the Elasticsearch/Kibana 8.x security-off configuration, and ships a hands-free upgrade migration Job so a plain helm upgrade or an ArgoCD sync from an older release completes without manual intervention.

Chart 2.0.2 → 2.1.0.

Version bumps (Chart.yaml)

| Subchart | Chart | App | Turbulence | Notes |
|---|---|---|---|---|
| fluent-bit | 0.48.0 → 0.49.1 | 3.2.1 → 4.0.3 | low | drop-in |
| grafana | 8.10.0 → 11.6.1 | 11.5.1 → 12.4.3 | medium | repo moved grafana.github.io → grafana-community; AngularJS plugin support removed in 12 |
| prometheus | 20.2.1 → 25.30.1 | v2.43.0 → v2.55.1 | medium | immutable StatefulSet selector |
| elasticsearch | 7.17.3 → 8.5.1 | 7.17.25 → 8.19.16 (imageTag) | medium | frozen chart; security on by default |
| kibana | 7.17.3 → 8.5.1 | 7.17.25 → 8.19.16 (imageTag) | medium | version locked to ES; frozen chart needs 2 secrets |
| opensearch | 2.16.1 → 2.37.0 | 2.11.0 → 2.19.5 | medium | exporter plugin must be bumped in lockstep |
| opensearch-dashboards | 2.14.0 → 2.33.0 | 2.11.0 → 2.19.5 | medium | immutable Deployment selector |

The Elastic Helm charts are frozen at 8.5.1 (the repo is archived), so the application version is driven by imageTag. Elasticsearch remains free under the same Basic tier (8.16+ is also available under AGPLv3). OpenSearch stays on the 2.x / Lucene 9 line, so no reindex is needed; 2.19.x is the prerequisite for any future 3.x upgrade.
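The OpenSearch upgrade-path rule above can be expressed as a small check. This is a minimal sketch, assuming plain `major.minor.patch` version strings; `parse_version` and `opensearch_upgrade_path` are hypothetical helpers, not part of the chart, and they encode only the two rules this PR states (same-major upgrades run in place without a reindex, and 2.19.x is the prerequisite for 3.x).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.19.5' into (2, 19, 5)."""
    return tuple(int(part) for part in version.split("."))


def opensearch_upgrade_path(current: str, target: str) -> str:
    """Classify an OpenSearch engine upgrade using only the rules stated in this PR.

    Assumed rules: upgrades within the same major line run in place with no
    reindex, and moving from 2.x to 3.x requires the cluster to be on 2.19.x first.
    """
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        return "downgrade: not supported"
    if tgt[0] == cur[0]:
        return "in place: same major, no reindex"
    if cur[0] == 2 and tgt[0] == 3:
        if cur[:2] >= (2, 19):
            return "eligible for 3.x: 2.19.x prerequisite met"
        return "upgrade to 2.19.x first"
    return "unsupported jump"


if __name__ == "__main__":
    print(opensearch_upgrade_path("2.11.0", "2.19.5"))  # the upgrade in this PR
    print(opensearch_upgrade_path("2.11.0", "3.0.0"))   # skipping the 2.19.x step
```

Tuple comparison keeps the logic free of string-ordering pitfalls such as "2.9" sorting after "2.19".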

values.yaml

  • elasticsearch: imageTag 8.19.16; security OFF (createCert: false, protocol: http, secret.enabled: false, xpack.security.enabled: false) + an ELASTIC_PASSWORD shim the frozen 8.5.1 chart's readiness probe requires even with security off.
  • kibana: imageTag 8.19.16; elasticsearchHosts: http://…; documents the two secrets the frozen 8.5.1 chart hard-requires.
  • opensearch: prometheus-exporter plugin 2.11.0.0 → 2.19.5.0; the plugin must match the engine version exactly, otherwise it CrashLoops.
  • grafana: ES datasource esVersion 7.17.3 → 8.19.16.
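Because the exporter plugin must track the engine version exactly, a lint step could catch a mismatch before it reaches a CrashLooping pod. This is a minimal sketch, assuming the plugin version is the engine version plus a trailing `.0` build component (the pattern in both versions quoted above); `expected_exporter_plugin_version` and `plugin_matches_engine` are hypothetical names.

```python
import re


def expected_exporter_plugin_version(engine_version: str) -> str:
    """Derive the prometheus-exporter plugin version for an OpenSearch engine version.

    Assumption: plugin version = engine version + a trailing '.0' build component
    (2.11.0 -> 2.11.0.0 and 2.19.5 -> 2.19.5.0 in this PR).
    """
    if not re.fullmatch(r"\d+\.\d+\.\d+", engine_version):
        raise ValueError(f"unexpected engine version: {engine_version!r}")
    return f"{engine_version}.0"


def plugin_matches_engine(plugin_version: str, engine_version: str) -> bool:
    """True only if the plugin's first three components equal the engine version."""
    return plugin_version.split(".")[:3] == engine_version.split(".")


if __name__ == "__main__":
    engine = "2.19.5"
    # The old plugin pin versus the derived one, checked against the new engine.
    for plugin in ("2.11.0.0", expected_exporter_plugin_version(engine)):
        status = "ok" if plugin_matches_engine(plugin, engine) else "MISMATCH (will CrashLoop)"
        print(f"engine {engine}, plugin {plugin}: {status}")
```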

Hands-free upgrade migration (templates/upgrade-migration/)

Adds a Job, a script ConfigMap, and namespaced RBAC, modeled on the existing crds-installer. The Job runs as both a Helm pre-install,pre-upgrade hook and an ArgoCD PreSync hook, and idempotently:

  1. Creates the elasticsearch-master-certs + kibana-es-token placeholder secrets the frozen Kibana 8.5.1 chart needs when ES security is OFF (only if absent).
  2. Deletes the prometheus-server StatefulSet and opensearch-dashboards Deployment only while they still carry the legacy selector (the immutable field that changed). StatefulSet PVCs are retained — data survives.

On a cluster already on the new versions, every check is a silent no-op (exit 0), so the Job is safe to run on every subsequent ArgoCD sync. It is gated by upgradeMigration.enabled (default true).
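The decision logic of the two steps, and why a second run changes nothing, can be sketched against an in-memory stand-in for the namespace. This is an illustration only: the real Job runs a script with kubectl, while `FakeCluster`, `migrate`, and the legacy label set below are hypothetical. The sketch also assumes that `upgradeMigration.elasticsearchSecurityOffSecrets` gates step 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Presence of this selector label means the workload already uses the new subchart's selector.
MIGRATED_MARKER = "app.kubernetes.io/name"
PLACEHOLDER_SECRETS = ["elasticsearch-master-certs", "kibana-es-token"]
TARGET_WORKLOADS = ["StatefulSet/prometheus-server", "Deployment/opensearch-dashboards"]


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the release namespace."""
    secrets: Set[str] = field(default_factory=set)
    # "Kind/name" -> spec.selector.matchLabels of the live object
    selectors: Dict[str, Dict[str, str]] = field(default_factory=dict)


def migrate(cluster: FakeCluster, security_off_secrets: bool = True) -> List[str]:
    """Run both migration steps once and return the actions actually taken."""
    actions: List[str] = []
    # Step 1: placeholder secrets, created only if absent.
    if security_off_secrets:
        for name in PLACEHOLDER_SECRETS:
            if name not in cluster.secrets:
                cluster.secrets.add(name)
                actions.append(f"created Secret/{name}")
    # Step 2: delete a workload only while it still carries the legacy selector.
    for key in TARGET_WORKLOADS:
        selector = cluster.selectors.get(key)
        if selector is None or MIGRATED_MARKER in selector:
            continue  # not deployed, or already on the new selector: skip
        # Helm recreates the workload; PVCs are separate objects and are not modeled or deleted here.
        del cluster.selectors[key]
        actions.append(f"deleted {key}")
    return actions


# Illustrative starting state: legacy Prometheus selector, already-migrated dashboards.
cluster = FakeCluster(selectors={
    "StatefulSet/prometheus-server": {"app": "prometheus", "component": "server"},
    "Deployment/opensearch-dashboards": {"app.kubernetes.io/name": "opensearch-dashboards"},
})
first_run = migrate(cluster)
second_run = migrate(cluster)
print(first_run)
print(second_run)
```

The first call creates both placeholder secrets and deletes only the legacy-selector StatefulSet; the second call returns an empty action list, which mirrors the "silent no-op on every sync" property described above.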

Validation (live, cluster qa-self-managed)

All seven upgraded on a real release and confirmed healthy. Specifically proven:

  • ES 7.17.25 → 8.19.16 in place, security off, cluster green; Kibana 8.x via the shim secrets.
  • OpenSearch 2.11.0 → 2.19.5 in place — a marker doc written on 2.11.0 survived (no reindex); plugin bumped to 2.19.5.0.
  • Prometheus → v2.55.1, PVC retained across the STS recreate.
  • Migration Job: a fabricated legacy-selector workload was deleted and a new-selector workload skipped; a real run against the already-migrated cluster skipped all four items (both secrets, both workloads) and completed in ~16s.

Notes / follow-ups

  • The migration's idempotency keys on the legacy → app.kubernetes.io/name selector transition (correct for this upgrade). A future subchart bump that changes a different immutable field would need the check extended (noted in code comments).
  • Defaults ship ES/Kibana with security disabled (preserving the prior 7.17 posture). Set upgradeMigration.elasticsearchSecurityOffSecrets: false and configure ES security if you want it on.
  • The Job's apk add kubectl openssl step needs cluster egress (the same assumption the existing crds-installer makes).
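The first note above says the legacy-label check would need extending if a future bump changes a different immutable field. One possible generalization, offered only as a sketch, is to compare the live object's immutable fields with the newly rendered manifest and recreate on any difference. `IMMUTABLE_FIELDS`, `get_path`, `changed_immutable_fields`, and `needs_recreate` are hypothetical, and the label values in the example are illustrative.

```python
from typing import Any, Dict, List, Optional

# Fields the API server refuses to change in place (apps/v1). volumeClaimTemplates is
# also immutable on StatefulSets but is left out: server-side defaulting makes a naive
# live-vs-rendered comparison unreliable without normalization.
IMMUTABLE_FIELDS: Dict[str, List[str]] = {
    "StatefulSet": ["spec.selector", "spec.serviceName"],
    "Deployment": ["spec.selector"],
}


def get_path(obj: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'spec.selector'; return None if any key is missing."""
    current: Any = obj
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def changed_immutable_fields(live: Dict[str, Any], desired: Dict[str, Any]) -> List[str]:
    """List immutable paths whose live value differs from the newly rendered manifest."""
    paths = IMMUTABLE_FIELDS.get(desired["kind"], [])
    return [p for p in paths if get_path(live, p) != get_path(desired, p)]


def needs_recreate(live: Optional[Dict[str, Any]], desired: Dict[str, Any]) -> bool:
    """Recreate only an existing object whose immutable fields would change."""
    return live is not None and bool(changed_immutable_fields(live, desired))


live_sts = {"kind": "StatefulSet", "spec": {
    "serviceName": "prometheus-server-headless",
    "selector": {"matchLabels": {"app": "prometheus", "component": "server"}}}}
rendered_sts = {"kind": "StatefulSet", "spec": {
    "serviceName": "prometheus-server-headless",
    "selector": {"matchLabels": {"app.kubernetes.io/name": "prometheus"}}}}
print(changed_immutable_fields(live_sts, rendered_sts))
```

A diff-based check avoids hard-coding label names, at the cost of needing the rendered manifest (for example from `helm template`) available inside the Job.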

…targets

Bump the 7 observability/search subcharts to researched + live-tested targets
and add the Elasticsearch/Kibana 8.x security-off configuration.

Chart.yaml (chart 2.0.2 -> 2.1.0):
- fluent-bit            0.48.0 -> 0.49.1   (Fluent Bit 3.2.1 -> 4.0.3)
- grafana               8.10.0 -> 11.6.1   (app 11.5.1 -> 12.4.3); repo moved
                        grafana.github.io -> grafana-community
- prometheus            20.2.1 -> 25.30.1  (v2.43.0 -> v2.55.1, final 2.x LTS)
- elasticsearch         7.17.3 -> 8.5.1    (frozen chart; app via imageTag 8.19.16)
- kibana                7.17.3 -> 8.5.1    (frozen chart; app via imageTag 8.19.16)
- opensearch            2.16.1 -> 2.37.0   (app 2.11.0 -> 2.19.5)
- opensearch-dashboards 2.14.0 -> 2.33.0   (app 2.11.0 -> 2.19.5, matches engine)

values.yaml:
- elasticsearch: imageTag 8.19.16; security OFF (createCert/protocol/secret.enabled
  /xpack) + ELASTIC_PASSWORD readiness-probe shim required by the frozen 8.5.1 chart
- kibana: imageTag 8.19.16; elasticsearchHosts http; documents the two secrets the
  frozen 8.5.1 chart hard-requires (elasticsearch-master-certs, kibana-es-token)
- opensearch: prometheus-exporter plugin 2.11.0.0 -> 2.19.5.0 (MUST match the engine
  version exactly or it CrashLoops) -- found in live testing
- grafana Elasticsearch datasource esVersion 7.17.3 -> 8.19.16

All seven validated by a live deploy+upgrade test on qa-self-managed. Upgrade gotchas
(documented per-dependency in Chart.yaml):
- prometheus: helm upgrade fails on the immutable StatefulSet selector -> delete the
  prometheus-server STS (PVC retained) then re-upgrade
- opensearch-dashboards: Deployment selector is immutable -> delete the Deployment
  (stateless) then re-upgrade
- opensearch engine upgrades in place (data retained); the exporter plugin must be
  bumped in lockstep with the app version

Automate the manual steps the stack upgrade otherwise requires, so a plain
`helm upgrade` or an ArgoCD sync from an older common-services release to this
one completes without operator intervention.

New templates/upgrade-migration/ (Job + script ConfigMap + namespaced RBAC),
modeled on the existing crds-installer: runs as BOTH a Helm pre-install/
pre-upgrade hook AND an ArgoCD PreSync hook (weight -5 RBAC/CM, 0 Job). It:
  1. Creates the elasticsearch-master-certs + kibana-es-token placeholder
     secrets the frozen kibana 8.5.1 chart hard-requires when ES security is
     OFF (only if absent).
  2. Deletes the prometheus-server StatefulSet and opensearch-dashboards
     Deployment ONLY while they still carry the legacy selector labels
     (selector is immutable and changed in the new subcharts). Detects
     "already migrated" by the presence of the app.kubernetes.io/name selector
     label. StatefulSet PVCs are retained, so TSDB/index data survive.

Fully IDEMPOTENT: on a cluster already on the new versions every check is a
no-op and the Job exits 0 silently, so it is safe on every consecutive sync.
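The hook weights described above (-5 for the RBAC and ConfigMap, 0 for the Job) exist so that the Job's permissions and script are in place before the Job starts. A minimal sketch of that ordering, assuming the resources carry the standard `helm.sh/hook`, `helm.sh/hook-weight`, and `argocd.argoproj.io/hook` annotations; the resource names (including a ServiceAccount as part of the RBAC) are hypothetical, and the kind/name tiebreak only makes the sketch deterministic (Helm and ArgoCD apply their own tiebreak rules).

```python
from typing import Dict, List


def hook(kind: str, name: str, weight: int) -> Dict:
    """Build a minimal manifest stub carrying Helm and ArgoCD hook annotations."""
    return {
        "kind": kind,
        "metadata": {
            "name": name,
            "annotations": {
                "helm.sh/hook": "pre-install,pre-upgrade",
                "helm.sh/hook-weight": str(weight),  # Helm expects the weight as a string
                "argocd.argoproj.io/hook": "PreSync",
            },
        },
    }


# Hypothetical resource names for templates/upgrade-migration/.
RESOURCES = [
    hook("Job", "upgrade-migration", 0),
    hook("ConfigMap", "upgrade-migration-script", -5),
    hook("ServiceAccount", "upgrade-migration", -5),
    hook("Role", "upgrade-migration", -5),
    hook("RoleBinding", "upgrade-migration", -5),
]


def execution_order(resources: List[Dict]) -> List[str]:
    """Sort by ascending hook weight; ties broken by kind, then name, for determinism."""
    def key(res: Dict):
        meta = res["metadata"]
        return (int(meta["annotations"]["helm.sh/hook-weight"]), res["kind"], meta["name"])
    return [f"{r['kind']}/{r['metadata']['name']}" for r in sorted(resources, key=key)]


for step in execution_order(RESOURCES):
    print(step)
```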

values.yaml: new `upgradeMigration` block (enabled: true by default;
elasticsearchSecurityOffSecrets toggle; image/resources).

Validated on qa-self-managed: (a) fabricated legacy-selector StatefulSet is
deleted while a new-selector one is skipped; (b) the real Job run against the
already-migrated cluster skips all four items and completes in ~16s.
gnanirahulnutakki self-assigned this Jun 2, 2026
gnanirahulnutakki force-pushed the feature/common-services-stack-upgrades branch from 05b293a to c95f2d9 June 2, 2026 21:33