feat(common-services): upgrade observability/search stack to current targets + hands-free migration #62
Draft
gnanirahulnutakki wants to merge 2 commits into
…targets
Bump the 7 observability/search subcharts to researched + live-tested targets
and add the Elasticsearch/Kibana 8.x security-off configuration.
Chart.yaml (chart 2.0.2 -> 2.1.0):
- fluent-bit 0.48.0 -> 0.49.1 (Fluent Bit 3.2.1 -> 4.0.3)
- grafana 8.10.0 -> 11.6.1 (app 11.5.1 -> 12.4.3); repo moved
grafana.github.io -> grafana-community
- prometheus 20.2.1 -> 25.30.1 (v2.43.0 -> v2.55.1, final 2.x LTS)
- elasticsearch 7.17.3 -> 8.5.1 (frozen chart; app via imageTag 8.19.16)
- kibana 7.17.3 -> 8.5.1 (frozen chart; app via imageTag 8.19.16)
- opensearch 2.16.1 -> 2.37.0 (app 2.11.0 -> 2.19.5)
- opensearch-dashboards 2.14.0 -> 2.33.0 (app 2.11.0 -> 2.19.5, matches engine)
values.yaml:
- elasticsearch: imageTag 8.19.16; security OFF (createCert/protocol/secret.enabled
/xpack) + ELASTIC_PASSWORD readiness-probe shim required by the frozen 8.5.1 chart
- kibana: imageTag 8.19.16; elasticsearchHosts http; documents the two secrets the
frozen 8.5.1 chart hard-requires (elasticsearch-master-certs, kibana-es-token)
- opensearch: prometheus-exporter plugin 2.11.0.0 -> 2.19.5.0 (MUST match the engine
version exactly or it CrashLoops) -- found in live testing
- grafana Elasticsearch datasource esVersion 7.17.3 -> 8.19.16
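As an illustration of the security-off settings listed above, the Elasticsearch portion of values.yaml might look roughly like the fragment below. This is a sketch under assumptions, not the chart's actual content: the top-level `elasticsearch:` key assumes the umbrella chart passes values under the dependency name, `esConfig` and `extraEnvs` are the upstream elasticsearch chart keys assumed here to carry the xpack setting and the ELASTIC_PASSWORD shim, and the password value is a placeholder.

```yaml
elasticsearch:
  imageTag: "8.19.16"      # app version; the subchart itself stays at the frozen 8.5.1
  createCert: false        # do not generate certificates
  protocol: http           # plain HTTP for probes and clients
  secret:
    enabled: false         # do not create the chart's credentials secret
  esConfig:
    elasticsearch.yml: |
      xpack.security.enabled: false
  extraEnvs:
    # Assumed shape of the shim: the frozen chart's readiness probe
    # expects ELASTIC_PASSWORD to be set even when security is off.
    - name: ELASTIC_PASSWORD
      value: "unused"      # placeholder, not a real credential
```

With security off the value is presumably never checked by Elasticsearch; it exists only so the probe has the variable it expects.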
All seven validated by a live deploy+upgrade test on qa-self-managed. Upgrade gotchas
(documented per-dependency in Chart.yaml):
- prometheus: helm upgrade fails on the immutable StatefulSet selector -> delete the
prometheus-server STS (PVC retained) then re-upgrade
- opensearch-dashboards: Deployment selector is immutable -> delete the Deployment
(stateless) then re-upgrade
- opensearch engine upgrades in place (data retained); the exporter plugin must be
bumped in lockstep with the app version
Automate the manual steps the stack upgrade otherwise requires, so a plain
`helm upgrade` or an ArgoCD sync from an older common-services release to this
one completes without operator intervention.
New templates/upgrade-migration/ (Job + script ConfigMap + namespaced RBAC),
modeled on the existing crds-installer: runs as BOTH a Helm pre-install/
pre-upgrade hook AND an ArgoCD PreSync hook (weight -5 RBAC/CM, 0 Job). It:
1. Creates the elasticsearch-master-certs + kibana-es-token placeholder
secrets the frozen kibana 8.5.1 chart hard-requires when ES security is
OFF (only if absent).
2. Deletes the prometheus-server StatefulSet and opensearch-dashboards
Deployment ONLY while they still carry the legacy selector labels
(selector is immutable and changed in the new subcharts). Detects
"already migrated" by the presence of the app.kubernetes.io/name selector
label. StatefulSet PVCs are retained, so TSDB/index data survive.
Fully IDEMPOTENT: on a cluster already on the new versions every check is a
no-op and the Job exits 0 silently, so it is safe on every consecutive sync.
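The dual-hook arrangement described above could be expressed on the Job's metadata roughly as follows. This is a hedged sketch: the resource name is hypothetical, the spec is omitted, and the source does not say how the -5/0 ordering is expressed for ArgoCD, so only the Helm weight annotation is shown.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: upgrade-migration                   # hypothetical name
  annotations:
    helm.sh/hook: pre-install,pre-upgrade   # Helm runs it before install and upgrade
    helm.sh/hook-weight: "0"                # RBAC and the script ConfigMap would use "-5"
    argocd.argoproj.io/hook: PreSync        # ArgoCD runs it before applying the sync
# spec omitted: the container would run the migration script from the ConfigMap
```

Giving the RBAC objects and ConfigMap a lower weight than the Job means they exist before the Job's pod starts.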
values.yaml: new `upgradeMigration` block (enabled: true by default;
elasticsearchSecurityOffSecrets toggle; image/resources).
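A possible shape for the new block, assuming conventional Helm value layouts. Only `enabled` (default true) and the existence of `elasticsearchSecurityOffSecrets`, `image` and `resources` come from the description above; the sub-keys, the toggle's default, and all concrete values below are hypothetical.

```yaml
upgradeMigration:
  enabled: true                          # default stated in the description
  elasticsearchSecurityOffSecrets: true  # assumed default; false skips the placeholder secrets
  image:
    repository: alpine                   # hypothetical
    tag: "3.20"                          # hypothetical
  resources:                             # hypothetical sizing
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      memory: 128Mi
```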
Validated on qa-self-managed: (a) fabricated legacy-selector StatefulSet is
deleted while a new-selector one is skipped; (b) the real Job run against the
already-migrated cluster skips all four items and completes in ~16s.
Summary
Upgrades the seven observability/search subcharts in `common-services` to current, live-tested targets, adds the Elasticsearch/Kibana 8.x security-off configuration, and ships a hands-free upgrade migration Job so a plain `helm upgrade` or an ArgoCD sync from an older release completes without manual intervention.

Chart 2.0.2 → 2.1.0.

Version bumps (`Chart.yaml`)

- fluent-bit: 0.48.0 → 0.49.1
- grafana: 8.10.0 → 11.6.1 (repo moved `grafana.github.io` → `grafana-community`; AngularJS removed in 12)
- prometheus: 20.2.1 → 25.30.1
- elasticsearch: 7.17.3 → 8.5.1
- kibana: 7.17.3 → 8.5.1
- opensearch: 2.16.1 → 2.37.0
- opensearch-dashboards: 2.14.0 → 2.33.0

values.yaml

- elasticsearch: `imageTag` 8.19.16; security OFF (`createCert: false`, `protocol: http`, `secret.enabled: false`, `xpack.security.enabled: false`) plus an `ELASTIC_PASSWORD` shim that the frozen 8.5.1 chart's readiness probe requires even with security off.
- kibana: `imageTag` 8.19.16; `elasticsearchHosts: http://…`; documents the two secrets the frozen 8.5.1 chart hard-requires.
- opensearch: prometheus-exporter plugin 2.11.0.0 → 2.19.5.0; it must match the engine version exactly or it CrashLoops.
- grafana: Elasticsearch datasource `esVersion` 7.17.3 → 8.19.16.

Hands-free upgrade migration (`templates/upgrade-migration/`)

New Job + script ConfigMap + namespaced RBAC, modeled on the existing `crds-installer`. Runs as both a Helm `pre-install,pre-upgrade` hook and an ArgoCD `PreSync` hook. Idempotently:

1. Creates the `elasticsearch-master-certs` + `kibana-es-token` placeholder secrets the frozen Kibana 8.5.1 chart needs when ES security is OFF (only if absent).
2. Deletes the `prometheus-server` StatefulSet and `opensearch-dashboards` Deployment only while they still carry the legacy selector (the immutable field that changed). StatefulSet PVCs are retained, so data survives.

On a cluster already on the new versions, every check is a silent no-op (exit 0), so it is safe on every consecutive ArgoCD sync. Gated by `upgradeMigration.enabled` (default `true`).

Validation (live, cluster `qa-self-managed`)

All seven upgraded on a real release and confirmed healthy. Specifically proven:

- Elasticsearch 7.17.25 → 8.19.16 in place, security off, cluster green; Kibana 8.x via the shim secrets.
- OpenSearch 2.11.0 → 2.19.5 in place: a marker doc written on 2.11.0 survived (no reindex); plugin bumped to 2.19.5.0.
- Prometheus → v2.55.1, PVC retained across the STS recreate.

Notes / follow-ups

- The migration only detects the `app.kubernetes.io/name` selector transition (correct for this upgrade). A future subchart bump that changes a different immutable field would need the check extended (noted in code comments).
- Set `upgradeMigration.elasticsearchSecurityOffSecrets: false` and configure ES security if you want it on.
- `apk add kubectl openssl` in the Job needs cluster egress (same assumption as the existing crds-installer).