Releases: MerlionOS/tsdb-operator
v1.0.0
1.0.0 — 2026-04-14
First stable release. API surface for observability.merlionos.org/v1
is now covered by semver: breaking changes require a major bump. Plan
details in docs/V1-PREP.md.
Breaking
Only one breaking change since v0.x, and it already landed in v0.11.0:
PrometheusCluster.spec.additionalScrapeConfigs is a struct with
mutually-exclusive inline / secretRef, not a bare string. See
the v0.11.0 migration diff.
Users on v0.11.x upgrading to v1.0.0 need no CR edits.
Users on v0.10.x or earlier must convert the field per the v0.11.0
diff before upgrading.
Added
- Print columns: kubectl get prometheuscluster shows Phase / Ready /
Age; kubectl get prometheusclusterset shows Members / Age.
- CRD-level validation: MinLength on remoteWrite[].url and
backup.schedule; Prometheus-duration pattern
(^[0-9]+(ms|s|m|h|d|w|y)$) on retention; Enum on
status.phase. Complements the admission webhook.
- +kubebuilder:storageversion explicit on both CRDs.
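As a rough illustration of what the retention pattern above accepts, here is a minimal Go sketch that applies the same regular expression. The helper name validRetention is hypothetical; in the operator the pattern is enforced by the CRD schema, not by Go code like this.

```go
package main

import (
	"fmt"
	"regexp"
)

// retentionPattern is the Prometheus-duration pattern quoted in the release notes.
var retentionPattern = regexp.MustCompile(`^[0-9]+(ms|s|m|h|d|w|y)$`)

// validRetention (hypothetical helper) reports whether s is one number
// followed by exactly one unit.
func validRetention(s string) bool {
	return retentionPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"15d", "500ms", "1h30m", "", "d"} {
		fmt.Printf("%q valid=%v\n", s, validRetention(s))
	}
}
```

Note that the pattern as written allows a single number-unit pair only, so a compound value such as 1h30m would be rejected at the schema level.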
Changed
- Default spec.image bumped to prom/prometheus:v2.55.1
(Prometheus 2.x LTS at release time).
- Default spec.thanos.image bumped to quay.io/thanos/thanos:v0.37.2.
Stability guarantees
See docs/V1-PREP.md for the full field-by-field
v1 status table. In summary:
- CRD fields are under semver.
- Internal packages (internal/...) are not importable; layout
may change.
- REST API JSON shape tracks the CRD; additive changes are non-breaking.
- Helm chart values are additive-compatible; new opt-in keys don't
require a major.
- Audit log table schema is operator-owned; no external reads.
Deprecation policy (going forward)
Fields deprecate for one minor version before removal, marked with
// Deprecated: and +kubebuilder:deprecatedversion:warning.
Renames run the new and old field in parallel for one minor with the
new winning. Schema removals require a major bump.
Stats
- 12 shipped releases under v0.x before v1.0
- 10 of those 12 caught a real bug via kind verification
- Final kind pass before v1.0 tag: zero bugs
- Coverage: internal/controller 75%, internal/webhook 76%,
pkg/api 56%
v0.11.0
0.11.0 — 2026-04-14
Breaking schema change in preparation for v1.0 — see
docs/V1-PREP.md. The only intentional breaking
change planned for v1; landing it now lets v1 promote in place.
Changed (breaking)
PrometheusCluster.spec.additionalScrapeConfigs is no longer a bare
string. It is now a struct with mutually-exclusive sub-fields:
- inline (string): same content as the old field, wrapped under
scrape_configs: in the operator-managed ConfigMap.
- secretRef (corev1.SecretKeySelector): Secret whose value is a
complete Prometheus scrape config file (must already include
scrape_configs:). Mounted at /etc/prometheus/extra-secret/<key>.
Migration v0.10.x → v0.11.0
 spec:
-  additionalScrapeConfigs: |
-    - job_name: my-app
-      static_configs:
-        - targets: [my-app:8080]
+  additionalScrapeConfigs:
+    inline: |
+      - job_name: my-app
+        static_configs:
+          - targets: [my-app:8080]
The webhook will reject the old shape; CR YAML must be edited before
operators upgrade. Operator binary upgrade itself is safe: existing
ConfigMaps are regenerated on next reconcile.
Added
- Webhook validation: rejects both / neither set, missing
secretRef.name or secretRef.key.
- New tests covering both code paths and all four reject cases.
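The four reject cases can be sketched roughly as follows. This is an assumption-laden illustration, not the operator's code: the Go type and field names, the function name validateScrapeConfigs, and the error wording are all hypothetical, and SecretKeySelector is a local stand-in for corev1.SecretKeySelector so the example needs no dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretKeySelector is a local stand-in for corev1.SecretKeySelector.
type SecretKeySelector struct {
	Name string
	Key  string
}

// AdditionalScrapeConfigs mirrors the v0.11.0 struct shape (Go names assumed).
type AdditionalScrapeConfigs struct {
	Inline    string
	SecretRef *SecretKeySelector
}

// validateScrapeConfigs (hypothetical) covers the four reject cases:
// both set, neither set, missing secretRef.name, missing secretRef.key.
func validateScrapeConfigs(c AdditionalScrapeConfigs) error {
	hasInline := c.Inline != ""
	hasRef := c.SecretRef != nil
	switch {
	case hasInline && hasRef:
		return errors.New("inline and secretRef are mutually exclusive")
	case !hasInline && !hasRef:
		return errors.New("one of inline or secretRef must be set")
	case hasRef && c.SecretRef.Name == "":
		return errors.New("secretRef.name is required")
	case hasRef && c.SecretRef.Key == "":
		return errors.New("secretRef.key is required")
	}
	return nil
}

func main() {
	ok := AdditionalScrapeConfigs{Inline: "- job_name: my-app"}
	bad := AdditionalScrapeConfigs{SecretRef: &SecretKeySelector{Name: "extra"}}
	fmt.Println(validateScrapeConfigs(ok))
	fmt.Println(validateScrapeConfigs(bad))
}
```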
v0.10.1
0.10.1 — 2026-04-14
Fixed
- v0.10.0's reload trigger raced kubelet ConfigMap projection. The
reconciler POSTed /-/reload immediately on ConfigMap change, but
kubelet takes 60-90s to refresh the mounted file in the pod. The
reload fired against stale content and no second reload ever
happened, so additional scrape configs only took effect on pod
restart. Caught by kind verification immediately after the v0.10.0
tag.
Changed
- Replaced the controller-driven reload with a config-reloader
sidecar container (ghcr.io/jimmidyson/configmap-reload:v0.13.1)
co-located with Prometheus. It watches /etc/prometheus directly
and POSTs /-/reload when files change, sidestepping the kubelet
projection lag entirely. Same pattern prometheus-operator uses.
- triggerReload and the HTTP field on PrometheusClusterReconciler
are removed. reconcileConfigMap now returns just error.
- Tests use containerByName instead of indexing into Containers[0],
so future container additions don't shift assertions.
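A lookup helper of the kind the last item describes might look like the sketch below. Only the name containerByName comes from the notes; its signature, the local Container type (standing in for corev1.Container), the container names, and the placeholder Prometheus image are assumptions.

```go
package main

import "fmt"

// Container is a trimmed-down stand-in for corev1.Container.
type Container struct {
	Name  string
	Image string
}

// containerByName returns the container with the given name, so callers
// do not depend on its position in the pod spec.
func containerByName(containers []Container, name string) (Container, bool) {
	for _, c := range containers {
		if c.Name == name {
			return c, true
		}
	}
	return Container{}, false
}

func main() {
	// Container names and ordering here are illustrative only.
	containers := []Container{
		{Name: "config-reloader", Image: "ghcr.io/jimmidyson/configmap-reload:v0.13.1"},
		{Name: "prometheus", Image: "example.invalid/prometheus:placeholder"},
	}
	c, ok := containerByName(containers, "prometheus")
	fmt.Println(c.Image, ok)
}
```

An assertion written against the name keeps passing if a sidecar is later inserted ahead of the main container, which an index-based assertion would not.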
v0.10.0
0.10.0 — 2026-04-14
Auto-reload Prometheus when its ConfigMap changes — closes the
additionalScrapeConfigs UX gap from v0.9.x ("change took effect only
after manual curl /-/reload or pod restart").
Added
- PrometheusClusterReconciler.triggerReload: POSTs /-/reload to
every Ready replica when reconcileConfigMap returns
OperationResultUpdated (i.e. content actually changed).
First-time creation does not trigger a reload: the pod hasn't
started yet and will pick the config up on first start.
- HTTP client is injectable on PrometheusClusterReconciler.HTTP for
test substitution.
- New internal/controller/reload_test.go covering pod listing,
multi-replica fan-out, and skipping pods without an IP.
Changed
- reconcileConfigMap returns (bool, error) instead of just
error; the bool reports whether the existing object was updated.
Internal API change; no operator behaviour change for existing
callers.
v0.9.1
0.9.1 — 2026-04-14
Fixed
- additional-scrape-configs.yml is now wrapped under a scrape_configs:
key before being written to the ConfigMap. Prometheus 2.43+
scrape_config_files rejects a bare top-level YAML list with
cannot unmarshal !!seq into config.ScrapeConfigs. v0.9.0 produced
exactly that and the Prometheus pod CrashLooped on config load.
Caught by kind verification immediately after the v0.9.0 tag.
- The CR-facing contract is unchanged: users still pass a bare list of
scrape entries; the wrapping is done by the operator.
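One way to do the wrapping described above is plain text indentation, sketched here. The function name wrapScrapeConfigs and the string-based approach are assumptions; the operator may equally well round-trip the value through a YAML library.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapScrapeConfigs (hypothetical) nests a bare YAML list under a top-level
// scrape_configs: key by indenting every non-blank line two spaces.
func wrapScrapeConfigs(bareList string) string {
	trimmed := strings.TrimRight(bareList, "\n")
	if strings.TrimSpace(trimmed) == "" {
		return "scrape_configs: []\n" // nothing to wrap
	}
	var b strings.Builder
	b.WriteString("scrape_configs:\n")
	for _, line := range strings.Split(trimmed, "\n") {
		if strings.TrimSpace(line) == "" {
			b.WriteString("\n") // keep blank lines without trailing spaces
			continue
		}
		b.WriteString("  " + line + "\n")
	}
	return b.String()
}

func main() {
	in := "- job_name: my-app\n  static_configs:\n    - targets: [my-app:8080]\n"
	fmt.Print(wrapScrapeConfigs(in))
}
```

Because every line is shifted by the same amount, the relative indentation inside the user's list is preserved.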
v0.9.0
0.9.0 — 2026-04-14
User-side custom scrape config without hand-editing the operator's
ConfigMap.
Added
- spec.additionalScrapeConfigs (string, top-level YAML list) on
PrometheusCluster. Stored under ConfigMap key
additional-scrape-configs.yml; main prometheus.yml references it
via the Prometheus 2.43+ scrape_config_files directive.
- Webhook validation: rejects non-list YAML at kubectl apply time
with the field path; doesn't try to validate scrape-config internals
(Prometheus reload still surfaces those).
- docs/SCRAPE-CONFIGS.{en,zh}.md with example and explicit limits
(inline-only, no auto-reload, no PodMonitor/ServiceMonitor).
- Two new tests in internal/controller/render_test.go and two in
internal/webhook/.
v0.8.0
0.8.0 — 2026-04-13
PrometheusClusterSet.spec.backupTemplate now actually takes effect on
member CRs. Closes the Deferred item from v0.5.0.
Added
- PrometheusClusterSetReconciler.overlayBackup: copies spec.backupTemplate
onto each matched member whose own spec.backup.enabled is false and
that does not carry the opt-out annotation. Stamps
observability.merlionos.org/clusterset: <set-name> for traceability.
- OptOutAnnotation constant
(observability.merlionos.org/clusterset-opt-out): members with value
"true" are never touched by any Set.
- Three new envtest specs covering overlay / opt-out / member-wins.
- docs/CLUSTERSET.{en,zh}.md grew an Overlay rules section plus
the non-obvious edge cases (deletion doesn't unwind, ownership
transfer back to the user).
Policy (all-or-nothing per member)
- Member wins when spec.backup.enabled is already true.
- Opt-out annotation always wins.
- Otherwise the member's spec.backup is replaced wholesale with the
template plus enabled: true. No field-level merge.
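The three policy rules can be read as a small decision function. The sketch below is an assumption about shape, not the operator's code: the BackupSpec fields, the Member type, the free-function form of overlayBackup, and storing the traceability stamp as an annotation are all illustrative guesses. Only the two key strings come from the notes.

```go
package main

import "fmt"

const (
	// OptOutAnnotation is the opt-out key quoted in the release notes.
	OptOutAnnotation = "observability.merlionos.org/clusterset-opt-out"
	// SetStampKey is the traceability key; treating it as an annotation is an assumption.
	SetStampKey = "observability.merlionos.org/clusterset"
)

// BackupSpec is a hypothetical subset of spec.backup.
type BackupSpec struct {
	Enabled  bool
	Bucket   string
	Schedule string
}

// Member is a trimmed-down stand-in for a PrometheusCluster.
type Member struct {
	Annotations map[string]string
	Backup      BackupSpec
}

// overlayBackup applies the all-or-nothing policy and reports whether it changed m.
func overlayBackup(m *Member, setName string, tmpl BackupSpec) bool {
	if m.Annotations[OptOutAnnotation] == "true" {
		return false // opt-out annotation always wins
	}
	if m.Backup.Enabled {
		return false // member wins
	}
	m.Backup = tmpl // wholesale replace, no field-level merge
	m.Backup.Enabled = true
	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[SetStampKey] = setName
	return true
}

func main() {
	tmpl := BackupSpec{Bucket: "fleet-backups", Schedule: "0 2 * * *"}
	m := &Member{Backup: BackupSpec{Bucket: "leftover"}}
	fmt.Println(overlayBackup(m, "prod-set", tmpl), m.Backup, m.Annotations[SetStampKey])
}
```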
v0.7.0
0.7.0 — 2026-04-13
Admission-time validation. Invalid specs are now rejected at
kubectl apply time instead of crashing the reconciler or silently
never firing a cron. Opt-in behind features.webhook=true.
Added
- internal/webhook.PrometheusClusterValidator: validating admission
webhook (controller-runtime typed Validator[T]). Rejects:
  - spec.replicas < 1
  - spec.backup.enabled=true with empty spec.backup.bucket
  - spec.backup.schedule that fails cron.ParseStandard
  - spec.remoteWrite[].url empty
- cmd/main.go flag --enable-webhook; uses the existing
--webhook-cert-path plumbing.
- Helm chart: features.webhook, webhook.* values. When enabled,
the chart creates a cert-manager Issuer + Certificate (self-signed
default) + Service + ValidatingWebhookConfiguration with the
cert-manager.io/inject-ca-from annotation.
- Unit tests covering each rejection path + the happy path.
- Verified on kind: kubectl apply of invalid specs gets the webhook's
specific error message.
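The four rejection rules listed above can be sketched as a plain function. Everything here is illustrative: the Spec types are hypothetical subsets of the CRD, the cron parser is injected as a function so the example has no dependencies, and fiveFieldCron is a crude stand-in that only counts fields (the notes say the real check uses cron.ParseStandard). Skipping the schedule check when the field is empty is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type RemoteWrite struct{ URL string }

type Backup struct {
	Enabled  bool
	Bucket   string
	Schedule string
}

// Spec is a hypothetical subset of PrometheusCluster.spec.
type Spec struct {
	Replicas    int
	Backup      Backup
	RemoteWrite []RemoteWrite
}

// fiveFieldCron is a stand-in for a real cron parser: it only checks the field count.
func fiveFieldCron(expr string) error {
	if len(strings.Fields(expr)) != 5 {
		return errors.New("expected 5 space-separated fields")
	}
	return nil
}

// validateSpec returns one message per violated rule, prefixed with the field path.
func validateSpec(s Spec, parseCron func(string) error) []string {
	var errs []string
	if s.Replicas < 1 {
		errs = append(errs, "spec.replicas: must be >= 1")
	}
	if s.Backup.Enabled && s.Backup.Bucket == "" {
		errs = append(errs, "spec.backup.bucket: required when backup is enabled")
	}
	if s.Backup.Schedule != "" {
		if err := parseCron(s.Backup.Schedule); err != nil {
			errs = append(errs, "spec.backup.schedule: "+err.Error())
		}
	}
	for i, rw := range s.RemoteWrite {
		if rw.URL == "" {
			errs = append(errs, fmt.Sprintf("spec.remoteWrite[%d].url: must not be empty", i))
		}
	}
	return errs
}

func main() {
	bad := Spec{
		Replicas:    0,
		Backup:      Backup{Enabled: true, Schedule: "every minute"},
		RemoteWrite: []RemoteWrite{{}},
	}
	for _, e := range validateSpec(bad, fiveFieldCron) {
		fmt.Println(e)
	}
}
```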
v0.6.0
0.6.0 — 2026-04-13
Real backups. Closes the biggest honesty gap in the project: from v0.1.0
until now the scheduler uploaded the admin-API JSON response as the
"backup artifact" — a marker, not something you could restore from. This
release replaces that with a proper tar of the on-disk snapshot
directory, streamed via S3 multipart, with on-pod cleanup afterwards.
Verified end-to-end on kind+MinIO: a 1-minute cron produced 108–112 KiB
tarballs containing real TSDB blocks (chunks, index, meta.json).
Added
- PodExecutor interface and SPDYExecutor implementation using
k8s.io/client-go/tools/remotecommand. Invoked with
tar -C /prometheus/snapshots -cf - <snapshot-name> to stream the
snapshot dir out of the pod.
- Multipart streaming upload via the s3 manager.Uploader
(backup.S3Client.StreamUpload). Required because PutObject rejects
unseekable pipe readers over plain HTTP.
- Snapshot admin-API response parser: the returned directory name is
what gets tarred and then deleted.
- Best-effort cleanup: rm -rf /prometheus/snapshots/<name> after a
successful upload so snapshot dirs don't accumulate on the PVC.
- RBAC: pods/exec create verb added to the Helm ClusterRole.
- Fallback path preserved: when Exec is nil (unit tests, non-cluster
contexts), the scheduler still uploads the admin-API response so
existing tests keep their shape.
Changed
- backup.Uploader interface gains StreamUpload. Existing PutObject
callers continue to work; streaming goes through the new method.
- docs/RESTORE.{en,zh}.md header rewritten: no more "this is a marker,
adjust the tar step yourself" caveat.
Fixed
- PutObject, compute input header checksum failed, unseekable stream is not supported without TLS and trailing checksum: surfaced during
kind verification when piping an io.PipeReader into PutObject.
v0.5.0
0.5.0 — 2026-04-13
Multi-cluster aggregation. Adds the PrometheusClusterSet cluster-scoped
CRD, the flagship Milestone-4 feature: groups PrometheusCluster
resources by label across namespaces and reports membership + per-phase
counts in the Set's status.
Added
- PrometheusClusterSet (cluster-scoped) with spec.clusterSelector,
spec.namespaceSelector, and spec.backupTemplate.
- Set reconciler that watches PrometheusCluster events and refreshes
every Set's status.{memberCount,phaseCount,members}.
- REST API: GET /api/clustersets, GET /api/clustersets/:name.
- RBAC: cluster-scoped read on prometheusclustersets and namespaces,
plus the new status/finalizers verbs in the Helm chart's ClusterRole.
- Envtest specs covering label-match and "match everything" branches.
- docs/CLUSTERSET.{en,zh}.md describing the model, REST surface, and
what is deliberately out of scope (no auto-overlay of the
backupTemplate, no cross-Kubernetes federation).
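The membership and per-phase counting can be pictured with the sketch below. It is a simplification under stated assumptions: selectors are reduced to exact-match label maps, spec.namespaceSelector is ignored, members are recorded as namespace/name strings, and the Go type names are hypothetical. The status field names follow status.{memberCount,phaseCount,members} from the notes.

```go
package main

import "fmt"

// Cluster is a trimmed-down stand-in for a PrometheusCluster.
type Cluster struct {
	Namespace string
	Name      string
	Labels    map[string]string
	Phase     string
}

// SetStatus mirrors status.{memberCount,phaseCount,members}; Go types are assumed.
type SetStatus struct {
	MemberCount int
	PhaseCount  map[string]int
	Members     []string
}

// matches reports whether labels satisfy every key/value in selector.
// An empty selector matches everything.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// summarize collects the matching clusters and counts them per phase.
func summarize(selector map[string]string, clusters []Cluster) SetStatus {
	st := SetStatus{PhaseCount: map[string]int{}}
	for _, c := range clusters {
		if !matches(selector, c.Labels) {
			continue
		}
		st.MemberCount++
		st.PhaseCount[c.Phase]++
		st.Members = append(st.Members, c.Namespace+"/"+c.Name)
	}
	return st
}

func main() {
	clusters := []Cluster{
		{Namespace: "team-a", Name: "prom", Labels: map[string]string{"tier": "prod"}, Phase: "Ready"},
		{Namespace: "team-b", Name: "prom", Labels: map[string]string{"tier": "prod"}, Phase: "Pending"},
		{Namespace: "team-c", Name: "prom", Labels: map[string]string{"tier": "dev"}, Phase: "Ready"},
	}
	fmt.Printf("%+v\n", summarize(map[string]string{"tier": "prod"}, clusters))
	fmt.Printf("%+v\n", summarize(nil, clusters)) // "match everything" branch
}
```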
Deferred
- Mutating member CRs from the Set's backupTemplate. Tracked under
"Later" in the roadmap.