Releases: MerlionOS/tsdb-operator

v1.0.0

13 Apr 22:59

1.0.0 — 2026-04-14

First stable release. The API surface for observability.merlionos.org/v1
is now covered by semver: breaking changes require a major version bump.
The plan is detailed in docs/V1-PREP.md.

Breaking

Only one breaking change separates v1.0.0 from the v0.x line, and it
already landed in v0.11.0:

  • PrometheusCluster.spec.additionalScrapeConfigs is a struct with
    mutually-exclusive inline / secretRef, not a bare string. See
    the v0.11.0 migration diff.

Users on v0.11.x upgrading to v1.0.0 need no CR edits.
Users on v0.10.x or earlier must convert the field per the v0.11.0
diff before upgrading.

Added

  • Print columns: kubectl get prometheuscluster shows Phase / Ready /
    Age; kubectl get prometheusclusterset shows Members / Age.
  • CRD-level validation: MinLength on remoteWrite[].url and
    backup.schedule; Prometheus-duration pattern
    (^[0-9]+(ms|s|m|h|d|w|y)$) on retention; Enum on
    status.phase. Complements the admission webhook.
  • +kubebuilder:storageversion explicit on both CRDs.
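
The retention pattern listed above can be tried outside the cluster. The following is a minimal sketch that assumes only the regular expression quoted in these notes; the helper name validRetention is hypothetical and is not part of the operator.

```go
package main

import (
	"fmt"
	"regexp"
)

// retentionPattern is the Prometheus-duration pattern quoted in the release notes.
var retentionPattern = regexp.MustCompile(`^[0-9]+(ms|s|m|h|d|w|y)$`)

// validRetention reports whether s matches the CRD-level pattern.
// Hypothetical helper, for illustration only.
func validRetention(s string) bool {
	return retentionPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"15d", "500ms", "1h30m", "15", ""} {
		fmt.Printf("%q valid=%v\n", s, validRetention(s))
	}
}
```

The pattern accepts exactly one number-unit pair, so a compound value such as 1h30m does not match and would need to be written as a single unit (for example 90m).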

Changed

  • Default spec.image bumped to prom/prometheus:v2.55.1
    (Prometheus 2.x LTS at release time).
  • Default spec.thanos.image bumped to quay.io/thanos/thanos:v0.37.2.

Stability guarantees

See docs/V1-PREP.md for the full field-by-field
v1 status table. In summary:

  • CRD fields are under semver.
  • Internal packages (internal/...) are not importable; layout
    may change.
  • REST API JSON shape tracks the CRD; additive changes are non-breaking.
  • Helm chart values are additive-compatible; new opt-in keys don't
    require a major.
  • Audit log table schema is operator-owned; no external reads.

Deprecation policy (going forward)

Fields deprecate for one minor version before removal, marked with
// Deprecated: and +kubebuilder:deprecatedversion:warning.
Renames run the new and old field in parallel for one minor with the
new winning. Schema removals require a major bump.

Stats

  • 12 shipped releases under v0.x before v1.0
  • 10 of those 12 caught a real bug via kind verification
  • Final kind pass before v1.0 tag: zero bugs
  • Coverage: internal/controller 75%, internal/webhook 76%,
    pkg/api 56%

v0.11.0

13 Apr 17:16

0.11.0 — 2026-04-14

Breaking schema change in preparation for v1.0; see docs/V1-PREP.md.
This is the only intentional breaking change planned for v1, and
landing it now lets v1 promote in place.

Changed (breaking)

  • PrometheusCluster.spec.additionalScrapeConfigs is no longer a bare
    string. It is now a struct with mutually-exclusive sub-fields:
    • inline (string) — same content as the old field, wrapped under
      scrape_configs: in the operator-managed ConfigMap.
    • secretRef (corev1.SecretKeySelector) — Secret whose value is a
      complete Prometheus scrape config file (must already include
      scrape_configs:). Mounted at /etc/prometheus/extra-secret/<key>.

Migration v0.10.x → v0.11.0

 spec:
-  additionalScrapeConfigs: |
-    - job_name: my-app
-      static_configs:
-        - targets: [my-app:8080]
+  additionalScrapeConfigs:
+    inline: |
+      - job_name: my-app
+        static_configs:
+          - targets: [my-app:8080]

The webhook rejects the old shape, so CR YAML must be edited before
the operator is upgraded. The operator binary upgrade itself is safe:
existing ConfigMaps are regenerated on the next reconcile.

Added

  • Webhook validation: rejects both / neither set, missing
    secretRef.name or secretRef.key.
  • New tests covering both code paths and all four reject cases.
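
The four reject cases can be sketched as a small validation function. This is an illustration under stated assumptions, not the operator's webhook code: the type names, the function name, and the error strings are hypothetical, SecretKeyRef is a local stand-in for corev1.SecretKeySelector, and the check assumes the additionalScrapeConfigs field is present on the CR.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretKeyRef is a local stand-in for the name/key pair of
// corev1.SecretKeySelector, so the sketch compiles without Kubernetes imports.
type SecretKeyRef struct {
	Name string
	Key  string
}

// AdditionalScrapeConfigs is a hypothetical Go shape for the v0.11.0 field.
type AdditionalScrapeConfigs struct {
	Inline    string
	SecretRef *SecretKeyRef
}

// validateAdditionalScrapeConfigs covers the four reject cases named in the
// notes: both set, neither set, missing secretRef.name, missing secretRef.key.
func validateAdditionalScrapeConfigs(c AdditionalScrapeConfigs) error {
	hasInline := c.Inline != ""
	hasSecret := c.SecretRef != nil
	switch {
	case hasInline && hasSecret:
		return errors.New("inline and secretRef are mutually exclusive")
	case !hasInline && !hasSecret:
		return errors.New("one of inline or secretRef must be set")
	case hasSecret && c.SecretRef.Name == "":
		return errors.New("secretRef.name is required")
	case hasSecret && c.SecretRef.Key == "":
		return errors.New("secretRef.key is required")
	}
	return nil
}

func main() {
	ok := AdditionalScrapeConfigs{Inline: "- job_name: my-app\n"}
	both := AdditionalScrapeConfigs{Inline: "x", SecretRef: &SecretKeyRef{Name: "s", Key: "k"}}
	fmt.Println(validateAdditionalScrapeConfigs(ok))
	fmt.Println(validateAdditionalScrapeConfigs(both))
}
```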

v0.10.1

13 Apr 17:03

0.10.1 — 2026-04-14

Fixed

  • v0.10.0's reload trigger raced kubelet ConfigMap projection. The
    reconciler POSTed /-/reload immediately on ConfigMap change, but
    kubelet takes 60-90s to refresh the mounted file in the pod. The
    reload fired against stale content and no second reload ever
    happened, so additional scrape configs only took effect on pod
    restart. Caught by kind verification immediately after the v0.10.0
    tag.

Changed

  • Replaced the controller-driven reload with a config-reloader
    sidecar container (ghcr.io/jimmidyson/configmap-reload:v0.13.1)
    co-located with Prometheus. It watches /etc/prometheus directly
    and POSTs /-/reload when files change, sidestepping the kubelet
    projection lag entirely. Same pattern prometheus-operator uses.
  • triggerReload and the HTTP field on PrometheusClusterReconciler
    are removed. reconcileConfigMap now returns just error.
  • Tests use containerByName instead of indexing into Containers[0],
    so future container additions don't shift assertions.
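
A name-based lookup of the kind described in the last bullet might look like the sketch below. Only the helper name containerByName comes from these notes; its signature, the local Container stand-in for corev1.Container, and the container names "prometheus" and "config-reloader" are assumptions made for illustration.

```go
package main

import "fmt"

// Container is a local stand-in for corev1.Container with only the
// fields this sketch needs.
type Container struct {
	Name  string
	Image string
}

// containerByName returns a pointer to the container with the given name,
// or false when no container matches. The signature is an assumption.
func containerByName(cs []Container, name string) (*Container, bool) {
	for i := range cs {
		if cs[i].Name == name {
			return &cs[i], true
		}
	}
	return nil, false
}

func main() {
	// Container names here are hypothetical.
	containers := []Container{
		{Name: "config-reloader", Image: "ghcr.io/jimmidyson/configmap-reload:v0.13.1"},
		{Name: "prometheus", Image: "prom/prometheus"},
	}
	if c, ok := containerByName(containers, "prometheus"); ok {
		fmt.Println(c.Name, c.Image)
	}
}
```

Because the lookup does not depend on position, an assertion written this way keeps passing when a sidecar is added before or after the Prometheus container.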

v0.10.0

13 Apr 16:49

0.10.0 — 2026-04-14

Auto-reload Prometheus when its ConfigMap changes — closes the
additionalScrapeConfigs UX gap from v0.9.x ("change took effect only
after manual curl /-/reload or pod restart").

Added

  • PrometheusClusterReconciler.triggerReload: POSTs /-/reload to
    every Ready replica when reconcileConfigMap returns
    OperationResultUpdated (i.e. content actually changed).
    First-time creation does not trigger a reload — the pod hasn't
    started yet, it'll pick the config up on first start.
  • HTTP client is injectable on PrometheusClusterReconciler.HTTP for
    test substitution.
  • New internal/controller/reload_test.go covering pod listing,
    multi-replica fan-out, and skipping pods without an IP.
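
The fan-out described above (later replaced by the sidecar in v0.10.1) can be sketched as two steps: pick the pods to contact, then POST to each. The Pod type, the function names, and the use of port 9090 (the Prometheus default) are assumptions for illustration; the operator's real triggerReload is not shown in these notes.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"strconv"
)

// Pod is a hypothetical, trimmed-down view of a replica.
type Pod struct {
	Name  string
	IP    string
	Ready bool
}

// reloadTargets returns one /-/reload URL per Ready pod that has an IP.
// Pods that are not Ready or have no IP yet are skipped.
func reloadTargets(pods []Pod, port int) []string {
	var urls []string
	for _, p := range pods {
		if !p.Ready || p.IP == "" {
			continue
		}
		host := net.JoinHostPort(p.IP, strconv.Itoa(port))
		urls = append(urls, "http://"+host+"/-/reload")
	}
	return urls
}

// postReload sends a single POST and treats any non-200 status as an error.
// Passing the client in mirrors the injectable-client idea from the notes.
func postReload(ctx context.Context, c *http.Client, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, nil)
	if err != nil {
		return err
	}
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("reload %s: unexpected status %s", url, resp.Status)
	}
	return nil
}

func main() {
	pods := []Pod{
		{Name: "prom-0", IP: "10.0.0.1", Ready: true},
		{Name: "prom-1", IP: "", Ready: true},          // no IP yet: skipped
		{Name: "prom-2", IP: "10.0.0.3", Ready: false}, // not Ready: skipped
	}
	for _, u := range reloadTargets(pods, 9090) {
		fmt.Println("would POST", u) // postReload is not called, to avoid network access
	}
}
```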

Changed

  • reconcileConfigMap returns (bool, error) instead of just
    error — the bool reports whether the existing object was updated.
    Internal API change; no operator behaviour change for existing
    callers.

v0.9.1

13 Apr 16:40

0.9.1 — 2026-04-14

Fixed

  • additional-scrape-configs.yml is now wrapped under a scrape_configs:
    key before being written to the ConfigMap. Prometheus 2.43+
    scrape_config_files rejects a bare top-level YAML list with
    cannot unmarshal !!seq into config.ScrapeConfigs. v0.9.0 produced
    exactly that and the Prometheus pod CrashLooped on config load.
    Caught by kind verification immediately after the v0.9.0 tag.
  • The CR-facing contract is unchanged — users still pass a bare list of
    scrape entries; the wrapping is done by the operator.

v0.9.0

13 Apr 16:32

0.9.0 — 2026-04-14

User-side custom scrape config without hand-editing the operator's
ConfigMap.

Added

  • spec.additionalScrapeConfigs (string, top-level YAML list) on
    PrometheusCluster. Stored under ConfigMap key
    additional-scrape-configs.yml; main prometheus.yml references it
    via Prometheus 2.43+ scrape_config_files directive.
  • Webhook validation: rejects non-list YAML at kubectl apply time
    with the field path; doesn't try to validate scrape-config internals
    (Prometheus reload still surfaces those).
  • docs/SCRAPE-CONFIGS.{en,zh}.md with example and explicit limits
    (inline-only, no auto-reload, no PodMonitor/ServiceMonitor).
  • Two new tests in internal/controller/render_test.go and two in
    internal/webhook/.

v0.8.0

13 Apr 15:26

0.8.0 — 2026-04-13

PrometheusClusterSet.spec.backupTemplate now actually takes effect on
member CRs. Closes the Deferred item from v0.5.0.

Added

  • PrometheusClusterSetReconciler.overlayBackup: copies spec.backupTemplate
    onto each matched member whose own spec.backup.enabled is false and
    that does not carry the opt-out annotation. Stamps
    observability.merlionos.org/clusterset: <set-name> for traceability.
  • OptOutAnnotation constant
    (observability.merlionos.org/clusterset-opt-out): members with value
    "true" are never touched by any Set.
  • Three new envtest specs covering overlay / opt-out / member-wins.
  • docs/CLUSTERSET.{en,zh}.md grew an Overlay rules section plus
    the non-obvious edge cases (deletion doesn't unwind, ownership
    transfer back to the user).

Policy (all-or-nothing per member)

  • Member wins when spec.backup.enabled is already true.
  • Opt-out annotation always wins.
  • Otherwise the member's spec.backup is replaced wholesale with the
    template plus enabled: true. No field-level merge.
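
The three policy rules can be sketched as a single decision function. The types, field names, and function signature below are hypothetical; the opt-out key is the one quoted in these notes, and the sketch assumes the clusterset stamp is stored as an annotation, which the notes do not state.

```go
package main

import "fmt"

const (
	optOutAnnotation     = "observability.merlionos.org/clusterset-opt-out"
	clusterSetAnnotation = "observability.merlionos.org/clusterset" // assumed to be an annotation
)

// BackupSpec and Member are hypothetical, trimmed-down shapes.
type BackupSpec struct {
	Enabled  bool
	Bucket   string
	Schedule string
}

type Member struct {
	Annotations map[string]string
	Backup      BackupSpec
}

// overlayBackup applies the all-or-nothing policy and reports whether the
// member was changed.
func overlayBackup(m *Member, setName string, tmpl BackupSpec) bool {
	if m.Annotations[optOutAnnotation] == "true" {
		return false // opt-out always wins
	}
	if m.Backup.Enabled {
		return false // member wins
	}
	m.Backup = tmpl // wholesale replacement, no field-level merge
	m.Backup.Enabled = true
	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[clusterSetAnnotation] = setName
	return true
}

func main() {
	tmpl := BackupSpec{Bucket: "team-backups", Schedule: "0 * * * *"}
	m := &Member{Backup: BackupSpec{Bucket: "leftover"}}
	fmt.Println(overlayBackup(m, "prod-set", tmpl), m.Backup, m.Annotations)
}
```

Note that a member with enabled: false loses any other backup fields it carried, since the template replaces the whole struct.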

v0.7.0

13 Apr 13:48

0.7.0 — 2026-04-13

Admission-time validation. Invalid specs are now rejected at
kubectl apply time instead of crashing the reconciler or silently
never firing a cron. Opt-in behind features.webhook=true.

Added

  • internal/webhook.PrometheusClusterValidator — validating admission
    webhook (controller-runtime typed Validator[T]). Rejects:
    • spec.replicas < 1
    • spec.backup.enabled=true with empty spec.backup.bucket
    • spec.backup.schedule that fails cron.ParseStandard
    • spec.remoteWrite[].url empty
  • cmd/main.go flag --enable-webhook; uses the existing
    --webhook-cert-path plumbing.
  • Helm chart: features.webhook, webhook.* values. When enabled,
    the chart creates a cert-manager Issuer+Certificate (self-signed
    default) + Service + ValidatingWebhookConfiguration with the
    cert-manager.io/inject-ca-from annotation.
  • Unit tests covering each rejection path + the happy path.
  • Verified on kind: kubectl apply of invalid specs gets the webhook's
    specific error message.
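
The four rejection rules can be sketched as a plain function that collects field-path errors. The Spec types, the message wording, and the choice to check the schedule only when it is non-empty are assumptions. The cron parser is injected so the sketch stays in the standard library; fiveFields below is a deliberately crude stand-in for cron.ParseStandard and does not reproduce its behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical, trimmed-down spec types.
type BackupSpec struct {
	Enabled  bool
	Bucket   string
	Schedule string
}

type RemoteWrite struct{ URL string }

type Spec struct {
	Replicas    int
	Backup      BackupSpec
	RemoteWrite []RemoteWrite
}

// validateSpec returns one message per violated rule, prefixed with the
// field path. parseCron plays the role of cron.ParseStandard.
func validateSpec(s Spec, parseCron func(string) error) []string {
	var errs []string
	if s.Replicas < 1 {
		errs = append(errs, "spec.replicas: must be at least 1")
	}
	if s.Backup.Enabled && s.Backup.Bucket == "" {
		errs = append(errs, "spec.backup.bucket: required when backup is enabled")
	}
	// Assumption: an empty schedule is not checked here.
	if s.Backup.Schedule != "" {
		if err := parseCron(s.Backup.Schedule); err != nil {
			errs = append(errs, "spec.backup.schedule: "+err.Error())
		}
	}
	for i, rw := range s.RemoteWrite {
		if rw.URL == "" {
			errs = append(errs, fmt.Sprintf("spec.remoteWrite[%d].url: must not be empty", i))
		}
	}
	return errs
}

// fiveFields only counts whitespace-separated fields; a real parser does far more.
func fiveFields(expr string) error {
	if len(strings.Fields(expr)) != 5 {
		return errors.New("expected 5 space-separated fields")
	}
	return nil
}

func main() {
	bad := Spec{Replicas: 0, Backup: BackupSpec{Enabled: true, Schedule: "every minute"}, RemoteWrite: []RemoteWrite{{URL: ""}}}
	for _, e := range validateSpec(bad, fiveFields) {
		fmt.Println(e)
	}
}
```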

v0.6.0

13 Apr 12:14

0.6.0 — 2026-04-13

Real backups. Closes the biggest honesty gap in the project: from v0.1.0
until now the scheduler uploaded the admin-API JSON response as the
"backup artifact" — a marker, not something you could restore from. This
release replaces that with a proper tar of the on-disk snapshot
directory, streamed via S3 multipart, with on-pod cleanup afterwards.
Verified end-to-end on kind+MinIO: a 1-minute cron produced 108–112 KiB
tarballs containing real TSDB blocks (chunks, index, meta.json).

Added

  • PodExecutor interface and SPDYExecutor implementation using
    k8s.io/client-go/tools/remotecommand. Invoked with
    tar -C /prometheus/snapshots -cf - <snapshot-name> to stream the
    snapshot dir out of the pod.
  • Multipart streaming upload via the s3 manager.Uploader
    (backup.S3Client.StreamUpload). Required because PutObject rejects
    unseekable pipe readers over plain HTTP.
  • Snapshot admin-API response parser — the returned directory name is
    what gets tarred and then deleted.
  • Best-effort cleanup: rm -rf /prometheus/snapshots/<name> after a
    successful upload so snapshot dirs don't accumulate on the PVC.
  • RBAC: pods/exec create verb added to the Helm ClusterRole.
  • Fallback path preserved: when Exec is nil (unit tests, non-cluster
    contexts), the scheduler still uploads the admin-API response so
    existing tests keep their shape.

Changed

  • backup.Uploader interface gains StreamUpload. Existing PutObject
    callers continue to work; streaming goes through the new method.
  • docs/RESTORE.{en,zh}.md header rewritten — no more "this is a marker,
    adjust the tar step yourself" caveat.

Fixed

  • PutObject failing with "compute input header checksum failed,
    unseekable stream is not supported without TLS and trailing
    checksum". Surfaced during kind verification when piping an
    io.PipeReader into PutObject.

v0.5.0

13 Apr 09:21

0.5.0 — 2026-04-13

Multi-cluster aggregation. Adds the PrometheusClusterSet cluster-scoped
CRD, the flagship Milestone-4 feature: groups PrometheusCluster
resources by label across namespaces and reports membership + per-phase
counts in the Set's status.

Added

  • PrometheusClusterSet (cluster-scoped) with spec.clusterSelector,
    spec.namespaceSelector, and spec.backupTemplate.
  • Set reconciler that watches PrometheusCluster events and refreshes
    every Set's status.{memberCount,phaseCount,members}.
  • REST API: GET /api/clustersets, GET /api/clustersets/:name.
  • RBAC: cluster-scoped read on prometheusclustersets and namespaces,
    plus the new status/finalizers verbs in the Helm chart's ClusterRole.
  • Envtest specs covering label-match and "match everything" branches.
  • docs/CLUSTERSET.{en,zh}.md describing the model, REST surface, and
    what is deliberately out of scope (no auto-overlay of the
    backupTemplate, no cross-Kubernetes federation).
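
The status refresh done by the Set reconciler amounts to folding the matched members into a count, a per-phase tally, and a member list. The sketch below assumes a status shape that mirrors the memberCount / phaseCount / members names; the Go types, the "namespace/name" member format, and the phase values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a hypothetical view of a matched PrometheusCluster.
type Member struct {
	Namespace string
	Name      string
	Phase     string
}

// SetStatus mirrors the status field names from the notes; the Go types
// are assumptions.
type SetStatus struct {
	MemberCount int
	PhaseCount  map[string]int
	Members     []string
}

// summarize builds the status from the current member list. Members are
// sorted so repeated reconciles produce a stable result.
func summarize(members []Member) SetStatus {
	st := SetStatus{
		PhaseCount: map[string]int{},
		Members:    make([]string, 0, len(members)),
	}
	for _, m := range members {
		st.MemberCount++
		st.PhaseCount[m.Phase]++
		st.Members = append(st.Members, m.Namespace+"/"+m.Name)
	}
	sort.Strings(st.Members)
	return st
}

func main() {
	// Phase values here are illustrative only.
	st := summarize([]Member{
		{Namespace: "team-b", Name: "prom", Phase: "Ready"},
		{Namespace: "team-a", Name: "prom", Phase: "Pending"},
		{Namespace: "team-a", Name: "prom-edge", Phase: "Ready"},
	})
	fmt.Println(st.MemberCount, st.PhaseCount, st.Members)
}
```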

Deferred

  • Mutating member CRs from the Set's backupTemplate. Tracked under
    "Later" in the roadmap.