Releases · MerlionOS/tsdb-operator

13 Apr 22:59

lai3d

v1.0.0

fc1f545

v1.0.0 Latest

1.0.0 — 2026-04-14

First stable release. API surface for observability.merlionos.org/v1
is now covered by semver: breaking changes require a major bump. Plan
details in docs/V1-PREP.md.

Breaking

There is only one breaking change relative to v0.x, and it already landed in v0.11.0:

PrometheusCluster.spec.additionalScrapeConfigs is a struct with
mutually-exclusive inline / secretRef, not a bare string. See
the v0.11.0 migration diff.

Users on v0.11.x upgrading to v1.0.0 need no CR edits.
Users on v0.10.x or earlier must convert the field per the v0.11.0
diff before upgrading.

Added

- Print columns: kubectl get prometheuscluster shows Phase / Ready /
  Age; kubectl get prometheusclusterset shows Members / Age.
- CRD-level validation: MinLength on remoteWrite[].url and
  backup.schedule; Prometheus-duration pattern
  (^[0-9]+(ms|s|m|h|d|w|y)$) on retention; Enum on
  status.phase. Complements the admission webhook.
- The +kubebuilder:storageversion marker is now explicit on both CRDs.
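
As a rough illustration of what the retention pattern above accepts, the sketch below applies the same regular expression with Go's regexp package. The function name validRetention is hypothetical and not taken from the operator; in the real CRD the check runs in the API server's schema validation, not in Go code.

```go
package main

import (
	"fmt"
	"regexp"
)

// retentionPattern is the Prometheus-duration pattern quoted in the release notes.
var retentionPattern = regexp.MustCompile(`^[0-9]+(ms|s|m|h|d|w|y)$`)

// validRetention (hypothetical helper) reports whether s would pass the
// CRD-level pattern check: one integer followed by exactly one unit.
func validRetention(s string) bool {
	return retentionPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"15d", "500ms", "1h30m", "", "2 weeks"} {
		fmt.Printf("%q valid=%v\n", s, validRetention(s))
	}
}
```

Note that the pattern allows a single unit only, so a compound duration such as 1h30m is rejected even though Prometheus itself may accept it.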

Changed

- Default spec.image bumped to prom/prometheus:v2.55.1
  (Prometheus 2.x LTS at release time).
- Default spec.thanos.image bumped to quay.io/thanos/thanos:v0.37.2.

Stability guarantees

See docs/V1-PREP.md for the full field-by-field
v1 status table. In summary:

- CRD fields are under semver.
- Internal packages (internal/...) are not importable; layout
  may change.
- REST API JSON shape tracks the CRD; additive changes are non-breaking.
- Helm chart values are additive-compatible; new opt-in keys don't
  require a major.
- Audit log table schema is operator-owned; no external reads.

Deprecation policy (going forward)

Fields deprecate for one minor version before removal, marked with
// Deprecated: and +kubebuilder:deprecatedversion:warning.
Renames run the new and old field in parallel for one minor with the
new winning. Schema removals require a major bump.

Stats

- 12 shipped releases under v0.x before v1.0
- 10 of those 12 caught a real bug via kind verification
- Final kind pass before v1.0 tag: zero bugs
- Coverage: internal/controller 75%, internal/webhook 76%,
  pkg/api 56%

13 Apr 17:16

lai3d

v0.11.0

0cc65fe

v0.11.0

0.11.0 — 2026-04-14

Breaking schema change in preparation for v1.0 — see
docs/V1-PREP.md. The only intentional breaking
change planned for v1; landing it now lets v1 promote in place.

Changed (breaking)

PrometheusCluster.spec.additionalScrapeConfigs is no longer a bare
string. It is now a struct with mutually-exclusive sub-fields:
- inline (string) — same content as the old field, wrapped under
  scrape_configs: in the operator-managed ConfigMap.
- secretRef (corev1.SecretKeySelector) — Secret whose value is a
  complete Prometheus scrape config file (must already include
  scrape_configs:). Mounted at /etc/prometheus/extra-secret/<key>.

Migration v0.10.x → v0.11.0

 spec:
-  additionalScrapeConfigs: |
-    - job_name: my-app
-      static_configs:
-        - targets: [my-app:8080]
+  additionalScrapeConfigs:
+    inline: |
+      - job_name: my-app
+        static_configs:
+          - targets: [my-app:8080]

The webhook rejects the old shape, so CR YAML must be edited before
the operator is upgraded. The operator binary upgrade itself is safe:
existing ConfigMaps are regenerated on the next reconcile.

Added

- Webhook validation: rejects the field when both or neither of
  inline / secretRef are set, or when secretRef.name or
  secretRef.key is missing.
- New tests covering both code paths and all four reject cases.
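
The four reject cases can be sketched as a small validation function. This is a minimal illustration, not the operator's code: the Go type and field names are assumptions, and SecretKeySelector here is a local stand-in for corev1.SecretKeySelector so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretKeySelector is a local stand-in for corev1.SecretKeySelector,
// reduced to the two fields the release notes mention.
type SecretKeySelector struct {
	Name string
	Key  string
}

// AdditionalScrapeConfigs mirrors the v0.11.0 shape; Go field names are assumed.
type AdditionalScrapeConfigs struct {
	Inline    string
	SecretRef *SecretKeySelector
}

// validateScrapeConfigs (hypothetical) returns an error for each of the
// four reject cases: both set, neither set, missing name, missing key.
func validateScrapeConfigs(c AdditionalScrapeConfigs) error {
	hasInline := c.Inline != ""
	hasSecret := c.SecretRef != nil
	switch {
	case hasInline && hasSecret:
		return errors.New("inline and secretRef are mutually exclusive")
	case !hasInline && !hasSecret:
		return errors.New("one of inline or secretRef must be set")
	case hasSecret && c.SecretRef.Name == "":
		return errors.New("secretRef.name is required")
	case hasSecret && c.SecretRef.Key == "":
		return errors.New("secretRef.key is required")
	}
	return nil
}

func main() {
	ok := AdditionalScrapeConfigs{Inline: "- job_name: my-app"}
	bad := AdditionalScrapeConfigs{SecretRef: &SecretKeySelector{Name: "extra"}}
	fmt.Println(validateScrapeConfigs(ok))  // <nil>
	fmt.Println(validateScrapeConfigs(bad)) // secretRef.key is required
}
```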

13 Apr 17:03

lai3d

v0.10.1

0d41aa3

v0.10.1

0.10.1 — 2026-04-14

Fixed

v0.10.0's reload trigger raced kubelet ConfigMap projection. The
reconciler POSTed /-/reload immediately on ConfigMap change, but
kubelet takes 60-90s to refresh the mounted file in the pod. The
reload fired against stale content and no second reload ever
happened, so additional scrape configs only took effect on pod
restart. Caught by kind verification immediately after the v0.10.0
tag.

Changed

- Replaced the controller-driven reload with a config-reloader
  sidecar container (ghcr.io/jimmidyson/configmap-reload:v0.13.1)
  co-located with Prometheus. It watches /etc/prometheus directly
  and POSTs /-/reload when files change, sidestepping the kubelet
  projection lag entirely. Same pattern prometheus-operator uses.
- triggerReload and the HTTP field on PrometheusClusterReconciler
  are removed. reconcileConfigMap now returns just error.
- Tests use containerByName instead of indexing into Containers[0],
  so future container additions don't shift assertions.
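
A lookup helper along the lines of containerByName might look like the sketch below. Only the helper's name comes from the notes; its signature is assumed, Container is a local stand-in for corev1.Container, and the container names in the example are hypothetical.

```go
package main

import "fmt"

// Container is a local stand-in for corev1.Container.
type Container struct {
	Name  string
	Image string
}

// containerByName returns a pointer to the container with the given name,
// or nil if none matches. Looking up by name keeps assertions stable when
// sidecars are added before or after the main container.
func containerByName(cs []Container, name string) *Container {
	for i := range cs {
		if cs[i].Name == name {
			return &cs[i]
		}
	}
	return nil
}

func main() {
	// Container names here are assumptions for illustration.
	containers := []Container{
		{Name: "config-reloader", Image: "ghcr.io/jimmidyson/configmap-reload:v0.13.1"},
		{Name: "prometheus", Image: "prom/prometheus:v2.55.1"},
	}
	if c := containerByName(containers, "prometheus"); c != nil {
		fmt.Println(c.Image)
	}
}
```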

13 Apr 16:49

lai3d

v0.10.0

95d5d6c

v0.10.0

0.10.0 — 2026-04-14

Auto-reload Prometheus when its ConfigMap changes — closes the
additionalScrapeConfigs UX gap from v0.9.x ("change took effect only
after manual curl /-/reload or pod restart").

Added

- PrometheusClusterReconciler.triggerReload: POSTs /-/reload to
  every Ready replica when reconcileConfigMap returns
  OperationResultUpdated (i.e. content actually changed).
- First-time creation does not trigger a reload: the pod hasn't
  started yet and picks the config up on first start.
- HTTP client is injectable on PrometheusClusterReconciler.HTTP for
  test substitution.
- New internal/controller/reload_test.go covering pod listing,
  multi-replica fan-out, and skipping pods without an IP.
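
The fan-out described above could be sketched as follows. This is an illustration under assumptions, not the v0.10.0 code: Pod is a stand-in for the relevant corev1.Pod fields, the PostFunc indirection is one hypothetical way to make the HTTP call injectable, and port 9090 is simply Prometheus's conventional default. v0.10.1 later replaced this controller-driven approach with a sidecar.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
)

// Pod carries only the fields this sketch needs.
type Pod struct {
	Name  string
	IP    string
	Ready bool
}

// PostFunc abstracts the HTTP call so tests can substitute a fake.
type PostFunc func(url string) error

// HTTPPost adapts an *http.Client to PostFunc.
func HTTPPost(c *http.Client) PostFunc {
	return func(url string) error {
		resp, err := c.Post(url, "text/plain", nil)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("POST %s: unexpected status %s", url, resp.Status)
		}
		return nil
	}
}

// triggerReload POSTs /-/reload to every Ready pod that has an IP. It keeps
// going after a failure and returns the success count and the first error.
func triggerReload(pods []Pod, port int, post PostFunc) (reloaded int, firstErr error) {
	for _, p := range pods {
		if !p.Ready || p.IP == "" {
			continue // not started or not yet scheduled: nothing to reload
		}
		url := "http://" + net.JoinHostPort(p.IP, strconv.Itoa(port)) + "/-/reload"
		if err := post(url); err != nil {
			if firstErr == nil {
				firstErr = err
			}
			continue
		}
		reloaded++
	}
	return reloaded, firstErr
}

func main() {
	pods := []Pod{
		{Name: "prom-0", IP: "10.0.0.5", Ready: true},
		{Name: "prom-1", IP: "", Ready: true},
		{Name: "prom-2", IP: "10.0.0.7", Ready: false},
	}
	// Dry run: print the URL instead of sending a real request.
	dryRun := func(url string) error { fmt.Println("POST", url); return nil }
	n, err := triggerReload(pods, 9090, dryRun)
	fmt.Println(n, err)
}
```

In a real reconciler the PostFunc would come from HTTPPost with a client that has a timeout set.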

Changed

reconcileConfigMap returns (bool, error) instead of just
error — the bool reports whether the existing object was updated.
Internal API change; no operator behaviour change for existing
callers.

13 Apr 16:40

lai3d

v0.9.1

8988b7a

v0.9.1

0.9.1 — 2026-04-14

Fixed

additional-scrape-configs.yml is now wrapped under a scrape_configs:
key before being written to the ConfigMap. Prometheus 2.43+
scrape_config_files rejects a bare top-level YAML list with
cannot unmarshal !!seq into config.ScrapeConfigs. v0.9.0 produced
exactly that and the Prometheus pod CrashLooped on config load.
Caught by kind verification immediately after the v0.9.0 tag.
The CR-facing contract is unchanged — users still pass a bare list of
scrape entries; the wrapping is done by the operator.
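
The wrapping step can be illustrated at the text level. The sketch below is an assumption about how such a wrapper could work (the operator may well marshal structured YAML instead); wrapScrapeConfigs is a hypothetical name, and the input is assumed to be a single YAML document without --- markers.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapScrapeConfigs nests a bare YAML list under a top-level
// scrape_configs: key by indenting every non-blank line two spaces.
func wrapScrapeConfigs(list string) string {
	var b strings.Builder
	b.WriteString("scrape_configs:\n")
	for _, line := range strings.Split(strings.TrimRight(list, "\n"), "\n") {
		if strings.TrimSpace(line) == "" {
			b.WriteString("\n") // keep blank lines blank
			continue
		}
		b.WriteString("  " + line + "\n")
	}
	return b.String()
}

func main() {
	userList := "- job_name: my-app\n  static_configs:\n    - targets: [my-app:8080]\n"
	fmt.Print(wrapScrapeConfigs(userList))
}
```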

13 Apr 16:32

lai3d

v0.9.0

ffb2e3b

v0.9.0

0.9.0 — 2026-04-14

User-side custom scrape config without hand-editing the operator's
ConfigMap.

Added

- spec.additionalScrapeConfigs (string, top-level YAML list) on
  PrometheusCluster. Stored under ConfigMap key
  additional-scrape-configs.yml; main prometheus.yml references it
  via the Prometheus 2.43+ scrape_config_files directive.
- Webhook validation: rejects non-list YAML at kubectl apply time
  with the field path; doesn't try to validate scrape-config internals
  (Prometheus reload still surfaces those).
- docs/SCRAPE-CONFIGS.{en,zh}.md with example and explicit limits
  (inline-only, no auto-reload, no PodMonitor/ServiceMonitor).
- Two new tests in internal/controller/render_test.go and two in
  internal/webhook/.

13 Apr 15:26

lai3d

v0.8.0

30d5f6e

v0.8.0

0.8.0 — 2026-04-13

PrometheusClusterSet.spec.backupTemplate now actually takes effect on
member CRs. Closes the Deferred item from v0.5.0.

Added

- PrometheusClusterSetReconciler.overlayBackup: copies spec.backupTemplate
  onto each matched member whose own spec.backup.enabled is false and
  that does not carry the opt-out annotation. Stamps
  observability.merlionos.org/clusterset: <set-name> for traceability.
- OptOutAnnotation constant
  (observability.merlionos.org/clusterset-opt-out): members with value
  "true" are never touched by any Set.
- Three new envtest specs covering overlay / opt-out / member-wins.
- docs/CLUSTERSET.{en,zh}.md grew an Overlay rules section plus
  the non-obvious edge cases (deletion doesn't unwind, ownership
  transfer back to the user).

Policy (all-or-nothing per member)

- Member wins when spec.backup.enabled is already true.
- Opt-out annotation always wins.
- Otherwise the member's spec.backup is replaced wholesale with the
  template plus enabled: true. No field-level merge.
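
The three policy rules can be read as a short decision function. The sketch below is illustrative only: the BackupSpec fields and the Member type are simplified stand-ins, and recording the set name as an annotation (rather than a label) is an assumption, since the notes only say it is stamped.

```go
package main

import "fmt"

const (
	// OptOutAnnotation is the key quoted in the release notes.
	OptOutAnnotation = "observability.merlionos.org/clusterset-opt-out"
	// ClusterSetKey is the traceability key; storing it as an annotation is assumed.
	ClusterSetKey = "observability.merlionos.org/clusterset"
)

// BackupSpec is a simplified stand-in for spec.backup.
type BackupSpec struct {
	Enabled  bool
	Bucket   string
	Schedule string
}

// Member is a simplified stand-in for a PrometheusCluster.
type Member struct {
	Annotations map[string]string
	Backup      BackupSpec
}

// overlayBackup applies the all-or-nothing policy and reports whether
// the member was modified.
func overlayBackup(m *Member, setName string, tmpl BackupSpec) bool {
	if m.Annotations[OptOutAnnotation] == "true" {
		return false // opt-out always wins
	}
	if m.Backup.Enabled {
		return false // member wins
	}
	m.Backup = tmpl // wholesale replacement, no field-level merge
	m.Backup.Enabled = true
	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[ClusterSetKey] = setName
	return true
}

func main() {
	tmpl := BackupSpec{Bucket: "set-bucket", Schedule: "0 2 * * *"}
	m := &Member{}
	fmt.Println(overlayBackup(m, "prod-set", tmpl), m.Backup.Bucket)
}
```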

13 Apr 13:48

lai3d

v0.7.0

9f3e2e7

v0.7.0

0.7.0 — 2026-04-13

Admission-time validation. Invalid specs are now rejected at
kubectl apply time instead of crashing the reconciler or silently
never firing a cron. Opt-in behind features.webhook=true.

Added

- internal/webhook.PrometheusClusterValidator: validating admission
  webhook (controller-runtime typed Validator[T]). Rejects:
  - spec.replicas < 1
  - spec.backup.enabled=true with empty spec.backup.bucket
  - spec.backup.schedule that fails cron.ParseStandard
  - spec.remoteWrite[].url empty
- cmd/main.go flag --enable-webhook; uses the existing
  --webhook-cert-path plumbing.
- Helm chart: features.webhook, webhook.* values. When enabled,
  the chart creates a cert-manager Issuer+Certificate (self-signed
  default) + Service + ValidatingWebhookConfiguration with the
  cert-manager.io/inject-ca-from annotation.
- Unit tests covering each rejection path + the happy path.
- Verified on kind: kubectl apply of invalid specs gets the webhook's
  specific error message.
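
The four rejection rules could be sketched as plain Go, independent of controller-runtime. All type and function names below are hypothetical. The notes name cron.ParseStandard as the schedule check; to stay self-contained the sketch injects the parser and uses a crude five-field stand-in, and it assumes the schedule is checked whenever it is non-empty.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Simplified stand-ins for the relevant parts of PrometheusCluster.spec.
type RemoteWrite struct{ URL string }

type Backup struct {
	Enabled  bool
	Bucket   string
	Schedule string
}

type Spec struct {
	Replicas    int32
	Backup      Backup
	RemoteWrite []RemoteWrite
}

// ScheduleParser lets a real cron parser be plugged in.
type ScheduleParser func(schedule string) error

// fiveFields is a crude stand-in for a real cron parser: it only counts fields.
func fiveFields(schedule string) error {
	if len(strings.Fields(schedule)) != 5 {
		return errors.New("expected 5 space-separated cron fields")
	}
	return nil
}

// validateSpec collects one error per violated rule, prefixed with the field path.
func validateSpec(s Spec, parse ScheduleParser) []error {
	var errs []error
	if s.Replicas < 1 {
		errs = append(errs, errors.New("spec.replicas: must be >= 1"))
	}
	if s.Backup.Enabled && s.Backup.Bucket == "" {
		errs = append(errs, errors.New("spec.backup.bucket: required when backup is enabled"))
	}
	if s.Backup.Schedule != "" {
		if err := parse(s.Backup.Schedule); err != nil {
			errs = append(errs, fmt.Errorf("spec.backup.schedule: %w", err))
		}
	}
	for i, rw := range s.RemoteWrite {
		if rw.URL == "" {
			errs = append(errs, fmt.Errorf("spec.remoteWrite[%d].url: must not be empty", i))
		}
	}
	return errs
}

func main() {
	bad := Spec{Replicas: 0, Backup: Backup{Enabled: true, Schedule: "every day"}}
	for _, err := range validateSpec(bad, fiveFields) {
		fmt.Println(err)
	}
}
```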

13 Apr 12:14

lai3d

v0.6.0

feaa965

v0.6.0

0.6.0 — 2026-04-13

Real backups. Closes the biggest honesty gap in the project: from v0.1.0
until now the scheduler uploaded the admin-API JSON response as the
"backup artifact" — a marker, not something you could restore from. This
release replaces that with a proper tar of the on-disk snapshot
directory, streamed via S3 multipart, with on-pod cleanup afterwards.
Verified end-to-end on kind+MinIO: a 1-minute cron produced 108–112 KiB
tarballs containing real TSDB blocks (chunks, index, meta.json).

Added

- PodExecutor interface and SPDYExecutor implementation using
  k8s.io/client-go/tools/remotecommand. Invoked with
  tar -C /prometheus/snapshots -cf - <snapshot-name> to stream the
  snapshot dir out of the pod.
- Multipart streaming upload via the s3 manager.Uploader
  (backup.S3Client.StreamUpload). Required because PutObject rejects
  unseekable pipe readers over plain HTTP.
- Snapshot admin-API response parser: the returned directory name is
  what gets tarred and then deleted.
- Best-effort cleanup: rm -rf /prometheus/snapshots/<name> after a
  successful upload so snapshot dirs don't accumulate on the PVC.
- RBAC: pods/exec create verb added to the Helm ClusterRole.
- Fallback path preserved: when Exec is nil (unit tests, non-cluster
  contexts), the scheduler still uploads the admin-API response so
  existing tests keep their shape.
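
One way to connect a pod exec stream to a streaming upload is an io.Pipe, sketched below. The interface names PodExecutor and StreamUpload come from the notes, but their signatures here are assumptions, and the fake executor and uploader exist only so the example runs without a cluster or S3.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
)

// PodExecutor runs a command in the pod and writes its stdout to w.
// The method signature is an assumption for this sketch.
type PodExecutor interface {
	Exec(ctx context.Context, cmd []string, stdout io.Writer) error
}

// StreamUploader uploads from a non-seekable reader (signature assumed).
type StreamUploader interface {
	StreamUpload(ctx context.Context, key string, body io.Reader) error
}

// streamSnapshot pipes tar's stdout straight into the uploader, so the
// tarball is never buffered whole in memory or written to local disk.
func streamSnapshot(ctx context.Context, exec PodExecutor, up StreamUploader, snapshot, key string) error {
	pr, pw := io.Pipe()
	cmd := []string{"tar", "-C", "/prometheus/snapshots", "-cf", "-", snapshot}
	go func() {
		// A nil error closes the pipe normally (reader sees io.EOF);
		// a non-nil error is surfaced to the uploader's next Read.
		pw.CloseWithError(exec.Exec(ctx, cmd, pw))
	}()
	err := up.StreamUpload(ctx, key, pr)
	// Unblock the exec goroutine if the upload stopped reading early.
	pr.CloseWithError(err)
	return err
}

// fakeExec and bufferUploader are test doubles, not operator code.
type fakeExec struct{ payload []byte }

func (f fakeExec) Exec(_ context.Context, _ []string, stdout io.Writer) error {
	_, err := stdout.Write(f.payload)
	return err
}

type bufferUploader struct{ buf bytes.Buffer }

func (u *bufferUploader) StreamUpload(_ context.Context, _ string, body io.Reader) error {
	_, err := io.Copy(&u.buf, body)
	return err
}

// runDemo wires the fakes together and returns what the uploader received.
func runDemo() (string, error) {
	up := &bufferUploader{}
	err := streamSnapshot(context.Background(), fakeExec{payload: []byte("tar-bytes")},
		up, "example-snapshot", "backups/example-snapshot.tar")
	return up.buf.String(), err
}

func main() {
	got, err := runDemo()
	fmt.Printf("uploaded %d bytes, err=%v\n", len(got), err)
}
```

The pipe reader is not seekable, which is the property that makes a multipart streaming uploader necessary in place of a plain PutObject call.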

Changed

- backup.Uploader interface gains StreamUpload. Existing PutObject
  callers continue to work; streaming goes through the new method.
- docs/RESTORE.{en,zh}.md header rewritten: no more "this is a marker,
  adjust the tar step yourself" caveat.

Fixed

The error "PutObject, compute input header checksum failed, unseekable
stream is not supported without TLS and trailing checksum", surfaced
during kind verification when piping an io.PipeReader into PutObject.

13 Apr 09:21

lai3d

v0.5.0

e9377ab

v0.5.0

0.5.0 — 2026-04-13

Multi-cluster aggregation. Adds the PrometheusClusterSet cluster-scoped
CRD, the flagship Milestone-4 feature: groups PrometheusCluster
resources by label across namespaces and reports membership + per-phase
counts in the Set's status.

Added

- PrometheusClusterSet (cluster-scoped) with spec.clusterSelector,
  spec.namespaceSelector, and spec.backupTemplate.
- Set reconciler that watches PrometheusCluster events and refreshes
  every Set's status.{memberCount,phaseCount,members}.
- REST API: GET /api/clustersets, GET /api/clustersets/:name.
- RBAC: cluster-scoped read on prometheusclustersets and namespaces,
  plus the new status/finalizers verbs in the Helm chart's ClusterRole.
- Envtest specs covering label-match and "match everything" branches.
- docs/CLUSTERSET.{en,zh}.md describing the model, REST surface, and
  what is deliberately out of scope (no auto-overlay of the
  backupTemplate, no cross-Kubernetes federation).
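
The status fields named above (memberCount, phaseCount, members) amount to a simple aggregation over the matched clusters. The sketch below shows one plausible shape for it; the Go types, the namespace/name member format, and the phase values are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a simplified view of a matched PrometheusCluster.
type Member struct {
	Namespace string
	Name      string
	Phase     string
}

// SetStatus mirrors status.{memberCount,phaseCount,members}; Go types are assumed.
type SetStatus struct {
	MemberCount int
	PhaseCount  map[string]int
	Members     []string
}

// summarize builds the Set status from the currently matched members.
func summarize(members []Member) SetStatus {
	st := SetStatus{PhaseCount: map[string]int{}}
	for _, m := range members {
		st.MemberCount++
		st.PhaseCount[m.Phase]++
		st.Members = append(st.Members, m.Namespace+"/"+m.Name)
	}
	sort.Strings(st.Members) // stable order avoids spurious status updates
	return st
}

func main() {
	st := summarize([]Member{
		{Namespace: "team-b", Name: "prom", Phase: "Ready"},
		{Namespace: "team-a", Name: "prom", Phase: "Pending"},
		{Namespace: "team-c", Name: "prom", Phase: "Ready"},
	})
	fmt.Println(st.MemberCount, st.PhaseCount["Ready"], st.Members)
}
```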

Deferred

Mutating member CRs from the Set's backupTemplate. Tracked under
"Later" in the roadmap.
