fix: run init-job on first install by jonasbadstuebner · Pull Request #450 · cockroachdb/helm-charts

jonasbadstuebner · 2025-01-21T23:40:25Z

There are multiple issues stating the init-job is not run as expected

This is because there is a deadlock with the CRDB-StatfulSet requiring the init-job to run, which is only ran by Helm when the StatefulSet is considered ready

The optional use of the --wait of helm install is causing differing observations

This PR attempts to fix the problem by using the Job as plain Job instead of hook when the Chart is initially installed

jonasbadstuebner · 2025-01-21T23:52:07Z

⚠ Not tested yet ⚠
The rendering looks good - I am just not sure if this behaves as I think it does.
If someone else wants to test it, feel free.

Expected behavior:
On install (--wait given or not does not matter):

1. The CRDB-StatefulSet and the init-Job are deployed almost at the same time
2. The init-job initializes the DB - once it is ready for initialization - and finishes
3. The `helm install` command succeeds
4. The Job is eventually auto-deleted by Kubernetes due to the `ttlSecondsAfterFinished` being 0
   (this unblocks the `helm upgrade`)

On Upgrade:

(0. The init-Job is removed, if present from a previous `helm upgrade` - should not be
    there anymore from the `helm install` step)
1. The CRDB-StatefulSet is upgraded
2. The init-Job is run after the StatefulSet is upgraded and ready
3. The `helm upgrade` command succeeds

jonasbadstuebner · 2025-01-22T00:14:41Z

The open issues related to the init-job deadlock:
For sure:
#69
#287
#402

Probably:
#345 (same error message)

The closed issues related to the init-job deadlock:
#195
#234
#389 (I'd argue this is the same topic, even though the person put the blame on terraform)
#400
#410

This just shows that the topic is highly confusing, as people expect a simple helm install to work. I really hope my solution works.
Otherwise, I would love to see a big note in the README at least, to not use --wait on helm install.

jonasbadstuebner · 2025-03-12T21:31:36Z

I just had big success with the init job.

The changes I propose in this PR seem to do their job (pun intended...?).

The cluster was initialized without any manual action.
This is my install and upgrade section for the Fluxv2 HelmRelease:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
[...]
spec:
  install:
    remediation:
      retries: -1
  upgrade:
    remediation:
      retries: -1

Nothing special required here, as you can see. Flux enables --wait and --wait-for-jobs per default, so these flags should work on the normal helm install command after my changes too.

The init-job was applied in parallel to the StatefulSet, ran in the loop until the CRDB-pods accepted the init-command, ran the init, printed a success message and stayed as "completed" (which is expected and totally fine) before I got the Helm install succeeded message.

And when the upgrade ran, it removed the previously installed job - because of the helm.sh/hook-delete-policy: before-hook-creation annotation - and it ran the job after the StatefulSet was fully restarted.

Side-Note: It is probably debatable, if the init-job is even needed at all on upgrade...because what does it do there?

I mark this PR as ready and would be happy to address any questions or notes regarding my code changes.

jonasbadstuebner · 2025-03-12T21:33:05Z

I'm unsure what a good title for this PR would be, so if you have one, let me know please

jonasbadstuebner · 2025-03-12T21:38:28Z

closes #69
closes #287
closes #402
closes #345 (I think - as I said, it is the same error message and the behavior is the same as when the init-job is not run because the StatefulSet did not become ready - also the person used --wait=true in their install command)

prafull01 · 2025-07-16T15:31:09Z

cockroachdb/templates/job.init.yaml

-              initCluster() {
-                while true; do


Removing this loop can cause issues, what if the init was not successful in first attempt. Job will fail and cockroachdb will not come up.

cockroach init already runs in a loop. I think this was changed since the Job was created and the Job was never updated..

prafull01 · 2025-07-16T15:33:16Z

cockroachdb/templates/job.init.yaml

+    helm.sh/hook: post-upgrade
+    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded


The main issue with this is the job will be left in the cluster and when you do an upgrade, the new image will be patched to this job which is not allowed as job is an immutable resource in kubernetes.

Since the hook-delete-policy still applies on Upgrade, it removes the old job before creating the one for the upgrade.
Feel free to test it out.

There are multiple issues stating the init-job is not run as expected This is because there is a deadlock with the CRDB-StatfulSet requiring the init-job to run, which is only ran by Helm when the StatefulSet is considered ready The optional use of the --wait of `helm install` is causing differing observations This PR attempts to fix the problem by using the Job as plain Job instead of hook when the Chart is initially installed

jonasbadstuebner mentioned this pull request Jan 21, 2025

CockroachDB doesnt start and init doesnt seem to launch #402

Open

jonasbadstuebner force-pushed the fix-init-job-running branch 5 times, most recently from 603e86a to 2162138 Compare March 12, 2025 20:35

jonasbadstuebner marked this pull request as ready for review March 12, 2025 21:31

jonasbadstuebner changed the title ~~fix: Make init-job run as hook only on install~~ fix: Make init-job a Helm Chart-Hook only on upgrade Mar 12, 2025

jonasbadstuebner changed the title ~~fix: Make init-job a Helm Chart-Hook only on upgrade~~ fix: run init-job on first install Mar 13, 2025

jonasbadstuebner force-pushed the fix-init-job-running branch 4 times, most recently from 7184a75 to 63deb0f Compare April 27, 2025 21:53

jonasbadstuebner mentioned this pull request Jul 16, 2025

ensure the init job is created as a helm hook so the chart can be upgraded #68

Merged

prafull01 reviewed Jul 16, 2025

View reviewed changes

jonasbadstuebner force-pushed the fix-init-job-running branch from 63deb0f to bd9eead Compare July 16, 2025 17:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: run init-job on first install#450

fix: run init-job on first install#450
jonasbadstuebner wants to merge 1 commit intocockroachdb:masterfrom
jonasbadstuebner:fix-init-job-running

jonasbadstuebner commented Jan 21, 2025

Uh oh!

jonasbadstuebner commented Jan 21, 2025

Uh oh!

jonasbadstuebner commented Jan 22, 2025

Uh oh!

jonasbadstuebner commented Mar 12, 2025 •

edited

Loading

Uh oh!

jonasbadstuebner commented Mar 12, 2025

Uh oh!

jonasbadstuebner commented Mar 12, 2025

Uh oh!

prafull01 Jul 16, 2025

Uh oh!

jonasbadstuebner Jul 16, 2025

Uh oh!

prafull01 Jul 16, 2025

Uh oh!

jonasbadstuebner Jul 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		helm.sh/hook: post-upgrade
		helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded

Conversation

jonasbadstuebner commented Jan 21, 2025

Uh oh!

jonasbadstuebner commented Jan 21, 2025

Uh oh!

jonasbadstuebner commented Jan 22, 2025

Uh oh!

jonasbadstuebner commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonasbadstuebner commented Mar 12, 2025

Uh oh!

jonasbadstuebner commented Mar 12, 2025

Uh oh!

prafull01 Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

jonasbadstuebner Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

prafull01 Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

jonasbadstuebner Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jonasbadstuebner commented Mar 12, 2025 •

edited

Loading