
Double pg-dump (2Gi) & increase hg-repos (175Gi)#2222

Open
myieye wants to merge 2 commits into develop from increase-sizes-pg-dump-and-hg-repos

Conversation

Collaborator

@myieye myieye commented Mar 28, 2026

Kopia backups are apparently failing (according to @g3mackay) with:

Usage of EmptyDir volume "pg-dump" exceeds the limit "1Gi".

And failure emails mention:

Annotations
description = PVC hg-repos is only 9.96% free.
message = PVC hg-repos has less than 9.96% free

Regarding hg-repos

About 6 months ago we deleted all the obvious large junk.
I'm sure there's still some cleanup that could happen, but it would be quite a bit more effort this time.
This simple size bump seems reasonable and I'm now on holiday for a week.

Regarding pg-dump

We're definitely putting way more stuff in the DB these days: the entire data/change history of every project that starts using FW-Lite.

While I'm sure there's some cleanup that could happen, it will definitely continue to grow and grow.

Regarding debugging the backup failures

I have not been able to find any logs for the failed jobs. That puzzles me and leaves me guessing.
It also seems that every time the job is triggered it first fails and then succeeds, which puzzles me as well.

@github-actions github-actions bot added the 📦 Lexbox issues related to any server side code, fw-headless included label Mar 28, 2026
@coderabbitai

coderabbitai bot commented Mar 28, 2026

📝 Walkthrough

Walkthrough

Kubernetes manifests in production and restore-scripts directories receive storage allocation updates. The pg-dump temporary volume (emptyDir) increases from 1Gi to 2Gi across backup and maintenance configurations, while the hg-repos persistent volume claim increases from 150Gi to 175Gi. These are pure resource configuration changes with no behavioral logic modifications.

Changes

Cohort / File(s) Summary
Temporary Storage Expansion
deployment/production/backup.yaml, deployment/restore-scripts/kopia-maint.yaml
Increased pg-dump emptyDir sizeLimit from 1Gi to 2Gi for PostgreSQL backup staging. kopia-maint.yaml also explicitly defines the volume name.
Persistent Storage Expansion
deployment/production/hg-repos-volume.yaml
Increased PersistentVolumeClaim storage request from 150Gi to 175Gi for Mercurial repositories.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • imnasnainaec

Poem

Hoppy clusters hop and grow,
Storage limits overflow!
Backups swell from one to two,
Repos stretch from 150 through,
More gigabytes to hold the fort, 🐰📦
Infrastructure gets the support!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: doubling pg-dump to 2Gi and increasing hg-repos to 175Gi, which matches the changeset exactly.
Description check ✅ Passed The description provides context for both storage increases, explaining the backup failures and growth patterns that motivated these changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deployment/production/backup.yaml (1)

37-41: ⚠️ Potential issue | 🟠 Major

Add ephemeral-storage resource requests/limits alongside the 2Gi emptyDir volumes.

Lines 101 and 65 (kopia-maint.yaml) establish a 2Gi cap on the pg-dump emptyDir volume, but without ephemeral-storage resource requests/limits on the consuming containers, these pods can still be scheduled onto disk-constrained nodes and face eviction under disk pressure.

The postgres-dump container (lines 37–41) and create-snapshots container (lines 65–69) in backup.yaml, as well as both containers in kopia-maint.yaml, need explicit ephemeral-storage constraints:

Suggested manifest update
               resources:
                 requests:
                   memory: 150Mi
+                  ephemeral-storage: 2300Mi
                 limits:
                   memory: 150Mi
+                  ephemeral-storage: 2300Mi
@@
               resources:
                 requests:
                   memory: 1Gi
+                  ephemeral-storage: 512Mi
                 limits:
                   memory: 1Gi
+                  ephemeral-storage: 512Mi

Also applies to kopia-maint.yaml containers (lines 7–15 and 32–40).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/production/backup.yaml` around lines 37 - 41, The postgres-dump
and create-snapshots containers in backup.yaml (and both containers defined in
kopia-maint.yaml) lack ephemeral-storage resource requests/limits, so add
ephemeral-storage requests: 2Gi and limits: 2Gi to each container's resources
block (the same place where memory: 150Mi is set) so the containers declare the
2Gi emptyDir disk usage and will not be evicted under node disk pressure; update
the resources section for the containers named postgres-dump, create-snapshots
and the two containers in kopia-maint.yaml to include ephemeral-storage: 2Gi for
both requests and limits.
🧹 Nitpick comments (1)
deployment/restore-scripts/kopia-maint.yaml (1)

7-40: Harden container securityContext selectively based on image capabilities.

postgres:15-alpine can reliably run with runAsNonRoot: true (UID/GID 70) and should be hardened. However, kopia/kopia:latest (current Ubuntu-based builds) does not support runAsNonRoot: true without custom configuration or entrypoints—it defaults to root with no explicit USER directive. Official documentation typically requires root or privileged access for FUSE-based features.

Consider hardening postgres fully:

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  runAsUser: 70
  runAsGroup: 70
  fsGroup: 70
  capabilities:
    drop: ["ALL"]

For kopia, either: (1) keep the container as-is if FUSE or privileged features are needed, or (2) if root is unnecessary, apply allowPrivilegeEscalation: false and capabilities.drop: ["ALL"] without runAsNonRoot: true, and verify compatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployment/restore-scripts/kopia-maint.yaml` around lines 7 - 40, Add
per-container securityContext entries: for the pg container set
allowPrivilegeEscalation: false, runAsNonRoot: true, runAsUser: 70, runAsGroup:
70, fsGroup: 70 and drop all capabilities to harden Postgres (reference the
container named "pg"); for the kopia container either leave it unchanged if
FUSE/privileged behavior is required or add allowPrivilegeEscalation: false and
capabilities.drop: ["ALL"] but do NOT set runAsNonRoot for "kopia" unless you
verify the image supports a non-root user; update the pod spec's containers
section where "name: pg" and "name: kopia" are defined and ensure secrets and
volume mounts remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployment/production/hg-repos-volume.yaml`:
- Line 9: The PVC resize request currently sets storage: 175Gi but may silently
fail if the bound StorageClass (storageClassName: weekly-snapshots-retain-4)
does not have allowVolumeExpansion: true; update the manifest and workflow to
either (1) validate that the StorageClass used by this PVC has
allowVolumeExpansion: true before applying the patch, or (2) switch the PVC to a
StorageClass that supports expansion, or (3) implement a safe migration path
that creates a new PVC with 175Gi, copies data, and swaps claims; reference the
PVC definition that includes storage: 175Gi and the storageClassName
weekly-snapshots-retain-4 when making these changes.
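
As a hedged illustration of the check described above, the StorageClass field in question looks like this (the name is taken from the comment; the rest is a generic sketch, not this cluster's actual definition):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: weekly-snapshots-retain-4
# In-place PVC expansion only works when this is true;
# otherwise the 175Gi resize request will not be honored.
allowVolumeExpansion: true
```

One way to verify before applying the patch is `kubectl get sc weekly-snapshots-retain-4 -o jsonpath='{.allowVolumeExpansion}'`, which should print `true`.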

---

Outside diff comments:
In `@deployment/production/backup.yaml`:
- Around line 37-41: The postgres-dump and create-snapshots containers in
backup.yaml (and both containers defined in kopia-maint.yaml) lack
ephemeral-storage resource requests/limits, so add ephemeral-storage requests:
2Gi and limits: 2Gi to each container's resources block (the same place where
memory: 150Mi is set) so the containers declare the 2Gi emptyDir disk usage and
will not be evicted under node disk pressure; update the resources section for
the containers named postgres-dump, create-snapshots and the two containers in
kopia-maint.yaml to include ephemeral-storage: 2Gi for both requests and limits.

---

Nitpick comments:
In `@deployment/restore-scripts/kopia-maint.yaml`:
- Around line 7-40: Add per-container securityContext entries: for the pg
container set allowPrivilegeEscalation: false, runAsNonRoot: true, runAsUser:
70, runAsGroup: 70, fsGroup: 70 and drop all capabilities to harden Postgres
(reference the container named "pg"); for the kopia container either leave it
unchanged if FUSE/privileged behavior is required or add
allowPrivilegeEscalation: false and capabilities.drop: ["ALL"] but do NOT set
runAsNonRoot for "kopia" unless you verify the image supports a non-root user;
update the pod spec's containers section where "name: pg" and "name: kopia" are
defined and ensure secrets and volume mounts remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9bb12a29-f2b3-4c28-980a-ad2bc4c86630

📥 Commits

Reviewing files that changed from the base of the PR and between 6e444cb and d11fcf9.

📒 Files selected for processing (3)
  • deployment/production/backup.yaml
  • deployment/production/hg-repos-volume.yaml
  • deployment/restore-scripts/kopia-maint.yaml

  resources:
    requests:
-     storage: 150Gi
+     storage: 175Gi


Please don't bundle raising the hg-repos and pg-dump volumes in one commit; these volumes serve two quite different purposes: pg-dump is temporary space for storing postgres dumps during backup, while hg-repos holds the linguistic data we wish to back up. Therefore it is important to get the backup issue resolved before doing anything with the hg-repos volume. Also, does hg-repos even need increasing yet?

Collaborator Author


@tim-eves thanks for the feedback!

As explained in the issue description, I do think increasing the size of hg-repos makes sense. It's time. I feel like I was fairly conservative (i.e. +25Gi), because a deep clean should happen at some time, but that will be a decent amount of work, because we've already deleted the obvious large junk.

And I have increased the volumes in 2 separate commits. I'd plan to merge them that way too. I suppose it could have been separate PRs. But...considering all that, do you approve of these changes?

