Skip to content

Avoid ARG_MAX overflow in bootc image-mode builds#4793

Merged
psss merged 3 commits into
teemtee:mainfrom
Lorquas:jmolet/bootc-arg-max-fix
May 18, 2026
Merged

Avoid ARG_MAX overflow in bootc image-mode builds#4793
psss merged 3 commits into
teemtee:mainfrom
Lorquas:jmolet/bootc-arg-max-fix

Conversation

@Lorquas
Copy link
Copy Markdown
Contributor

@Lorquas Lorquas commented Apr 8, 2026

Summary

When plan environment contains large values (e.g. base64-encoded SSH keys, certificates, and API tokens from CI systems like Testing Farm), bootc image-mode builds fail with OSError: [Errno 7] Argument list too long: 'ssh'. Two patterns combine to hit the OS ARG_MAX limit:

  1. collect_command() inlines the full environment as export statements into every Containerfile RUN directive, multiplying the size
  2. build_container() passes the entire generated Containerfile as a cat <<EOF SSH command-line argument

Changes

tmt/package_managers/bootc.py: Transfer the generated Containerfile to the guest via push() (rsync/SCP) instead of inlining it in a cat heredoc SSH command, removing the command-line size ceiling entirely. Uses self.guest.guest_workdir / 'Containerfile' as the local staging path.

Relates to #4518

Testing

  • Integration test (tests/prepare/large-env/): End-to-end BeakerLib test that dynamically sizes a base64 environment variable based on the system's MAX_ARG_STRLEN (PAGE_SIZE * 32). The test plan uses two prepare: how: shell steps so the full environment is inlined via collect_command() into two separate RUN directives — making the combined Containerfile exceed MAX_ARG_STRLEN. This reliably triggers the old cat <<EOF failure while passing with the push() fix.
  • Verified locally: old code fails with [Errno 7] Argument list too long: 'ssh', new code passes.
  • All pre-commit hooks pass (mypy, pyright, ruff, shellcheck, tmt lint, codespell, flake8).

Pull Request Checklist

  • implement the feature
  • extend the test coverage
  • include a release note

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue where large environment variables in bootc image-mode plans could exceed the OS ARG_MAX limit by being inlined into Containerfile RUN directives. The changes introduce a mechanism to emit environment variables as ENV directives once, and switch to using the guest.push() method for writing Containerfiles instead of relying on SSH command-line execution. A new test case has been added to verify this behavior. Please move the imports in tmt/package_managers/bootc.py to the top of the file to comply with PEP 8 standards.

Comment thread tmt/package_managers/bootc.py Outdated
@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch from 6a4211e to b600f8e Compare April 8, 2026 20:20
Comment thread tmt/guest/__init__.py Outdated
@happz
Copy link
Copy Markdown
Contributor

happz commented Apr 8, 2026

env being a parameter of collect_command method: is it guaranteed that two calls to collect_command(), contributing two commands to the same "session" of future Containerfile commands, would have the same env content? For example, collect_command(..., env={'FOO': 'bar'}) followed by collect_command(..., env={'FOO': 'baz'}) would mean the second command runs with unexpected environment, correct?

Would it make sense to simply ENV directives for command's env before each command as they are getting collected, to prevent this behaviour?

Is it possible to somehow clear the environment before emitting new directives for new command? I'm thinking about collect_command(..., env={'DEBUG': '1'}) followed by collect_command(...) - the latter command would inherit environment from the first command, running with DEBUG=1 which was not what the command asked for.

@happz
Copy link
Copy Markdown
Contributor

happz commented Apr 8, 2026

Summary

When plan environment contains large values (e.g. base64-encoded SSH keys, certificates, and API tokens from CI systems like Testing Farm), bootc image-mode builds fail with OSError: [Errno 7] Argument list too long: 'ssh'. Two patterns combine to hit the OS ARG_MAX limit:

  1. collect_command() inlines the full environment as export statements into every Containerfile RUN directive, multiplying the size
  2. build_container() passes the entire generated Containerfile as a cat <<EOF SSH command-line argument

Changes

Fix A (tmt/guest/__init__.py): Emit plan environment variables as Containerfile ENV directives once at the top, rather than repeating export statements in every RUN directive. This is both smaller and more idiomatic for container builds.

This one is problematic, see #4793 (comment) - as long as Containerfile can host multiple commands, collected by a sequence of collect_command(), ENV can cause commands to see unexpected environment.

Fix B (tmt/package_managers/bootc.py): Transfer the generated Containerfile to the guest via push() (rsync/SCP) instead of inlining it in a cat heredoc SSH command, removing the command-line size ceiling entirely.

This one seems like a no-brainer, we absolutely can and should transfer the file via push(). I think you can use self.workdir / Containerfile instead of the temporary file, as the guest should be owned by one thread at one point in time, no other thread would be collecting commands or even thinking about rebuilding the guest, so there's no contest over such a path.

@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch 2 times, most recently from a5f4423 to 6b40981 Compare April 8, 2026 23:10
@Lorquas
Copy link
Copy Markdown
Contributor Author

Lorquas commented Apr 8, 2026

@happz Thanks for the hints, that simplifies things tremendously. I tried something like

                local_containerfile = self.workdir / 'Containerfile'
                local_containerfile.write_text(containerfile)

but it seems like by then workdir never exists because when either PackageManager or PackageManagerEngine run their super().__init__(logger=logger) the __init__'s only accept kwargs guest and logger, not parent. So Common.workdir returns None if parent is None. It seems like package managers are kind of detached from the run > plan > step.. workflow? I could attempt to wire that in but it would touch way more moving parts than I am comfortable handling.

@LecrisUT LecrisUT added the plugin | bootc The bootc provision plugin label Apr 15, 2026
@github-project-automation github-project-automation Bot moved this to backlog in planning Apr 15, 2026
@LecrisUT LecrisUT added the bug Something isn't working label Apr 15, 2026
@tcornell-bus
Copy link
Copy Markdown
Contributor

Hi @Lorquas, we would like to propose this for the sprint that starts on May 7th. Would you have the capacity to continue work on this based on your and @happz discussion?

@Lorquas
Copy link
Copy Markdown
Contributor Author

Lorquas commented Apr 27, 2026

@tcornell-bus Yes, I've fixed everything except the super()init problem. I can look at that this week.

@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch from 6b40981 to 0a4a1d7 Compare April 27, 2026 19:05
@Lorquas
Copy link
Copy Markdown
Contributor Author

Lorquas commented Apr 27, 2026

Updated the branch to wire parent=guest through the PackageManager / PackageManagerEngine constructors, so self.workdir is available in BootcEngine.build_container() — replacing the tempfile approach with self.workdir / 'Containerfile' as suggested.

This means all PackageManager instances now technically have access to workdir, not just the bootc engine. This should be safe: Common.workdir is lazy and only creates its directory on first access. Since no other package manager code references self.workdir that I could find.

Other Note: BootcEngine forwards constructor kwargs to its internal aux_engine (e.g. DNF5), so parent=guest gets passed there too. The aux engine simply gains an unused parent reference. Let me know if you want me to strip that out.

@Lorquas Lorquas marked this pull request as ready for review April 27, 2026 19:11
Comment thread tmt/package_managers/__init__.py Outdated
Comment thread tests/prepare/bootc-large-env/test.sh Outdated
Comment thread tests/prepare/bootc-large-env/test.sh Outdated
Comment thread tests/prepare/bootc-large-env/test.sh Outdated
Comment thread tests/prepare/bootc-large-env/main.fmf Outdated
Comment thread tests/prepare/bootc-large-env/data/test.fmf Outdated
Comment thread tests/prepare/bootc-large-env/data/plans.fmf Outdated
@LecrisUT LecrisUT added step | prepare Stuff related to the prepare step and removed plugin | bootc The bootc provision plugin labels May 7, 2026
@psss psss added this to the 1.74 milestone May 7, 2026
@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch 3 times, most recently from 583a095 to fb08432 Compare May 8, 2026 14:55
@Lorquas Lorquas requested a review from LecrisUT May 8, 2026 14:59
@tcornell-bus tcornell-bus moved this from backlog to review in planning May 8, 2026
Comment thread tests/prepare/bootc-large-env/data/plans.fmf Outdated
Comment thread tests/prepare/bootc-large-env/main.fmf Outdated
@LecrisUT
Copy link
Copy Markdown
Member

LecrisUT commented May 11, 2026

/packit build

@LecrisUT LecrisUT added the ci | full test Pull request is ready for the full test execution label May 11, 2026
@happz
Copy link
Copy Markdown
Contributor

happz commented May 12, 2026

Updated the branch to wire parent=guest through the PackageManager / PackageManagerEngine constructors, so self.workdir is available in BootcEngine.build_container() — replacing the tempfile approach with self.workdir / 'Containerfile' as suggested.

This means all PackageManager instances now technically have access to workdir, not just the bootc engine. This should be safe: Common.workdir is lazy and only creates its directory on first access. Since no other package manager code references self.workdir that I could find.

Other Note: BootcEngine forwards constructor kwargs to its internal aux_engine (e.g. DNF5), so parent=guest gets passed there too. The aux engine simply gains an unused parent reference. Let me know if you want me to strip that out.

I had to think quite hard to recall why parent becomes needed, and it seems like an unnecessary complication. And I don't remember whether I proposed it, or whether it's based on your own discovery. Would you mind using guest.guest_workdir instead of self.workdir? A package manager is tied to the guest, as such it has no workdir of its own, it will be puzzling in the future.

@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch from fb08432 to 5307117 Compare May 12, 2026 16:43
@Lorquas
Copy link
Copy Markdown
Contributor Author

Lorquas commented May 12, 2026

Updated the branch addressing all feedback and rebased on latest main. Forgive me for not replying in each thread, I figure this is better than the notification spam.

I had to think quite hard to recall why parent becomes needed

I also forgot what the original PR even looked like at this point. Here's what I got so far:

@happz — Switched to self.guest.guest_workdir / 'Containerfile' instead of self.workdir, and reverted all the parent= constructor plumbing from PackageManager.__init__ and the call sites in tmt/guest/__init__.py.

@LecrisUT

  • parent= changes fully reverted
  • Renamed tests/prepare/bootc-large-env/ to tests/prepare/large-env/, stripped refrences to bootc/image-mode
  • Flattened plans.fmf and dropped the /centos-stream-10: subplan structure since test.sh handles things
  • large-env.yaml now generated into $run/ temp directory instead of the source data/ tree

@thrix
Copy link
Copy Markdown
Contributor

thrix commented May 13, 2026

Related to #4871 #4875., the fixes there should be revisited on top of this

Copy link
Copy Markdown
Member

@LecrisUT LecrisUT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure this is better than the notification spam.

Don't worry about that, we have plenty of notification "spam" we deal on a daily basis 😅. Also incremental commits are fine, we do squash commits.

Comment thread tests/prepare/large-env/test.sh Outdated
Comment thread tmt/package_managers/bootc.py
@happz
Copy link
Copy Markdown
Contributor

happz commented May 14, 2026

@Lorquas please, check out the tests, they fail in our CI: https://artifacts.dev.testing-farm.io/2689664c-7b34-4464-a180-d2a7d5f80fc2/work-preparez7vj16yz/plans/provision/virtual/prepare/execute/data/guest/default-0/tests/prepare/large-env-3/output.txt. If I'm not mistaken, the large env file needs to be injected into the tree, tmt has a limitation in place to stick to the tree when it comes to loading files.

@psss
Copy link
Copy Markdown
Member

psss commented May 14, 2026

@Lorquas, could you please update the pull request description so that it can be used as the git commit message when squashing? Currently it contains some outdated stuff like Fix A or unit tests. Could you also add a short release note? Thanks.

Comment thread tests/prepare/large-env/test.sh
Comment thread tests/prepare/large-env/main.fmf Outdated
@psss psss moved this from review to implement in planning May 14, 2026
@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch from 3405d1a to 2471ca9 Compare May 18, 2026 13:48
@psss psss moved this from implement to review in planning May 18, 2026
@psss psss changed the title Avoid ARG_MAX overflow in bootc image-mode builds Avoid ARG_MAX overflow in bootc image-mode builds May 18, 2026
@psss
Copy link
Copy Markdown
Member

psss commented May 18, 2026

/packit build

@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch from 2471ca9 to e452b02 Compare May 18, 2026 16:45
Comment thread tests/prepare/large-env/test.sh Outdated
@Lorquas Lorquas force-pushed the jmolet/bootc-arg-max-fix branch 2 times, most recently from 8fde6f2 to 9a99c09 Compare May 18, 2026 17:46
Lorquas and others added 3 commits May 18, 2026 20:30
When plan environments contain large values, collect_command() inlines
them as export statements and build_container() passes the entire
Containerfile via `cat <<EOF` over SSH — easily exceeding the OS
ARG_MAX limit.

Write the Containerfile to the package-manager workdir and transfer it
with guest.push() (rsync/SCP) instead of passing it on the command
line.

To make self.workdir available in PackageManager instances, thread
parent=guest through the PackageManager constructor so Common.workdir
can resolve a directory.

Adds an integration test (tests/prepare/bootc-large-env) that injects
a 50 KB environment variable into a bootc image-mode build and
verifies the build succeeds.
Co-authored-by: Cristian Le <github@lecris.me>
- Generate large-env.yaml inside the fmf data/ tree so tmt accepts
  it as an --environment-file source (fixes CI failure)
- Add TODO comment for enabling container/virtual provision methods
- Change link from verifies to relates (fix is partial for teemtee#4518)
- Add release note for 1.74
@psss psss force-pushed the jmolet/bootc-arg-max-fix branch from 9a99c09 to 4e668a2 Compare May 18, 2026 18:30
@psss
Copy link
Copy Markdown
Member

psss commented May 18, 2026

/packit build

@psss psss merged commit d5de8e1 into teemtee:main May 18, 2026
34 checks passed
@github-project-automation github-project-automation Bot moved this from review to done in planning May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci | full test Pull request is ready for the full test execution step | prepare Stuff related to the prepare step

Projects

Status: done

Development

Successfully merging this pull request may close these issues.

6 participants