
fix: remove AzureLinux 3.0 modprobe LPE blacklist (CSE-time + VHD bake-in) — kernel 6.6.139.1-1.azl3+ fixes upstream#8546

Open
djsly wants to merge 9 commits into main from djsly/38070527-remove-azl3-lpe-mitigation

Conversation

@djsly
Collaborator

@djsly djsly commented May 21, 2026

Summary

Three kernel LPE vulnerabilities tracked in Azure/AKS#5753 are now fixed upstream in the AzureLinux 3.0 kernel as of 6.6.139.1-1.azl3. The corresponding modprobe blacklist mitigations are no longer required on AzureLinux 3.0 and have been fully descoped on that OS — both the CSE-time runtime apply and the VHD-build bake-in.

| Vulnerability | CVE(s) | Modules |
|---|---|---|
| Copy Fail | CVE-2026-31431 | algif_aead |
| DirtyFrag | CVE-2026-43284, CVE-2026-43500 | esp4, esp6, rxrpc |
| Fragnesia | CVE-2026-46300 | esp4, esp6 (covered by DirtyFrag) |

Only AzureLinux 3.0 (regular and Kata) is descoped, for two reasons: kernel 6.6.139.1-1.azl3 fixes all three CVEs upstream, and customers reported that the blacklist actively blocks legitimate workloads that need those modules. The Ubuntu mitigation is unchanged. AzureLinux OSGuard intentionally retains the mitigation as defense-in-depth: it is the hardened secure-boot variant, and OSGuard workloads do not require the affected modules. Mariner (AzL2) is no longer actively built: the last AKSCBLMariner VHD shipped 2025-12-06 and the 6-month support window closes ~2026-06. The mitigation remains baked into every in-support Mariner VHD, so the runtime apply is no longer needed there.

Decision rationale

Initial iteration of this PR kept the four install <mod> /bin/false + blacklist <mod> entries baked into AzureLinux 3.0 VHDs as defense-in-depth, removing only the CSE-time runtime apply. Customer feedback during review made it clear that legitimate AzL3 workloads require some of those kernel modules (notably the esp* modules for IPsec/XFRM use cases and rxrpc for AFS-style RPC). With the upstream kernel fix shipping, keeping the blacklist baked in is no longer defense-in-depth — it is an active regression. This PR therefore removes the bake-in on AzureLinux 3.0 too.
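
For readers unfamiliar with the two directive types mentioned above, here is a minimal sketch that prints the eight lines in question. The module names come from this PR description; the function name render_blacklist_entries is hypothetical, and the actual layout of modprobe-CIS.conf may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: prints the two modprobe.d directives per module.
# render_blacklist_entries is a hypothetical name, not a function in this repo.

LPE_MODULES="algif_aead esp4 esp6 rxrpc"

render_blacklist_entries() {
    local mod
    for mod in $LPE_MODULES; do
        # "install <mod> /bin/false": modprobe runs /bin/false instead of
        # loading the module, so an explicit load request fails.
        printf 'install %s /bin/false\n' "$mod"
        # "blacklist <mod>": modprobe ignores the module's built-in aliases,
        # so alias-based auto-loading does not pull it in.
        printf 'blacklist %s\n' "$mod"
    done
}

render_blacklist_entries
```

The two directives cover different load paths, which is why a workload that needs esp4 or rxrpc stays blocked while either one remains in place.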

Scope

| OS | Before this PR | After this PR | Reason |
|---|---|---|---|
| Ubuntu 22.04 / 24.04 | runtime apply + bake-in | unchanged (apply + bake-in) | Upstream kernel fix not yet shipped |
| AzureLinux 2.0 (Mariner) | runtime apply + bake-in | runtime apply skipped; bake-in unchanged | AKS stopped building Mariner on 2025-12-06; bake-in already present in every in-support Mariner VHD makes the runtime apply redundant |
| AzureLinux 3.0 regular + Kata (azurelinux, azurelinux-kata) | runtime apply + bake-in | runtime apply skipped + bake-in removed | Kernel 6.6.139.1-1.azl3+ fixes all three CVEs; blacklist blocks legitimate workloads |
| AzureLinuxOSGuard (AzL3-OSGuard) | runtime apply + bake-in | unchanged (apply + bake-in) | Hardened secure-boot variant, defense-in-depth retained; OSGuard workloads do not require the affected modules |
| Flatcar / ACL | never in scope | unchanged | Never applied |
| Windows | N/A | N/A | Never affected |

What this PR does NOT do

Customers running existing in-support AzL3 VHDs will continue to have the blacklist baked in until they upgrade to a newer VHD; no CSE-time active removal is implemented in this PR.

Newly built AzL3 VHDs will simply not contain the four-module blacklist. Existing in-support AzL3 VHDs (built before this change merges) keep the baked-in /etc/modprobe.d/CIS.conf blacklist entries. We did not add a CSE-time rm or rewrite-on-boot path that would actively scrub pre-existing blacklist files on already-deployed nodes; that option was considered and explicitly rejected to avoid mutating security configuration in place on the already-provisioned fleet. Affected customers will pick up the unblocked configuration when they roll their node pools to a newer AzL3 VHD.

Backward-compat analysis (6-month VHD window)

  • New CSE on old AzL3 VHD (still has bake-in): CSE-time gate (isUbuntu || isAzureLinuxOSGuard) skips the runtime apply on AzL3 regular/Kata. The pre-existing /etc/modprobe.d/CIS.conf is left in place. Net effect: the four modules remain blocked on old AzL3 VHDs until the customer rolls. Kernel-fix-only mitigation kicks in once they upgrade.
  • Old CSE on new AzL3 VHD (no bake-in): the old CSE still calls disableVulnerableKernelModule for the four modules and writes the /etc/modprobe.d/disable-<mod>.conf drop-ins itself, so the modules stay blocked at runtime. An AzL3 customer sees the unblocked configuration only once both the new CSE and a new VHD are in the field.
  • New CSE on Ubuntu: unchanged — runtime apply still runs, bake-in still present.
  • New CSE on OSGuard: unchanged — runtime apply still runs, bake-in still present (defense-in-depth retained).
  • New CSE on Mariner: runtime apply now skipped, but bake-in already present in all in-support Mariner VHDs — net behaviour identical to before. Mariner support fully sunsets ~2026-06.
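
The CSE-time half of the cases above can be sketched roughly as follows. isUbuntu and isAzureLinuxOSGuard are helper names that appear in this PR, but the bodies below are hypothetical stand-ins, and runtime_apply_decision and the OS / variant strings are placeholders chosen for illustration rather than values taken from the repo.

```shell
#!/usr/bin/env bash
# Sketch of the CSE-time gate (isUbuntu || isAzureLinuxOSGuard).
# ASSUMPTION: the helper bodies and the OS/variant strings are stand-ins;
# the real helpers in the repo may be implemented differently.

isUbuntu() { [ "$1" = "UBUNTU" ]; }
isAzureLinuxOSGuard() { [ "$1" = "AZURELINUX" ] && [ "$2" = "OSGUARD" ]; }

# Prints APPLY when the runtime blacklist would be written, SKIP otherwise.
runtime_apply_decision() {
    local os="$1" os_variant="$2"
    if isUbuntu "$os" || isAzureLinuxOSGuard "$os" "$os_variant"; then
        echo "APPLY"
    else
        echo "SKIP"
    fi
}

# Walk a few OS/variant pairs ("os:variant") and print the decision for each.
for entry in "UBUNTU:" "AZURELINUX:OSGUARD" "AZURELINUX:" "AZURELINUX:KATA" "MARINER:" "FLATCAR:"; do
    os="${entry%%:*}"
    variant="${entry#*:}"
    printf '%-11s %-8s %s\n' "$os" "${variant:--}" "$(runtime_apply_decision "$os" "$variant")"
done
```

Note that the gate only decides whether new drop-ins are written. It never removes an existing blacklist file, which is why old AzL3 VHDs stay blocked until the node pool rolls.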

Files changed

| File | Change |
|---|---|
| parts/linux/cloud-init/artifacts/cse_main.sh | CSE-time OS gate changed from isUbuntu \|\| isMarinerOrAzureLinux to isUbuntu \|\| isAzureLinuxOSGuard. Drops AzL3 regular/Kata (kernel fixed) and Mariner (no longer built; bake-in covers in-support VHDs). Comment block rewritten with full rationale. |
| vhdbuilder/packer/packer_source.sh | cpAndMode $MODPROBE_CIS_SRC … now wrapped in if isAzureLinux "$OS" "$OS_VARIANT" && [ "${OS_VERSION}" = "3.0" ] && ! isAzureLinuxOSGuard … skip → else copy. OSGuard explicitly retains the bake-in. Ubuntu, Mariner, ACL, Flatcar paths unchanged byte-for-byte. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | testVulnerableKernelModulesDisabled takes $OS_SKU $OS_VERSION and asserts ABSENCE of both install <mod> /bin/false and blacklist <mod> directives on AzL3, presence + load-refusal on Ubuntu/Mariner. (OSGuard's OS_SKU is AzureLinuxOSGuard, distinct from AzureLinux, so it falls through to the full presence check correctly.) |
| e2e/validators.go | ValidateVulnerableKernelModulesDisabled is now OS-conditional: AzL3 regular (NOT OSGuard, via !s.VHD.Distro.IsAzureLinuxOSGuardDistro()) asserts absence of both install and blacklist directives; Ubuntu and OSGuard fall through to the full presence + load-refusal check. |
| spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh | Modified: shebang restored to #!/usr/bin/env shellspec, plus added a new OS-gate test suite (8 cases) covering Ubuntu APPLY, OSGuard APPLY, Mariner/Kata SKIP, AzL3 regular/Kata SKIP, ACL SKIP, Flatcar SKIP. Existing 5 unit tests for disableVulnerableKernelModule() are unchanged. Total: 13/13 pass. |
| parts/linux/cloud-init/artifacts/modprobe-CIS.conf | Unchanged: file still ships in the repo and is still baked into Ubuntu / Mariner / ACL / Flatcar / AzureLinuxOSGuard VHDs. |
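
The packer_source.sh condition described above can be sketched as below. This is only an illustration of the decision: the helper bodies are hypothetical stand-ins, the arguments passed to isAzureLinuxOSGuard are assumed to mirror the CSE call, bake_in_decision is an invented name, and the real script copies the file with cpAndMode rather than printing a word.

```shell
#!/usr/bin/env bash
# Sketch of the VHD-build bake-in decision.
# ASSUMPTION: helper bodies, OS/variant strings, and bake_in_decision are
# illustrative stand-ins, not the repo's implementation.

isAzureLinux() { [ "$1" = "AZURELINUX" ]; }   # second argument unused in this stub
isAzureLinuxOSGuard() { [ "$1" = "AZURELINUX" ] && [ "$2" = "OSGUARD" ]; }

# Prints "skip" when the blacklist file would be left out of the VHD,
# "copy" when it would still be baked in.
bake_in_decision() {
    local os="$1" os_variant="$2" os_version="$3"
    if isAzureLinux "$os" "$os_variant" && [ "$os_version" = "3.0" ] \
        && ! isAzureLinuxOSGuard "$os" "$os_variant"; then
        echo "skip"
    else
        echo "copy"
    fi
}

bake_in_decision AZURELINUX "" 3.0
bake_in_decision AZURELINUX OSGUARD 3.0
```

The explicit OSGuard exclusion matters because OSGuard shares the same OS value as regular AzureLinux; a check on OS and OS_VERSION alone would strip the bake-in from OSGuard as well.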

Test plan

  • shellspec --shell bash spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh → 13/13 pass.
  • cd e2e && go build ./... → clean.
  • bash -n syntax-check on packer_source.sh, linux-vhd-content-test.sh, cse_main.sh.
  • AzL3 VHD content test on freshly-built VHD: testVulnerableKernelModulesDisabled AzureLinux 3.0 must PASS with all four modules absent (both install and blacklist directives).
  • OSGuard VHD content test: existing presence + load-refusal assertions must continue to PASS.
  • Ubuntu / Mariner VHD content test on freshly-built VHD: existing presence + load-refusal assertions must continue to PASS.
  • E2E on AzL3 regular: validator must report all four entries ABSENT.
  • E2E on AzL3 OSGuard: validator must report presence + not-loaded + load-refused.
  • E2E on Ubuntu 22.04 / 24.04: validator must report presence + not-loaded + load-refused (unchanged).
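
The absence and presence assertions in the plan above could look roughly like this. It is a simplified sketch, not the body of testVulnerableKernelModulesDisabled: check_blacklist_state is a hypothetical name, the config path is passed in as an argument, and the load-refusal step (actually invoking modprobe) is left out.

```shell
#!/usr/bin/env bash
# Sketch of an OS-conditional blacklist check against one modprobe.d file.
# ASSUMPTION: check_blacklist_state and its argument order are illustrative.

MODULES="algif_aead esp4 esp6 rxrpc"

# Usage: check_blacklist_state <os_sku> <os_version> <conf_file>
# Returns 0 when the file matches the expectation for that OS, 1 otherwise.
check_blacklist_state() {
    local os_sku="$1" os_version="$2" conf="$3" mod failed=0
    for mod in $MODULES; do
        if [ "$os_sku" = "AzureLinux" ] && [ "$os_version" = "3.0" ]; then
            # AzL3: neither directive may be present (a missing file counts as absent).
            if [ -f "$conf" ] && grep -Eq "^[[:space:]]*(install[[:space:]]+${mod}[[:space:]]+/bin/false|blacklist[[:space:]]+${mod})[[:space:]]*$" "$conf"; then
                echo "FAIL: ${mod} is still blocked in ${conf}"
                failed=1
            fi
        else
            # Every other SKU: both directives must be present.
            if ! grep -Eq "^[[:space:]]*install[[:space:]]+${mod}[[:space:]]+/bin/false" "$conf" 2>/dev/null \
                || ! grep -Eq "^[[:space:]]*blacklist[[:space:]]+${mod}[[:space:]]*$" "$conf" 2>/dev/null; then
                echo "FAIL: ${mod} is missing an install or blacklist directive in ${conf}"
                failed=1
            fi
        fi
    done
    return "$failed"
}
```

Matching both directive types in the absence branch is deliberate: a VHD that dropped only the install lines but kept the blacklist lines would otherwise pass.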

Related

🤖 Generated with GitHub Copilot CLI

Kernel 6.6.139.1-1.azl3 and later fix Copy Fail (CVE-2026-31431),
DirtyFrag (CVE-2026-43284, CVE-2026-43500), and Fragnesia (CVE-2026-46300)
upstream, so the runtime modprobe blacklist for algif_aead/esp4/esp6/rxrpc
is no longer required on AzureLinux 3.0.

Defense-in-depth: the static modprobe-CIS.conf baked into every VHD is
left untouched, so all VHDs in the 6-month support window still drop
the install/blacklist directives at build time regardless of kernel
version.

Ubuntu 22.04/24.04 and AzureLinux 2.0 (Mariner) keep the runtime apply:
their upstream kernel does not yet ship the fix. Windows was never
affected.

Updates:
  * parts/linux/cloud-init/artifacts/cse_main.sh - gate is now
    isUbuntu || isMariner (was isUbuntu || isMarinerOrAzureLinux).
  * spec/.../cse_main_disable_modules_spec.sh - new tests asserting
    APPLY on Ubuntu/Mariner and SKIP on AzureLinux 3.0 / Kata / ACL /
    Flatcar.
  * e2e/validators.go - ValidateVulnerableKernelModulesDisabled is
    OS-conditional: full presence + load-refusal check on Ubuntu/Mariner,
    defense-in-depth modprobe.d entry presence-only check on AzureLinux.

Refs: Azure/AKS#5753

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread parts/linux/cloud-init/artifacts/cse_main.sh Outdated
Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts the Linux CSE-time CVE kernel-module mitigation behavior to skip the runtime modprobe blacklist on AzureLinux 3.0 (relying on the upstream kernel fix in 6.6.139.1-1.azl3+), while keeping Ubuntu and Mariner behavior unchanged and retaining the baked-in modprobe-CIS.conf defense-in-depth approach.

Changes:

  • Gate CSE-time disableVulnerableKernelModule calls to Ubuntu + Mariner only, excluding AzureLinux 3.0.
  • Add ShellSpec coverage to assert the OS gating behavior (APPLY vs SKIP) across key OS variants.
  • Make the e2e vulnerable-module validator OS-conditional, doing a presence-only modprobe.d check on AzureLinux 3.0 and the full “present + not loaded + modprobe refused” checks elsewhere.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/cse_main.sh | Updates the OS gate so AzureLinux 3.0 skips the runtime module-disable calls while Ubuntu/Mariner still apply them. |
| spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh | Adds unit tests validating the new OS gate behavior. |
| e2e/validators.go | Makes the vulnerable-module validation logic conditional for AzureLinux 3.0 vs other distros. |

Comment thread e2e/validators.go Outdated
Customers reported that the algif_aead / esp4 / esp6 / rxrpc modprobe
blacklist baked into AzureLinux 3.0 VHDs blocks legitimate workloads.
Now that kernel 6.6.139.1-1.azl3+ fixes Copy Fail / DirtyFrag / Fragnesia
upstream, the bake-in is no longer needed on AzL3.

Changes:
- packer_source.sh: skip cpAndMode of MODPROBE_CIS on AzureLinux 3.0
  (Ubuntu and Mariner bake-in unchanged — those kernels still vulnerable).
- linux-vhd-content-test.sh: testVulnerableKernelModulesDisabled now
  asserts the four entries are ABSENT on AzL3 and present + load-refused
  on Ubuntu/Mariner.
- e2e/validators.go: ValidateVulnerableKernelModulesDisabled now asserts
  absence on AzureLinux (matching newly-built VHDs); Ubuntu/Mariner full
  presence+refusal check unchanged.
- cse_main.sh: updated AzL3 skip comment to reflect that the static
  blacklist file is no longer baked in either; existing in-support AzL3
  VHDs continue to carry the bake-in until they roll (no CSE-time active
  removal — by design).

No CSE-time active removal of pre-existing blacklist files is implemented;
customers on existing in-support AzL3 VHDs will get the unblocked
configuration on their next AzL3 VHD upgrade.

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@djsly djsly changed the title fix: skip CSE-time CVE modprobe blacklist on AzureLinux 3.0 (kernel 6.6.139.1-1.azl3 has upstream fix) fix: remove AzureLinux 3.0 modprobe LPE blacklist (CSE-time + VHD bake-in) — kernel 6.6.139.1-1.azl3+ fixes upstream May 21, 2026
Addresses review feedback:
- cse_main.sh: drop unused isMariner branch from the modprobe
  blacklist gate (AKS does not build Mariner VHDs anymore).
- cse_main_disable_modules_spec.sh: update spec cases to match the
  new gate — Ubuntu APPLY; AzL3/Mariner/Kata/ACL/Flatcar SKIP.
- validators.go: refresh the top-level doc comment on
  ValidateVulnerableKernelModulesDisabled to describe the
  OS-conditional behavior accurately (Ubuntu: full presence +
  load-refusal; AzureLinux: ABSENCE of blacklist entries).

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 20:42
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Comment thread vhdbuilder/packer/packer_source.sh Outdated
Comment thread e2e/validators.go Outdated
Comment thread e2e/validators.go
Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Comment thread vhdbuilder/packer/test/linux-vhd-content-test.sh
Comment thread parts/linux/cloud-init/artifacts/cse_main.sh Outdated
Address inline review comment from djsly — the shebang was incorrectly
changed from #!/usr/bin/env shellspec to #!/bin/bash in this PR. All
other spec files under spec/parts/linux/cloud-init/artifacts/ use the
shellspec shebang as a convention; revert to match.

ShellSpec ignores the file shebang when invoked via the shellspec CLI
(the shell is controlled by --shell), so this change is purely a
convention fix with no runtime impact. shellspec --shell bash on the
spec still reports 11/11.

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses re-review comments from the Copilot reviewer pass on commit
aa4513e and corrects a factual error in the previous comment.

Mariner (AzureLinux 2.0) IS still actively built (packer JSONs and
recent CBLMariner release notes confirm). The previous commit's
'AKS no longer builds Mariner' claim was wrong, and the simplification
of the CSE gate to 'isUbuntu' only would have silently dropped the
mitigation on Mariner nodes whose kernel is not yet patched upstream.

Changes:

* cse_main.sh: restore the gate to apply on Ubuntu, Mariner (AzL2),
  and AzureLinux OSGuard. Only AzureLinux 3.0 (regular + Kata) is
  descoped — kernel 6.6.139.1-1.azl3+ has the upstream fix and
  customers need those modules. OSGuard explicitly stays in-scope
  as defense-in-depth (it's the hardened secure-boot variant).
  Comment block rewritten to reflect the actual scope.

* packer_source.sh: AzL3 bake-in skip now excludes OSGuard
  (! isAzureLinuxOSGuard). OSGuard is OS=azurelinux + OS_VARIANT=OSGUARD,
  so the previous OS+OS_VERSION-only check incorrectly stripped it.

* validators.go: ValidateVulnerableKernelModulesDisabled AzL3-absence
  check strengthened to match both 'install <mod> /bin/false'
  and 'blacklist <mod>' so a partial removal cannot pass silently.

* linux-vhd-content-test.sh: AzL3 absence check strengthened to also
  detect 'blacklist <mod>' entries (OSGuard is correctly distinguished
  at this layer because OSGuard's OS_SKU is 'AzureLinuxOSGuard', not
  'AzureLinux', so no additional condition needed here).

* cse_main_disable_modules_spec.sh: spec updated to match the new gate
  — Ubuntu APPLY, Mariner APPLY, Mariner Kata APPLY, OSGuard APPLY,
  AzL3 regular SKIP, AzL3 Kata SKIP, ACL SKIP, Flatcar SKIP. 13 cases,
  all pass.

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 20:55
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread parts/linux/cloud-init/artifacts/cse_main.sh
Comment thread vhdbuilder/packer/test/linux-vhd-content-test.sh Outdated
Comment thread e2e/validators.go Outdated
Per Sylvain's follow-up review on commit e1ae35c: the isMariner branch
restored in that commit is dead code — verified that AKS stopped
building Mariner (AzL2) VHDs on 2025-12-06 and the active build
pipeline (.pipelines/.vsts-vhd-builder-release.yaml) only references
buildAzureLinuxV3*, buildAzureLinuxOSGuardV3*, and buildflatcar*
parameters (no buildMariner*). The mitigation is also already baked
into modprobe-CIS.conf on every in-support Mariner VHD, so the runtime
apply was purely defense-in-depth duplicating the bake-in.

Gate is now: isUbuntu || isAzureLinuxOSGuard.

This unconditionally drops the mitigation runtime-apply on Mariner
nodes that might scale up via CRP-served CSE during the remaining
~16 days of Mariner VHD support (last build's 6-month window expires
~2026-06). That is acceptable because:
  1. The static bake-in in /etc/modprobe.d/modprobe-CIS.conf on the
     VHD itself remains in place on all in-support Mariner VHDs.
  2. Mariner support fully sunsets in ~2 weeks.

Updates:
  * cse_main.sh: gate simplified; comment rewritten with full Mariner
    rationale.
  * cse_main_disable_modules_spec.sh: Mariner / Mariner-Kata cases
    flipped from APPLY to SKIP. 13/13 still pass.

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address Copilot reviewer feedback on commit 6b470f9:

* e2e/validators.go: the 'append the module name to the list below'
  comment was added before this PR introduced two separate module
  lists (AzL3-absence branch + default presence/load-refusal branch).
  Clarify that BOTH lists must be updated when adding a new CVE.

* linux-vhd-content-test.sh: same issue — testVulnerableKernelModulesDisabled
  now has two loops (AzL3-absence + default). Update the comment to
  say BOTH must be appended.

No functional changes — comment-only.

AB#38070527

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 21:37
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread parts/linux/cloud-init/artifacts/cse_main.sh Outdated
Comment thread parts/linux/cloud-init/artifacts/cse_main.sh Outdated
Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 21:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Comment thread spec/parts/linux/cloud-init/artifacts/cse_main_disable_modules_spec.sh Outdated
Comment thread parts/linux/cloud-init/artifacts/cse_main.sh Outdated
Comment on lines +337 to 344
# Mariner/AzureLinux 2.0 (AzL2) images are frozen (see FrozenCBLMarinerV2AndAzureLinuxV2SIGImageVersion=202512.06.0),
# so they cannot pick up new modprobe-CIS.conf entries for these 2026 CVEs via VHD refresh.
# Keep the CSE-time runtime apply enabled for AzL2/Mariner while those images remain supported.
# See https://github.com/Azure/AKS/issues/5753.
#
# See https://github.com/Azure/AKS/issues/5753.
if isUbuntu "$OS" || isAzureLinuxOSGuard "$OS" "$OS_VARIANT" || { isMarinerOrAzureLinux "$OS" && [ "${OS_VERSION}" = "2.0" ]; }; then
disableVulnerableKernelModule "algif_aead" "CVE-2026-31431 (Copy Fail)"
Comment thread parts/linux/cloud-init/artifacts/cse_main.sh
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 22, 2026 00:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


local failed=0

if [ "$os_sku" = "AzureLinux" ] && [ "$os_version" = "3.0" ]; then
Comment on lines +322 to +324
# Disable kernel modules with known LPE vulnerabilities (CVE-2026-31431, DirtyFrag, Fragnesia).
# Applied at CSE provisioning time on Ubuntu and AzureLinux OSGuard. To add a new CVE
# mitigation, add a disableVulnerableKernelModule call below.
# Keep the CSE-time runtime apply enabled for AzL2/Mariner while those images remain supported.
# See https://github.com/Azure/AKS/issues/5753.
#
if isUbuntu "$OS" || isAzureLinuxOSGuard "$OS" "$OS_VARIANT" || { isMarinerOrAzureLinux "$OS" && [ "${OS_VERSION}" = "2.0" ]; }; then

3 participants