Fix Azure maintain_system immutability and Y: mapping #1215

Open
jwmossmoz wants to merge 5 commits into master from fix/azure-maintainsystem-inmutable

Contributor

@jwmossmoz jwmossmoz commented May 15, 2026

Summary

  • treat Puppet/OpenVox detailed exit codes like worker-images: 0/2 are success, 1/4/6 are failure
  • mark inmutable=true only after the post-bootstrap Puppet run succeeds
  • keep the host Y: mapping available on immutable boots and map Y: again in task-user-init.ps1 for the generic-worker task user
  • add Puppet timing/evaltrace summary logging for first-boot diagnostics

Why this fixes it

The traced worker's failure was not caused by Puppet. Puppet exited 2, which means the run succeeded and applied changes.

The task failed later because run-task used HG_STORE_PATH=y:/hg-shared. Before this PR, Y: was created only in the mutable boot path by the SYSTEM maintain_system task, after Puppet ran. That mapping could exist for SYSTEM, but subst mappings are session-scoped, so a new generic-worker task user did not reliably see it; on later immutable boots, the host-side LinkZY2D call was skipped too. This PR keeps the host mapping on immutable boots and creates Y: inside each task-user session before task commands run.

The post-bootstrap Puppet run still matters because it is the pass that applies the machine's final runtime state after bootstrap changes are in place. Once that run exits 0 or 2, the image can be marked immutable; if it exits 1, 4, or 6, the boot should stay mutable and fail instead of hiding a bad catalog run.
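
The gating rule described above can be sketched as follows. This is a minimal Python illustration of the logic only, not the PowerShell in azure-maintainsystem.ps1; the names `puppet_run_succeeded` and `next_boot_state` are hypothetical, and treating unlisted exit codes as failures is an assumption.

```python
# Puppet detailed exit codes, grouped the way this PR treats them.
SUCCESS_CODES = {0, 2}     # 0: no changes needed, 2: changes applied
FAILURE_CODES = {1, 4, 6}  # 1: run failed, 4: resource failures, 6: changes plus failures


def puppet_run_succeeded(exit_code: int) -> bool:
    """Return True only for exit codes classified as success.

    Assumption: any code outside SUCCESS_CODES, including codes not
    listed in FAILURE_CODES, is treated as a failure.
    """
    return exit_code in SUCCESS_CODES


def next_boot_state(post_bootstrap_exit_code: int) -> str:
    """Hypothetical helper: the image becomes immutable only after a successful run."""
    if puppet_run_succeeded(post_bootstrap_exit_code):
        return "immutable"
    return "mutable"
```

With this rule, an exit code of 2 (the traced worker's case) yields "immutable", while 1, 4, or 6 leaves the boot mutable so the failure stays visible.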

Validation

Local checks:

  • PowerShell parse check for azure-maintainsystem.ps1 and task-user-init.ps1
  • regex sanity check for Puppet evaltrace parsing
  • git diff --check
  • Invoke-ScriptAnalyzer reports existing style warnings only
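
For reviewers unfamiliar with evaltrace output, the kind of parsing the regex check covers might look like the sketch below. It is written in Python for illustration, not taken from the PR's PowerShell. It assumes evaltrace lines of the form `<resource path>: Evaluated in <N> seconds`; the pattern, the function name `slowest_resources`, and the sample lines are all hypothetical.

```python
import re

# Assumed line shape: "Info: /Stage[main]/...: Evaluated in 1.23 seconds"
EVALTRACE_RE = re.compile(
    r"(?P<resource>/Stage\[.+\]): Evaluated in (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def slowest_resources(log_lines, top_n=5):
    """Return up to top_n (resource, seconds) pairs, slowest first."""
    timings = []
    for line in log_lines:
        match = EVALTRACE_RE.search(line)
        if match:
            timings.append((float(match.group("seconds")), match.group("resource")))
    timings.sort(reverse=True)
    return [(resource, seconds) for seconds, resource in timings[:top_n]]


# Made-up sample lines, for illustration only.
sample = [
    "Info: /Stage[main]/Example/File[example]: Starting to evaluate the resource (1 of 2)",
    "Info: /Stage[main]/Example/File[example]: Evaluated in 0.25 seconds",
    "Info: /Stage[main]/Example/Exec[slow_step]: Evaluated in 12.50 seconds",
]
summary = slowest_resources(sample, top_n=1)
```

Sorting by duration puts the slowest resources first, which is the summary most useful when diagnosing a slow first boot.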

Worker-images revalidation:

Direct startup-test evidence for the previous y:/hg-shared failure path:

Same-worker post-bootstrap evidence:

  • In the 24H2 integration group, vm-f9snmhsoqga9tf0txdfmrgepikayjryo0iv ran AK0sAoY1Qt-YojELHs3AkQ first, then rebooted into a new generic-worker task user and ran UUMjgX34QvmLlxcIkK9Lag on the same VM; both completed successfully.
  • Papertrail for that worker shows the initial boot Puppet apply completed with exit code 2 at 2026-05-20 17:57:46Z. After the first task resolved, later boot-time maintain_system passes at 18:04:26Z and 18:19:11Z ran the immutable path: Run-MaintainSystem, LinkZY2D :: mapped Y: to D:\\, and Start-WorkerRunner, with no second Puppet apply in that window.
  • Worker-runner reported Resolved 1 tasks in total so far at 18:03:50Z and Resolved 2 tasks in total so far at 18:18:40Z, confirming same-VM reuse across the generic-worker reboot/new-task-user path.

Current PR check note:

  • The earlier win116425h2azure serverspec failure came from an NVIDIA A10 driver mismatch on a stale branch: it appeared after "RELOPS-2372: bump NVIDIA A10 GRID driver to 573.96" (#1218) landed on master.
  • This branch now includes that master merge, so the driver data and serverspec expectation are aligned; rerun checks should not hit that mismatch.

Original failure examples from the earlier cancelled run:

@jwmossmoz jwmossmoz force-pushed the fix/azure-maintainsystem-inmutable branch from d3de8d3 to fe72fa8 on May 15, 2026 20:14
@jwmossmoz jwmossmoz changed the title from "Fix Azure maintain_system immutability after Puppet success" to "Fix Azure maintain_system immutability and Y: mapping" on May 15, 2026