Skip to content

feat(devicepolicy): enforce VS Code extension allowlist#138

Open
raysubham wants to merge 10 commits into
step-security:mainfrom
raysubham:feat/dmg-policy-profiles
Open

feat(devicepolicy): enforce VS Code extension allowlist#138
raysubham wants to merge 10 commits into
step-security:mainfrom
raysubham:feat/dmg-policy-profiles

Conversation

@raysubham

@raysubham raysubham commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds the on-device policy-enforcement loop (Developer MDM, ide_extension category) for Windows, macOS, and Linux. The agent stays thin: the backend compiles the effective extensions.allowed policy, the agent converges the user-scope VS Code settings.json to it (format-preserving single-key JSONC merge), reads it back, and reports compliance — VS Code itself does the extension disabling. The agent never installs/uninstalls extensions and never touches non-VS-Code IDEs.

Two enforcement surfaces, deliberately layered:

  • settings.jsonextensions.allowed (this PR) — the agent writes it; tamper-evident; all three OSes.
  • OS-native managed policy (registry / /etc/vscode/policy.json / macOS managed preferences) — delivered by a real MDM via the backend export channel (separate path); tamper-proof; outranks the user setting. The agent probes it read-only and yields (mdm_managed) when present, rather than writing a value VS Code would ignore.

What's in here

API (internal/devicepolicy/api.go)

  • Fetch: GET /v1/:customer/developer-mdm-agent/devices/:device_id/effective-policy?category=ide_extension
  • Report: POST /v1/:customer/developer-mdm-agent/devices/:device_id/compliance
  • both on the existing agent auth channel (Bearer tenant key)
  • backend hash is persisted and echoed verbatim in reports — never recomputed — so the backend's byte-exact applied == desired check gates compliant
  • malformed responses (including a non-object policy) are rejected; on-disk enforcement is never wiped on a transient/malformed fetch

Settings writer (settings_writer.go)

  • One writer for every OS — only the path differs (%APPDATA%\Code\User, ~/Library/Application Support/Code/User, ~/.config/Code/User)
  • Manages ONLY the extensions.allowed key: format-preserving single-key JSONC merge (tailscale/hujson) — every other key, comment, and whitespace detail is preserved byte-for-byte
  • Atomic replace (temp + fsync + rename) with a capped .bak backup of the prior file (internal/atomicfile, ≤3 retained), so a botched write is always recoverable
  • Refuses to touch an unparseable / non-object settings.json (never blind-overwrites a file the user owns)
  • Value written is the compiled extensions.allowed object, compacted and verbatim — never re-serialized (re-serialization could reorder keys and break the backend's hash check)

Managed-policy probe (probe.go + probe_{windows,linux,darwin}.go)

  • Read-only presence check at the OS policy location: HKLM/HKCU registry (Windows), /etc/vscode/policy.json (Linux), /Library/Managed Preferences (macOS)
  • A real policy there outranks user settings, so the agent reports mdm_managed and does not write

Reconciler (reconcile.go)

  • Ladder: probe → read current value → idempotency → drift check → ownership-store writability preflight → merge-write + readback → persist ownership on every successful write (rollback to pre-cycle disk state if the persist fails) → verify → report
  • First-write ownership: a present-but-unrecorded value is the agent's to own — it is overwritten and converges to compliant. Genuine MDM coexistence is the probe's job, not a foreign-value park
  • Drift: when the agent has a record and the on-disk value has diverged (user hand-edit/removal), it re-applies and reports the distinct drift_detected — a one-cycle, edge-triggered signal that returns to compliant next cycle
  • Idempotent: unchanged hash + converged disk → report only, no write
  • Clear (unassignment) removes only the agent-owned key and reports nothing

Ownership state (cache.go~/.stepsecurity/device-policy-state.json)

  • Records applied_hash + written_value; drives value-based ownership and drift
  • Schema-versioned; a file from a newer agent is treated as unreadable ("owns nothing") rather than misread

States reported: compliant, pending, policy_not_applied, drift_detected, mdm_managed, write_failed, verification_failed

Wiring + gating

  • runIDEExtensionEnforce runs on the existing scheduled cycle in main.go
  • Gated behind FeatureDevicePolicy — off until GA (backend MinEnforcementAgentVersion is still a placeholder); enable via --override-gate / STEPSECURITY_OVERRIDE_GATE=1 for dogfooding

CI

  • New test-windows-devicepolicy job (windows-latest) in tests.yml runs the windows-tagged managed-policy probe tests natively — the macOS test job can only cross-compile them, so without this the Windows registry probe would ship with zero executed coverage

Test plan

  • go test -race -count=1 ./... — full suite green
  • go vet clean on darwin, linux, windows; gofmt clean; go mod tidy no drift
  • Reconciler fake-injected tests: managed-policy probe → mdm_managed, first-write ownership, drift re-apply, idempotent re-apply, preflight + rollback failure paths, readback-mismatch recovery, clear semantics
  • Settings-writer tests: single-key JSONC merge preserves comments/formatting, never clobbers an unparseable file, leaves a recoverable .bak
  • Windows-tagged managed-policy probe tests run natively in the new CI job
  • Dogfood on a Windows + macOS + Linux device via gate override before GA

Pre-GA follow-ups (not in this PR)

  • Cut a real enforcement-agent release version and update the backend's placeholder MinEnforcementAgentVersion
  • Flip FeatureDevicePolicy on

Add the developer-MDM policy enforcement loop for the ide_extension
category. The agent stays thin: the backend compiles the effective
policy; the agent writes it to the OS-native VS Code policy location,
verifies it, and reports compliance — VS Code itself does the disabling.

- fetch: GET /v1/:customer/developer-mdm-agent/devices/:device_id/
  effective-policy?category=ide_extension; backend hash persisted and
  echoed verbatim (never recomputed); non-object policy rejected
- writers: Windows registry (REG_SZ AllowedExtensions under the VS Code
  policies key) and Linux /etc/vscode/policy.json; only the
  AllowedExtensions value is managed, coexisting policies preserved;
  macOS/other get no writer (MDM-export path)
- reconcile: value-based ownership — clear/overwrite only when the
  on-disk value equals the recorded written value, any other present
  value yields mdm_managed; ownership-store writability preflight before
  any policy write, rollback to pre-cycle disk state if the post-write
  persist fails; ownership recorded on every successful write so a
  transient readback mismatch self-corrects next cycle; idempotent
  re-apply keyed on backend hash
- verify/report: compliant | policy_not_applied | vscode_unsupported |
  mdm_managed | write_failed | verification_failed; applied_hash only
  when readback-confirmed; clear path reports nothing
- main: runIDEExtensionEnforce on the existing scheduled cycle, gated
  behind FeatureDevMDMPolicies (off until GA; override for dogfooding)
- CI: windows-latest job runs the windows-tagged writer tests natively
  (the macOS test job can only cross-compile them)
@raysubham raysubham changed the title feat(devmdm): enforce VS Code extension allowlist via OS-native policy feat(devmdm): enforce VS Code extension allowlist Jun 11, 2026
…policy locations

Rework the enforcement surface (plan rev 2026-06-11): the agent now
converges the `extensions.allowed` key in the user's VS Code
settings.json — one writer for Windows, macOS, and Linux — instead of
writing OS managed-policy locations. macOS moves from export-only to
agent-enforced; real MDM-managed devices are detected and yielded to.

- settings writer: format-preserving single-key JSONC merge via
  tailscale/hujson (RFC 6902 patch) — comments, trailing commas, and
  every other key survive byte-for-byte; atomic temp+fsync+rename with
  capped sibling backups (internal/aiagents/atomicfile); an
  unparseable settings.json is never rewritten; the file is never
  deleted, Clear removes only the key; per-OS path resolution
  (%APPDATA% / ~/Library/Application Support / $XDG_CONFIG_HOME)
- managed-policy probe: read-only AllowedExtensions presence check at
  the OS policy location (HKLM registry / /etc/vscode/policy.json /
  /Library/Managed Preferences plists, machine-wide or per-user) →
  mdm_managed, skip write — a real policy outranks user settings
- ladder rework: fetch → clear-if-owned → probe → idempotency →
  ownership-store preflight → drift detection (on-disk diverged from
  recorded written value → re-apply + drift_detected) → merge-write +
  readback → persist-every-write with rollback → verify → report;
  values compared in compacted canonical form
- drop the VS Code version floor end-to-end: min_vscode_version gone
  from the fetch contract (backend removed it), version inputs gone
  from Verify, vscode_unsupported replaced by drift_detected in the
  reportable states, IDE version detection unwired from main
- delete the HKLM/policy.json writers; retarget the windows CI job at
  the settings writer + registry probe
…files

# Conflicts:
#	internal/featuregate/featuregate.go
"devicepolicy" matches the backend's vocabulary (device_id, registered
devices, per-device compliance) and names what the package manages — the
centrally assigned policy for this device — rather than borrowing the
product label. The product/feature name stays "Developer MDM" where it
refers to the product: the feature gate (FeatureDevMDMPolicies,
"developer-mdm-policies"), the developer-mdm-agent auth channel, wire
paths, and the backend sync references are all unchanged.

Mechanical sweep, no behavior change:
- internal/devmdm -> internal/devicepolicy (package clause + doc refs)
- ownership state file: developer-mdm-policy-state.json ->
  device-policy-state.json (pre-ship, nothing has written it yet)
- cmd wiring: devMDMEnforceTimeout -> devicePolicyEnforceTimeout,
  AppendError source tag "devmdm" -> "devicepolicy"
- CI: test-windows-devmdm job -> test-windows-devicepolicy
- windows probe test key: ...\DevMDMProbe -> ...\DevicePolicyProbe
@raysubham raysubham changed the title feat(devmdm): enforce VS Code extension allowlist feat(devicepolicy): enforce VS Code extension allowlist Jun 12, 2026
…nto settings_writer.go

Pure file consolidation, no behavior or API change:
- fetch.go + report.go -> api.go (and their tests -> api_test.go): the
  Fetcher and Reporter are the two halves of the same developer-mdm-agent
  HTTP channel and share config gating, auth, timeout, and error style.
- writer.go -> settings_writer.go: the Writer interface now lives beside
  its only production implementation.

go doc -all diff before/after confirms the exported surface is identical
apart from the reworded file references in comments.
Rename FeatureDevMDMPolicies -> FeatureDevicePolicy and its gate string
"developer-mdm-policies" -> "device-policy" so the gate matches the
devicepolicy package and device-policy-state.json. The gate stays
commented out (pre-GA) and the string is local-only with no
remote-config consumer, so the rename is non-breaking.
…bak backup

Pin the safety net for editing a user-owned file: a Write backs up the
prior settings.json to a sibling .bak (via atomicfile) holding the exact
pre-write bytes, and a first-ever write (nothing to preserve) makes no
backup.
atomicfile is a generic atomic-write + timestamped-.bak-backup primitive
with no aiagents coupling, and devicepolicy already imports it. Promote it
out of internal/aiagents/ to a flat top-level internal package, matching
the repo's internal/lock, internal/paths layout. Pure git-mv + import-path
rewrite across 5 sites; no behavior change.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new internal/devicepolicy subsystem that fetches an effective VS Code extension allowlist policy, converges the local VS Code configuration to match, and reports compliance—behind a feature gate and with dedicated CI coverage on Windows.

Changes:

  • Introduces internal/devicepolicy (fetch/report API client, settings.json writer, managed-policy probe, reconciler + tests).
  • Adds a shared internal/atomicfile utility for temp+fsync+rename atomic writes with backup rotation, and reuses it from AI-agent adapters.
  • Wires enforcement into the scheduled agent cycle (main.go), adds a Windows-only CI job to execute Windows-specific devicepolicy tests.

Reviewed changes

Copilot reviewed 25 out of 28 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/featuregate/featuregate.go Adds a new device-policy feature gate (left disabled by default).
internal/devicepolicy/doc.go Package-level docs for the new enforcement subsystem.
internal/devicepolicy/api.go HTTP fetcher/reporter for effective policy and compliance reporting.
internal/devicepolicy/api_test.go Tests for fetch/report behavior and malformed payload handling.
internal/devicepolicy/cache.go Ownership/applied-state persistence for enforcement cycles.
internal/devicepolicy/cache_test.go Round-trip and failure-mode tests for state persistence.
internal/devicepolicy/verify.go Pure mapping from write/readback outcomes to compliance state.
internal/devicepolicy/verify_test.go Unit tests for compliance-state mapping.
internal/devicepolicy/settings_writer.go JSONC-preserving writer for VS Code settings.json key extensions.allowed.
internal/devicepolicy/settings_writer_test.go Extensive writer tests for JSONC preservation, idempotency, backups, and error cases.
internal/devicepolicy/reconcile.go Orchestrates fetch → probe → idempotency/drift → write/verify → report.
internal/devicepolicy/reconcile_test.go Fake-driven tests covering drift/rollback/idempotency/reporting ladder.
internal/devicepolicy/probe.go Shared probe helpers for detecting managed-policy presence.
internal/devicepolicy/probe_linux.go Linux probe for /etc/vscode/policy.json.
internal/devicepolicy/probe_windows.go Windows registry probe for VS Code managed policy value presence.
internal/devicepolicy/probe_windows_test.go Native Windows tests for registry probing logic.
internal/devicepolicy/probe_darwin.go macOS managed-preferences probe via byte scan.
internal/devicepolicy/probe_other.go Stub probe for unsupported GOOSes.
internal/devicepolicy/probe_test.go Tests for shared probe helper functions.
internal/atomicfile/atomicfile.go New atomic write + backup + rotation utility.
internal/atomicfile/atomicfile_test.go Tests for atomic write behavior and backup pruning semantics.
internal/aiagents/adapter/codex/settings.go Switches adapters to use shared internal/atomicfile.
internal/aiagents/adapter/claudecode/settings.go Switches adapters to use shared internal/atomicfile.
internal/aiagents/adapter/adapter.go Updates documentation reference to internal/atomicfile.
cmd/stepsecurity-dev-machine-guard/main.go Adds scheduled enforcement runner gated by FeatureDevicePolicy.
.github/workflows/tests.yml Adds Windows CI job for devicepolicy tests.
go.mod Adds github.com/tailscale/hujson dependency.
go.sum Updates checksums for added/updated dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/devicepolicy/cache.go Outdated
Comment on lines +86 to +90
var s AppliedState
if err := json.Unmarshal(b, &s); err != nil {
return AppliedState{}, false
}
return s, true

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment on lines +651 to +659
// runIDEExtensionEnforce fetches the device's effective IDE-extension policy
// and converges the user-scope VS Code settings.json (extensions.allowed) to
// match, then reports compliance — all on the existing scheduled cycle and the
// existing agent auth channel. Windows, macOS, and Linux are all enforced this
// way; a device whose VS Code is already governed by a real MDM policy
// (registry / policy.json / managed preferences) is detected by the
// reconciler's probe and reported mdm_managed instead. Gated behind
// FeatureDevicePolicy and a silent no-op in community mode (enterprise
// config missing). Failures are logged but never crash main.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR description.

ReadAppliedState validated SchemaVersion was written but never checked it on
read, leaving the field inert. A file whose schema_version exceeds this
build's CacheSchemaVersion may reuse written_value/applied_hash with changed
meaning, so acting on it risks a wrong ownership/drift decision. Treat such a
file as unreadable (owns nothing -> reconciler re-applies, never wrongly
clears); normalize a 0/absent version to current. Makes the version field
meaningful ahead of future schema bumps.
@raysubham raysubham marked this pull request as ready for review June 18, 2026 09:53
… migration

Restructure ~/.stepsecurity/device-policy-state.json from a single AppliedState
object to a schema-versioned wrapper keyed by category:

  { "schema_version": 1, "categories": { "ide_extension": { ... } } }

More enforcement categories are coming; a single-object file would force a
guaranteed v1->v2 migration the day category step-security#2 lands. Doing this now (pre-merge,
unshipped) avoids that. Stays schema_version 1 since nothing shipped externally.
DMG-only: the wire contract is unchanged (Category still travels via
ComplianceReport; backend never sees this local file).

- cache.go: replace AppliedState with AppliedStateFile{schema_version,categories}
  + per-category AppliedCategoryState. Helpers become category-scoped read-
  modify-write (ReadAppliedState/WriteAppliedState/ClearAppliedState) that
  preserve every other category. peekSchemaVersion separates a future file
  (refuse) from a corrupt one (recreate).
- Refuse to OVERWRITE a future-schema file (errFutureSchema), so an older agent
  can never clobber a newer agent's category metadata. This rides the existing
  enforce preflight: the refused write surfaces write_failed, settings.json is
  left untouched, and the future file stays byte-identical.
- Serialize the read-modify-write with a package mutex; public funcs lock then
  call the unlocked readStateFile (the lock is not reentrant).
- reconcile.go: category-scoped seams (writeState/clearState); handleClear drops
  ownership on entry-exists (hadPrev), reclaiming an empty preflight record.
- Tests: preserve-other-categories, future-refuse on write and clear, clear-one-
  category, empty-record reclaim, absent-category, legacy single-object reads as
  owns-nothing (no migration), and an end-to-end refuse-to-clobber cycle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants