Skip to content

Container-aware device identity: inject host device_id + namespace home_user per container#165

Open
sumit-badsara wants to merge 3 commits into
stagingfrom
feat/device-id-host-override
Open

Container-aware device identity: inject host device_id + namespace home_user per container#165
sumit-badsara wants to merge 3 commits into
stagingfrom
feat/device-id-host-override

Conversation

@sumit-badsara

@sumit-badsara sumit-badsara commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

What

Makes containers report a stable, non-fragmenting identity to the discovery backend — no backend change. Three commits:

  1. Inject host device_id. get_device_id() prefers an injected UNBOUND_DEVICE_SERIAL (validated via is_valid_serial()) so a container reports the host device instead of its ephemeral hostname.
  2. Log the decision branch. logger.info when the injected serial is accepted, logger.warning when present-but-invalid — so the attribution path is reconstructable from logs.
  3. Container-aware gating + per-container profiles (this update):
    • Gate the override on container detection. The injected serial is only honored when in_container() is true, so a stray UNBOUND_DEVICE_SERIAL on a native host can't mask the real auto-detected serial. in_container() combines /.dockerenv, /run/.containerenv, an overlay root mount, and a cgroup-v1 fallback (cgroup v2 shows 0::/ from inside, so the classic /proc/1/cgroup grep alone is unreliable).
    • Namespace home_user per container. All containers on a host share the one injected device_id, and the backend's report-ingest replace key is (device, tool_name, home_user). A plain home_user (ubuntu/root, fixed per image) therefore makes concurrent containers clobber each other's tool data. Inside a container, the report now sends home_user = "<container_id>_<home_user>", giving each container a distinct profile under the shared device. Native hosts are unaffected (home_user stays the real username).

Why

A container is ephemeral. get_device_id() auto-detects the host serial with a hostname fallback; inside a Linux container the Mac serial is unreadable, so it falls back to the per-launch container hostname → a new "device" every launch. Injecting the host serial collapses them onto one device. But once collapsed, multiple containers collide on (device, tool_name, home_user) and overwrite each other — so home_user is namespaced per container to keep distinct profiles without re-fragmenting the device.

container_id resolution & the stability trade-off

get_container_id() resolves to:

  1. Injected UNBOUND_CONTAINER_ID — the stable path. The host can derive this from the workspace / devcontainer name so a restarted workspace keeps a single profile (clean upsert).
  2. Container hostname — the ephemeral fallback. Unique per docker run, so the same workspace restarted under a fresh hostname creates a new profile (old ones linger). Acceptable for one-shot containers; set UNBOUND_CONTAINER_ID for long-lived / restarted workspaces.

Safety / scope

  • All behavior is container-only; native hosts auto-detect their serial and keep a plain home_user.
  • No backend changedevice_id is already the attribution key, and home_user is passed through verbatim by the ingest path.

Tests

tests/test_device_id_override.py (12 cases): override honored only in-container, ignored on native host, invalid/empty fall through; home_user namespaced in-container and untouched on host; two containers on one device get distinct profiles; in_container() / get_container_id() primitives.

Full suite: 833 passed, 1 skipped. Validated end-to-end in a real OrbStack Ubuntu container against staging (injection honored + container detected; no-injection baseline falls back to hostname).

🤖 Generated with Claude Code

Greptile Summary

This PR makes the discovery tool container-aware: get_device_id() now prefers an injected host serial (UNBOUND_DEVICE_SERIAL) only when running inside a container, and generate_single_tool_report() namespaces home_user with the container ID so concurrent containers on the same host don't clobber each other's backend profile (the ingest replace key is (device, tool_name, home_user)).

  • utils.py adds in_container() (multi-signal: dockerenv, overlay mount, cgroup-v1 markers; process-lifetime lru_cache) and get_container_id() (injected env var with hostname fallback).
  • ai_tools_discovery.py wires both helpers into get_device_id() (with info/warning logging for each branch) and into generate_single_tool_report() for home_user namespacing.
  • 12 new tests cover the override gating, namespacing, distinct-profile guarantee, and helper primitives.

Confidence Score: 5/5

Safe to merge — all new behaviour is gated behind container detection, native hosts are entirely unaffected, and the logic is well-tested.

The container-gating logic is correct and well-covered by tests. Native host code paths are unchanged. The only gap is that the home_user namespacing branch produces no log line when it fires, making post-hoc attribution debugging harder — but this does not affect correctness.

No files require special attention; the single observation is in ai_tools_discovery.py around the home_user namespacing block.

Important Files Changed

Filename Overview
scripts/coding_discovery_tools/ai_tools_discovery.py Added container-aware get_device_id() (honors UNBOUND_DEVICE_SERIAL only inside a container, with info/warning logging) and home_user namespacing in generate_single_tool_report() (prepends container ID to prevent profile collisions). The namespacing branch is silent — unlike the device_id path, it emits no log when activated.
scripts/coding_discovery_tools/utils.py Added in_container() (multi-signal: dockerenv, containerenv, overlay root mount, cgroup-v1 markers; lru_cache-d) and get_container_id() (UNBOUND_CONTAINER_ID env var with hostname fallback; lru_cache-d). Logic is well-documented and correctly handles macOS/Windows by catching all OSError.
tests/test_device_id_override.py New test file with 12 cases covering serial override (in-container vs native host, valid vs invalid vs missing), home_user namespacing (native unchanged, container namespaced, two containers get distinct profiles), and in_container()/get_container_id() primitives. Cache teardown is correctly handled in setUp/tearDown.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_device_id called] --> B{UNBOUND_DEVICE_SERIAL set?}
    B -- No --> Z[auto-detect via extractor]
    B -- Yes --> C{in_container?}
    C -- No --> D[logger.info: env var present but not in container\nauto-detect instead]
    D --> Z
    C -- Yes --> E{is_valid_serial?}
    E -- No --> F[logger.warning: invalid serial\nfalling back]
    F --> Z
    E -- Yes --> G[logger.info: using injected serial\nreturn injected value]

    H[generate_single_tool_report called] --> I{home_user set AND in_container?}
    I -- No --> J[report_home_user = home_user unchanged]
    I -- Yes --> K[container_id = get_container_id\nreport_home_user = container_id + _ + home_user]
    J --> L[build report dict]
    K --> L
Loading

Reviews (4): Last reviewed commit: "Gate device override on container detect..." | Re-trigger Greptile

A container is ephemeral: get_device_id() auto-detects via host serial with a
hostname fallback, so inside a container the Mac serial can't be read and it
falls back to the ephemeral container hostname — a new "device" every launch,
fragmenting users/devices on the backend.

get_device_id() now prefers an injected UNBOUND_DEVICE_SERIAL (set by the
container launch / socket proxy) when it passes is_valid_serial(), so all
activity from containers on a host syncs under the one host device. The env var
is only set when injected, so native hosts are unaffected and still auto-detect
their own serial. No backend change — device_id is already the attribution key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py
Sumit Badsara and others added 2 commits June 5, 2026 00:34
Address Greptile review: emit logger.info when an injected UNBOUND_DEVICE_SERIAL
is accepted and logger.warning when it is present but rejected, so the container
attribution path can be reconstructed from logs alone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…container

Two container-aware refinements on top of the UNBOUND_DEVICE_SERIAL override:

1. Gate the serial override on container detection. get_device_id() now only
   honors an injected UNBOUND_DEVICE_SERIAL when in_container() is true, so a
   stray env var on a native host can't mask the real auto-detected serial.
   in_container() combines /.dockerenv, /run/.containerenv, an overlay root
   mount, and a cgroup-v1 fallback (cgroup v2 shows "0::/" from inside, so the
   classic /proc/1/cgroup grep alone is unreliable).

2. Namespace home_user per container. All containers on a host share one
   injected device_id, and the report-ingest replace key is
   (device, tool_name, home_user) — so a plain home_user ("ubuntu"/"root",
   fixed per image) makes concurrent containers clobber each other's tool data.
   generate_single_tool_report() now sends home_user="<container_id>_<home_user>"
   inside a container, giving each container a distinct profile under the shared
   device. Native hosts are unaffected (home_user stays the real username).

container_id resolves to an injected UNBOUND_CONTAINER_ID (stable, host-derived
from the workspace so a restarted workspace keeps one profile) and falls back to
the container hostname (ephemeral — distinct per launch; see get_container_id
docstring for the trade-off). Tool-only change; no backend change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sumit-badsara sumit-badsara changed the title Report host device_id from containers via injected UNBOUND_DEVICE_SERIAL Container-aware device identity: inject host device_id + namespace home_user per container Jun 4, 2026
sumit-badsara pushed a commit that referenced this pull request Jun 4, 2026
…r flag

Implements step ③ of Approach B (user-centric device attribution). Each
device reports its OWN device_id and a plain home_user; supersedes PR #165's
device_id injection + home_user namespacing (not carried forward).

- Salvage the in_container() helper into utils.py (checks /.dockerenv,
  /run/.containerenv, overlay root mount, cgroup v1 fallback; lru_cached).
  Only this helper is brought over from feat/device-id-host-override — not
  the injection override or namespacing.
- Add is_container (bool) as a top-level field in the report payload built
  by generate_single_tool_report, set from in_container().
- Stable container device_id: when /etc/machine-id is empty/absent, fall
  back to a UUID persisted in the home-user's ~/.unbound/ (read if present,
  else generate uuid4 + write) instead of the ephemeral hostname, so a
  restarted container keeps one device row. Reuses cache._ensure_state_dir()
  for robust state-dir resolution (writable fallback) rather than a bare
  Path.home(). Machine-id-first behavior unchanged; never raises.
- Tests: is_container present/top-level in report; persisted-UUID fallback
  (generate+write, read-existing, unwritable dir, write-failure); in_container
  detection with lru_cache clearing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sumit-badsara sumit-badsara changed the base branch from main to staging June 5, 2026 15:45
@sumit-badsara sumit-badsara requested a review from a team June 5, 2026 15:45
@sumit-badsara sumit-badsara reopened this Jun 5, 2026
pugazhendhi-m added a commit that referenced this pull request Jun 10, 2026
* feat(discovery): Approach B tool change — own device_id + is_container flag

Implements step ③ of Approach B (user-centric device attribution). Each
device reports its OWN device_id and a plain home_user; supersedes PR #165's
device_id injection + home_user namespacing (not carried forward).

- Salvage the in_container() helper into utils.py (checks /.dockerenv,
  /run/.containerenv, overlay root mount, cgroup v1 fallback; lru_cached).
  Only this helper is brought over from feat/device-id-host-override — not
  the injection override or namespacing.
- Add is_container (bool) as a top-level field in the report payload built
  by generate_single_tool_report, set from in_container().
- Stable container device_id: when /etc/machine-id is empty/absent, fall
  back to a UUID persisted in the home-user's ~/.unbound/ (read if present,
  else generate uuid4 + write) instead of the ephemeral hostname, so a
  restarted container keeps one device row. Reuses cache._ensure_state_dir()
  for robust state-dir resolution (writable fallback) rather than a bare
  Path.home(). Machine-id-first behavior unchanged; never raises.
- Tests: is_container present/top-level in report; persisted-UUID fallback
  (generate+write, read-existing, unwritable dir, write-failure); in_container
  detection with lru_cache clearing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Harden persisted device-id: atomic write, UUID validation, warning logs

Addresses Greptile review on PR #170:

- Atomic write (P1): persist the device-id UUID via tempfile.mkstemp +
  os.replace in the same dir, mirroring cache.atomic_write_cache(). A
  kill mid-write can no longer leave a partial UUID on disk.
- UUID validation on read (P1): validate the persisted value with
  uuid.UUID() and treat a corrupt/truncated/non-UUID value as absent,
  regenerating a fresh UUID. Pairs with the atomic write.
- Log levels (P2): failures that cost the container its stable identity
  (no state dir, unreadable/corrupt file, failed write, outer catch) now
  log at warning; the benign absent-machine-id read stays at debug.

Extends tests/test_container_attribution.py to use a real temp dir so the
atomic write path is exercised for real, and adds corrupt-partial and
garbage-non-UUID regeneration cases plus a write-failure case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (#183)

* WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (fixes #1, #3)

Fix #1: the GitHub Copilot (VS Code) MCP extractor's _read_mcp_config called
json.loads() directly despite a docstring claiming it stripped comments, so any
hand-edited mcp.json with // or /* */ comments or a trailing comma threw
JSONDecodeError and silently yielded 0 servers. Now strips comments + trailing
commas before parsing, reusing the existing string-aware helpers.

Fix #3: removed the dead globalStorage/ms-vscode.vscode-github-copilot/mcp.json
fallback. That publisher/extension id does not exist (real ids are
GitHub.copilot / GitHub.copilot-chat) and VS Code never stores MCP config in
extension globalStorage, so the branch could never match a real install.

DRY: relocated _strip_jsonc_comments / _strip_trailing_commas into the shared
mcp_extraction_helpers.py as the single source of truth; macos/copilot_cli
re-exports them for back-compat. Applied across all three OS variants.

Fix #2 (profiles / Insiders / legacy settings.json MCP locations) is
intentionally deferred to a follow-up PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Trim verbose comments on JSONC strippers

Condense the explanatory block comments in mcp_extraction_helpers.py to
concise one-liners; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Drop ticket/task-specific references from test comments

Remove WEB-4703 / "fix #1" / "fix #3" labels from the JSONC test module
docstrings and section comments; keep the behavioral descriptions. No test
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (#185)

* WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (fix #2, scoped A+B)

The GitHub Copilot (VS Code) MCP extractor read only the default-profile
Code/User/mcp.json. Since VS Code 1.102 MCP is per-profile, so any user on a
named profile (Code/User/profiles/<id>/mcp.json) had their servers missed; VS
Code Insiders users were also skipped even though detection already counts them
(detect_copilot._VSCODE_USER_DATA_DIRS).

Adds a shared, bounded, crash-safe enumerator enumerate_vscode_mcp_files() that
yields the default mcp.json plus sorted profiles/*/mcp.json for one Code/User
base. Each OS extractor now iterates [Code/User, Code - Insiders/User] through
it and attributes each config to its own dir (str(mcp_file.parent)), so distinct
profiles/variants surface as distinct sources.

Customer-agnostic and additive: a machine with only the default-profile mcp.json
produces byte-identical output. Reuses the JSONC strippers from #183.

Scoped to A+B. Fix C (legacy settings.json mcp key) is intentionally deferred:
1.102 auto-migrates settings.json -> mcp.json, so the remaining population is
small and shrinking, and it would add dedup/precedence complexity that profiles
+ Insiders (disjoint locations) do not need.

Predecessor: #183 (fixes #1 + #3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Log at debug when enumerate_vscode_mcp_files skips on FS error

Addresses Greptile P2: the two except blocks swallowed PermissionError/
OSError silently; log at debug to match the rest of the extraction layer
(still never raises). No behavior change to returned files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable (#186)

* WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable

On a shared-HOME macOS host where discovery runs under two uids -- classically a
root MDM agent and the login user on the same machine (the failing host is an EC2
Mac) -- discovery-cache.json is written 0600 by whichever uid runs first. The other
non-root uid then gets PermissionError [Errno 13] reading it. Surfaced as Sentry
DISCOVERY-TOOL-SCRIPT-11.

_ensure_state_dir() already falls back to a per-uid /var/tmp/unbound-{uid} dir, but
_try_state_dir() only probed the *directory's* writability, never whether an existing
discovery-cache.json was *readable*. A home whose dir is writable but whose cache file
is a foreign-owned 0600 was accepted as healthy, then read_cache() raised EACCES on it.

Probe the cache file's readability too: a candidate holding a cache file this uid
cannot read is rejected, so the resolver falls through to the per-uid temp dir -- the
same uid-namespacing utils._get_queue_file_path() already uses for the queue file.
os.access() uses the real uid, so root (which can read any file) keeps using its own
home cache unchanged, leaving root/all-users scans intact.

Scope: this fixes the cross-uid collision and the file-readability probe gap.
Downgrading the *expected* permission warning that read_cache/atomic_write_cache emit
to Sentry is tracked separately and intentionally not included here.

Fixes DISCOVERY-TOOL-SCRIPT-11

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4702: surface fallback reason in state-dir warning; condense probe comment

Address PR review: _ensure_state_dir's fallback warning now includes
last_lock_error, so a fall-back to the per-uid temp dir logs *why* the home dir
was rejected (unreadable cache vs. unwritable dir vs. OSError) before it is
cleared. Also condense the readability-probe comment to a single line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4702: trim verbose comments in the cache-fallback test

Condense the readability-fallback test's comments to match the source cleanup.
No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4707: layer Cursor permissions.json as a per-field override on the state.vscdb read (#191)

* WEB-4707: layer Cursor permissions.json as per-field override on SQLite read

Cursor added a file-based permissions.json (~/.cursor + <workspace>/.cursor)
that overrides the in-app/SQLite allowlists. Layer it on top of the existing
state.vscdb (composerState) read as a per-field override, collapsing into one
effective backend record.

- Per-field override: a file-defined field replaces the SQLite-derived value;
  fields are independent; SQLite stays the sole source for useYoloMode / run
  mode / modes4 / enabledMcpServers (no permissions.json equivalent).
- User + workspace arrays concatenate within a field (autoRun nested arrays
  merged independently).
- JSONC-tolerant parse; any read/parse failure falls back to SQLite-only.
- File-absent path is byte-identical (_parse_composer_state untouched; override
  is a post-processing layer in _apply_permissions_json_override).

mcpAllowlist -> mcp_tool_allowlist; terminalAllowlist -> Bash(cmd*) prefix rules
(distinct from the SQLite Bash(cmd *) command rules); autoRun -> raw_settings
verbatim for the permissions classifier. No gateway-data / unbound-fe changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4707: address elite-PR-review findings

- C1 (critical): scope macOS workspace walk to user_home (was rooting at /),
  matching the Linux/Windows siblings. Prevents cross-user workspace
  permission leakage under root/MDM and avoids a full-filesystem walk.
- W3: preserve unknown autoRun sub-keys (shallow-merge across files,
  last-file-wins) while still concatenating the documented
  allow_instructions/block_instructions arrays.
- I4: skip symlinked intermediate dirs in the Windows walk (junction-loop
  hardening; a .cursor dir that is itself a symlink still matches).
- Tests: M5/T5 pin the intentional empty-array override-to-empty; A7 pins
  autoRun unknown-subkey preservation. _parse_composer_state still untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4707: close test-coverage gaps from 2nd-pass review

Test-only additions (no production change):
- D2: _dedupe_preserve_order unhashable (dict) fallback branch
- T6: terminalAllowlist drops non-string/empty entries (no Bash(*) , no raise)
- W3/W4: Windows _walk_for_permissions real-walk coverage (workspace file
  applied; global ~/.cursor not double-counted) — the one new path not
  backed by a pre-tested shared helper

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4707: log info when permissions.json override is applied

Addresses greptile P2 (success-path diagnosability): emit an info log naming
which fields (mcp/terminal/autorun) the override replaced. Placed after the
no-file early-return, so the byte-identical file-absent path stays silent and
unchanged. Stdlib logging only (no new deps).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4718: use raw docstrings to silence invalid-escape SyntaxWarnings (#190)

* fix(discovery): detect home-rooted project-scope .mcp.json files (#166)

* fix(discovery): detect home-rooted project-scope .mcp.json files

The Claude Code project-scope MCP walk skipped a `.mcp.json` located
directly in a user's home directory (project root == home, e.g.
C:\Users\<u>\.mcp.json, /Users/<u>/.mcp.json, /home/<u>/.mcp.json).
`is_home_dotdir_descendant` — intended to skip the *contents of* hidden
home tool dirs (~/.cursor/, ~/.codex/) — also matched a home-rooted leaf
`.mcp.json`, so its servers were never reported. On a real device a
user-scope server in ~/.claude.json was detected while project-scope
servers in ~/.mcp.json (e.g. policycenter, playwright) were missed.

Fix: the walk's file branch now tests the file's parent directory
(`is_home_dotdir_descendant(entry.parent)`). A home-rooted `.mcp.json` is
read, while a `.mcp.json` inside a hidden home tool dir (~/.cursor/.mcp.json)
stays skipped. The directory-recursion branch and the sibling
directory-based walks (generic + skills) are intentionally unchanged.

Purely additive: for any path with >=5 segments old and new are identical;
only the 4-segment home-rooted leaf flips skip->read. No path flips
read->skip, so no previously-detected config can disappear.

Add tests/test_mcp_home_rooted_config.py: predicate regression tests
(POSIX + Windows-drive shapes), a walk-level test asserting the file branch
consults entry.parent (not the file), and a smoke test for normal projects.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(discovery): union MCP servers on path collision + cover Copilot CLI

Addresses principal-engineer review of the home-rooted .mcp.json fix.

H1: un-skipping a home-rooted ~/.mcp.json makes it emit a project entry
whose path == the home dir, which can collide with the ~/.claude.json
projects[<home>] (local-scope) entry. The merge functions overwrote
mcpServers by path (last-writer-wins), silently dropping one source's
servers. Add AIToolsDetector._union_mcp_servers (dedupe by name,
first/higher-precedence wins) and apply it at all three merge sites:
_merge_claude_mcp_configs_into_projects, _merge_mcp_configs_into_projects,
and the Copilot CLI inline merge. Also closes the pre-existing sub-folder
collision and stops additionalMcpData from clobbering an existing entry.

H2: Copilot CLI's Workspace .mcp.json uses the same project-scope walk, so
the home-rooted fix applies to it too; its inline merge now unions as well.

Tests: union helper, two-source home collision through the real Claude
merge, default-merge union, single-source-unchanged, Copilot shared-walk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: detect built-in VS Code Copilot (all OS) + macOS Codex rules crash — direct to main (#169)

* fix(macos/github-copilot): detect built-in VS Code Copilot so its MCPs surface

VS Code now ships GitHub Copilot / Copilot Chat as BUILT-IN extensions in the
app bundle, which never appear in the per-user ~/.vscode/extensions/extensions.json
the detector reads. So users on built-in Copilot were never detected, and their
VS Code MCP servers (Code/User/mcp.json) were silently skipped.

_detect_vscode_for_user now falls back to scanning the VS Code app bundle's
built-in `copilot`/`copilot-chat` extension when no marketplace extension is
present, reading the version from package.json. Gated on the user actually
having a Code/User data dir so a machine-wide install isn't attributed to
unrelated users in a root scan. Detected "GitHub Copilot (VS Code)" then routes
through the existing MCP extractor, surfacing Code/User/mcp.json servers.

Verified on a real macOS box: built-in Copilot 0.51.0 detected -> 3 user MCP
servers (github-mcp-server, context7, playwright-mcp) extracted.

Also adds a regression test for the Codex should_process_directory('/' branch)
arg-count crash (the code fix already landed on staging; this guards it).

Full suite 835 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* review: log built-in Copilot detection outcomes + document single-result return

Addresses PR #169 review (4/5):
- _detect_vscode_builtin_copilot now logs all three outcomes (no VS Code data
  dir, built-in found w/ version+path, VS Code-but-no-built-in) at debug, so an
  admin scan resolving from a non-obvious app bundle is reconstructable.
- Documents the intentional at-most-one-result contract: one detection suffices
  to trigger downstream rules/MCP extraction; built-in Copilot bundles chat in
  the same `copilot` extension, so a second `copilot-chat` row would only
  double-process and duplicate MCP servers (unlike the marketplace path where
  the two are separate installs).

No behavior change. Detection tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(win/linux github-copilot): detect built-in VS Code Copilot (parity with macOS)

Extends the macOS built-in Copilot detection (this PR) to Windows and Linux, so
users on built-in VS Code Copilot — and their VS Code MCP servers
(%APPDATA%\Code\User\mcp.json / ~/.config/Code/User/mcp.json) — are no longer
missed on those OSes either.

- linux/github_copilot: _detect_vscode_for_user falls back to the VS Code
  install tree (deb/rpm, /opt, snap; stable + Insiders) when no marketplace
  extension is present, gated on a ~/.config/Code/User data dir.
- windows/github_copilot: same fallback over per-user (LocalAppData\Programs)
  and system (Program Files) installs, gated on %APPDATA%\Code\User. Also fixes
  an early-return that skipped detection entirely when .vscode\extensions was
  absent (the exact built-in-only case).
- Both log detection outcomes at debug and return at most one entry (a 2nd
  copilot-chat row would only duplicate the same MCP servers).

Adds Linux + Windows detection tests. Full suite 839 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* harden: guard non-dict package.json in built-in Copilot version read (audit P2)

A bundled copilot/package.json that parses as valid JSON but isn't an object
(array/string/number) made data.get("version") raise AttributeError, which
escapes the per-user loop and aborts detect_copilot() mid-iteration — silently
dropping ALL Copilot results (marketplace + built-in, every user) for the run.

Add an isinstance(data, dict) guard in _read_extension_version (macOS/Linux) and
the Windows built-in version read. Regression test with a non-dict package.json.

Full suite 840 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(macos/codex): pass root_path to should_process_directory (silent 0-rules crash)

main lacks the staging-only fix (#163): should_process_directory(dir_path) was
called with one arg but the helper requires (directory, root_path), raising a
TypeError on every '/' scan that was swallowed into 0 Codex project rules.
Included here so this main-targeted release carries both the fix and its
regression test (test_codex_rules_extraction.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(codex): exercise the real should_process_directory (not a stub)

Audit note: the regression test stubbed should_process_directory and only
asserted call arity, so it never exercised the real TypeError. Call THROUGH to
the real helper instead — a future signature/arity regression now raises the
actual production TypeError in the test (verified: fails with "missing 1
required positional argument: 'root_path'" when reverted to the one-arg call),
while still pinning the (directory, root_path) contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(github-copilot vscode): prove built-in fallback is purely additive

Add per-OS regression tests asserting that when a MARKETPLACE Copilot extension
is present, _detect_vscode_for_user returns it and the new built-in fallback is
never invoked (spy.assert_not_called) — locking in that existing detection
behavior is unchanged and the built-in path only runs when nothing was found.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(codex): make '/'-branch test cross-platform (fix Windows CI)

The codex regression test drove the macOS extractor's root_path==Path('/')
branch and asserted AGENTS.md was found. On Windows CI a C:\ temp path can't be
made relative to '/' (the walk's relative_to raises ValueError, skipping items),
so the file-finding assertion failed there — a test-only POSIX assumption, not a
product bug (the macOS extractor's '/' scan is POSIX-only).

- Keep the cross-platform contract check (should_process_directory called with
  (directory, root_path)); guard the AGENTS.md assertion to non-Windows.
- Add a cross-platform test exercising the REAL helper: one-arg call raises
  TypeError (the bug), two-arg returns bool — no '/' walk involved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(github-copilot vscode): label built-in as "GitHub Copilot Chat (VS Code)" (#172)

* fix(github-copilot vscode): label built-in as "GitHub Copilot Chat (VS Code)"

Follow-up to the built-in detection (#169). VS Code consolidates Copilot into the
`copilot` extension folder, whose manifest is name="copilot-chat",
displayName="GitHub Copilot Chat" ("AI chat features powered by Copilot") — it's
the Copilot Chat extension, and the one that consumes mcp.json. The built-in
detection was hardcoding "GitHub Copilot (VS Code)", mislabeling Chat as the
inline-completions product.

Derive the name from the bundle's package.json: name contains "copilot-chat"
(or displayName contains "chat") -> "GitHub Copilot Chat (VS Code)" (matching the
marketplace github.copilot-chat mapping); a plain "copilot" stays
"GitHub Copilot (VS Code)". Marketplace paths unchanged. Per-OS tests updated +
a plain-copilot generic case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* review: tighten chat heuristic + add Linux/Windows plain-copilot tests

Addresses Greptile (4/5) on #172:
- Narrow the displayName check from "chat" to "copilot chat" in all 3 detectors,
  so a hypothetical future variant like "GitHub Copilot (chat enabled)" isn't
  mislabeled as Chat. Still matches the real displayName "GitHub Copilot Chat".
- Add the plain-copilot generic-label test to the Linux and Windows classes
  (was macOS-only), so all 3 platforms cover both the chat and plain branches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(github-copilot): attach shared ~/.copilot skills to the VS Code Copilot surface (#173)

* feat(github-copilot): attach shared ~/.copilot skills to the VS Code Copilot surface

An IDE-only GitHub Copilot user's shared ~/.copilot/skills were read by
nobody: the CLI skills extractor owns ~/.copilot but the CLI isn't detected
(skills/ is a SHARED marker excluded from CLI detection by #164), and the IDE
Copilot branch read nothing from ~/.copilot + has no skills extractor. This
enriches the detected VS Code Copilot row with the shared skills it actually
consumes (VS Code Agent Skills docs) — EXTRACTION-ONLY; detection and the #164
markers are untouched, so it cannot re-introduce the CLI false positive.

- S6: memoized _get_copilot_cli_skills() — one filesystem walk per scan,
  shared by the CLI + VS Code branches; CLI output byte-identical.
- S2: _set_canonical_vscode_copilot() picks ONE VS Code row (prefer Chat),
  computed from the full detected list; only that row carries skills.
- S5: user-scope skills keyed by each skill's own owner-home (from file_path)
  so multi-user/MDM scans don't leak skills across users.
- JetBrains excluded; Linux a graceful no-op (no CLI skills extractor).
- copilot-instructions.md + mcp-config.json stay CLI-only; ~/.copilot
  instructions deferred to a follow-up.

Scope: skills only. detect_copilot.py / copilot_cli.py / backend untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot): make skill owner-home derivation OS-independent

_copilot_skill_owner_home ran the skill file_path through pathlib and
returned str(parent.parent). On a Windows interpreter, pathlib re-emits a
POSIX-style path with backslashes (str(WindowsPath('/Users/a')) ->
'\Users\a'), so the projects_dict key no longer matched the raw
forward-slash keys used everywhere else — failing the per-user keying
assertions on Windows CI (test_github_copilot_vscode_skills, 3 cases).

Derive the owner home by string-slicing at the .copilot/.agents marker
instead, preserving the input's separator style on any interpreter.
macOS and native-Windows output are byte-identical to before; only the
cross-OS (POSIX-path-on-Windows) case is corrected. Detection, the CLI
branch, copilot_cli.py and detect_copilot.py are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot): address PR review — surface swallowed skills failure + dedup user skills

Two greptile review findings on the VS Code Copilot skills enrichment:

- P2: _get_copilot_cli_skills swallowed extraction failures at DEBUG, but the
  accessor is memoized so the CLI branch's existing WARNING never fired for a
  walk failure — invisible at the prod INFO floor. Upgraded to WARNING, matching
  the sibling skills-extraction error log.
- P1: user-scope skills were appended without dedup while project-scope skills
  10 lines above call _deduplicate_project_items. Dedup the owner-home bucket
  to match (by file_path). IDE branch only; CLI branch output unchanged.

Adds test_duplicate_user_skills_deduped_on_owner_home. The third finding
(per-sub-flow observability metric) was declined on the PR — inconsistent with
the sibling rules/MCP/permissions sub-steps, out of scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot-cli): demote hooks/ to SHARED — Unbound's own MDM hook caused fleet-wide phantom CLI (#174)

* fix(copilot-cli): demote hooks/ to a SHARED marker (Unbound's own MDM hook creates it)

#164 split ~/.copilot markers into STRONG (CLI-exclusive) vs SHARED, but kept
`hooks/` as STRONG. It is NOT CLI-exclusive: Unbound's OWN MDM onboarding
(websentry-ai/setup copilot/hooks/mdm/setup.py) runs for EVERY onboarded device
and does `(~/.copilot/hooks).mkdir(parents=True)` then writes unbound.json +
unbound.py — creating ~/.copilot/hooks/ from scratch on machines that never had
the CLI. So `hooks/` alone triggered a phantom "GitHub Copilot CLI" install
fleet-wide.

Confirmed in prod: device D2FJV74J5Q / user gowshik — ~/.copilot held only
hooks/unbound.json, `copilot --version` = not installed, no config/permissions.

Fix mirrors #164's ide/ demotion: move `hooks` STRONG->SHARED so it can never
alone declare a CLI install (it still enriches a real install). False-negative-
safe: a genuine CLI always also has a strong marker (config.json /
session-store.db / logs/). Detection of real installs is unchanged.

Adds regression tests: hooks/-only ~/.copilot is suppressed (gowshik repro), and
a real install with hooks/ + a strong marker is still detected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* address greptile review: complete the SHARED-marker docstring + Windows hooks guard

Two non-blocking gaps flagged on the PR:

1. `_copilot_dir_has_shared_artifact` docstring enumerated a stale shared-marker
   list that omitted both `ide/` (since #164) and the newly demoted `hooks/`, and
   wrongly attributed all shared markers to the IDE. Rewrote it to list all six
   (skills/agents/instructions/copilot-instructions.md/ide/hooks) with their true
   origins: IDE-read, IDE-written (ide/), and Unbound-MDM-written (hooks/).

2. The Windows detection test class had no hooks-specific assertion. Added
   `test_hooks_dir_alone_not_detected` to the existing TestWindowsCopilotCliDetection
   so the SHARED demotion is guarded on Windows too — catches a future regression
   if the MacOSCopilotCliDetector inheritance is ever broken.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot-cli): stop per-user over-attribution + extractor over-collection (#175)

Two independent extraction/attribution bugs found alongside the Copilot CLI
phantom-detection work (#164/#174). Detection and marker sets are untouched.

FIX #2 — per-user over-attribution. The CLI's install_path is a per-user
~/.copilot owned by exactly one user, but main()'s per-user loop re-emits every
detected tool for every OS user. filter_tool_projects_by_user scopes a tool's
projects/permissions to the user but never rewrites install_path, so a second
user (e.g. gowshik_2) who never had ~/.copilot still got a phantom "GitHub
Copilot CLI" row pointing at gowshik's home with 0 projects. Add an ownership
gate at the per-user emit site (CLI only): emit iff the user owns the detected
install OR the filter produced data for them. Extract the path-normaliser to a
module-level _normalise_path shared with filter_tool_projects_by_user (DRY).
Scoped to the CLI — IDE tools legitimately share a machine-wide install_path.

FIX #3 — extractor over-collection. The rules/skills project walks descended
into OTHER tools' per-user config dirs and their installed-extension packages
(e.g. ~/.antigravity/extensions/<pkg>/.github/instructions/*), mis-attributing
those bundled files to Copilot CLI. Add traverses_other_tool_config_dir() and
skip those dirs in both the rules walk (macOS + Windows _should_skip) and the
skills walk — while still allowing the shared .claude/.agents skill dirs a real
repo root legitimately carries (Agent Skills convention). Also dedupe repo-root
rule files by realpath + content so an AGENTS.md symlinked to / copied as
CLAUDE.md emits once. NOTE: CLAUDE.md/GEMINI.md remain collected — the official
GitHub Copilot CLI custom-instructions reference confirms the CLI reads them at
the repo root; only the symlink/copy double-count is removed.

Tests: tests/test_copilot_cli_per_user_attribution.py (12) and
tests/test_copilot_cli_overcollection.py (10). Full suite 902 green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot): per-user version probe + documented VS Code instructions/prompts dirs (H2/H4/H5) (#176)

* fix(copilot): per-user version probe + documented VS Code instructions/prompts dirs

Three doc-verified GitHub Copilot discovery gaps (H2, H4, H5). H3 from the same
audit was REFUTED against official docs and is intentionally excluded — see PR
body. Detection/markers and MCP extraction are untouched.

H2 (CLI) — version="unknown" on root/MDM scans. get_version() probed a bare
`copilot` on the scanner's PATH; root's PATH lacks the per-user install, so the
version always read "unknown" on MDM all-users scans (an existing TODO flagged
this). Resolve the per-user binary from self.user_home first — ~/.local/bin,
~/.bun/bin, ~/.nvm/versions/node/*/bin (macOS, X_OK-checked like
find_claude_binary_for_user); AppData/Roaming/npm/copilot.cmd, .local/bin and
.bun/bin .exe (Windows, shell=True for the .cmd shim) — then fall back to the
bare-PATH probe (zero regression for the running-user case). Best-effort; still
degrades to "unknown".

H4 (VS Code) — wrong instructions dir. Read the documented
.github/instructions/**/*.instructions.md (recursive, depth-gated) instead of
the undocumented, over-broad .github/copilot/*.md. Removing the legacy path is a
deliberate collection-scope reduction (consistent with #175's anti-over-collection
direction); pinned by a negative test.

H5 (VS Code) — prompt files never collected. Collect *.prompt.md from project
.github/prompts/ and the user Code/User/prompts/ dir (already opened but globbed
only *.instructions.md). Prompt files are emitted as ordinary project/user-scoped
rule dicts with NO extra fields, because the backend silently discards any rule
carrying a non-allowlisted key — locked down by test_prompt_rule_has_only_allowed_fields.

find_github_copilot_project_root generalized to walk to the nearest .github
ancestor (serves nested instructions + prompts) without regressing the
prompts/intellij/AGENTS.md branches. Windows mirrors macOS.

Tests: new tests/test_github_copilot_instructions_prompts.py + extended
test_copilot_cli_discovery.py and test_scanning_enhancements.py. Full suite 916
pass; the 1 failing test (test_main_cli_with_queue_drain) is a pre-existing
environmental flake that also fails on clean main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(copilot): gate POSIX-only version stub tests to fix Windows CI

test_version_resolved_from_local_bin_stub / _from_nvm_stub create a #!/bin/sh
executable stub and assert the macOS detector probes it. Windows can't exec a
shebang script, so get_version() returned None there (Windows CI red). The
exec path is inherently POSIX; the Windows per-user-binary path is already
covered portably by test_version_probed_from_per_user_npm_shim (which mocks
subprocess.run). Gate the two stub-exec tests with skipUnless(os.name=="posix").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot): resolve newest nvm Node version for CLI binary (greptile P2)

_resolve_copilot_binary iterated nvm version dirs in arbitrary glob order, so a
user with multiple nvm-managed Node versions (each with a copilot install) could
resolve a stale one. Sort by NUMERIC (major,minor,patch) parsed from the dir
name, newest first — note a plain string sort (greptile's suggestion) orders
"v9" after "v10"; the numeric key fixes that. Also makes a re-scan deterministic.
Adds a POSIX-gated test (macOS resolver) asserting v18 wins over v10/v9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add Sync Staging with Main workflow (#179)

Keeps staging in sync with main after each release to main.
Triggers on push to main (and workflow_dispatch).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot-vscode): read .claude/rules + user ~/.copilot|.claude instructions; scope MCP per IDE surface (#178)

* fix(copilot-vscode): read .claude/rules + ~/.copilot|.claude user instructions; scope MCP per IDE surface

Two related GitHub Copilot VS Code Chat discovery fixes (macOS + Windows;
Linux tracked separately). Verified against the official VS Code Copilot
custom-instructions docs.

(1) Rules completeness — the VS Code rules extractor now reads the three
documented "Default file location" custom-instruction sources it was missing:
  - workspace .claude/rules/**/*.md  (Claude format)
  - user ~/.copilot/instructions/**/*.instructions.md
  - user ~/.claude/rules/**/*.md
find_github_copilot_project_root now resolves the nearest .github/.claude/.copilot
ancestor (keys these to the right repo root / user home) without regressing the
prompts/intellij/AGENTS.md branches. New _extract_claude_rules helper +
add_user_rules refactor. Guards: skip the user-home ~/.claude (collected as user
scope, no double-count since add_rule_to_project doesn't dedupe) and skip
other-tool config dirs / installed extension packages (traverses_other_tool_config_dir).

(2) MCP identity scoping — extract_mcp_config was identity-blind: it unioned
VS Code global + JetBrains global + workspace .vscode/mcp.json and returned that
to EVERY Copilot row, so on a machine with both IDE Copilots each row showed the
other's servers. Now gated by tool_name: a VS Code row gets VS Code global +
workspace; a JetBrains row gets JetBrains global only; tool_name=None keeps the
legacy union (back-compat). Pure narrowing — single-IDE users unaffected. Call
site passes the detected row name. Mirrors the already identity-aware rules
extractor.

Tests: all regression coverage lives in the one focused file
tests/test_github_copilot_instructions_prompts.py (.claude/rules project +
allowlist guard + extension-dir-skipped + user-home-not-double-collected; user
~/.copilot/instructions + ~/.claude/rules; MCP identity scoping). Full suite
green; the lone failure (test_main_cli_with_queue_drain) is a pre-existing
environmental flake.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(copilot-vscode): Linux MCP tool_name param (regression) + user-rules depth guard

Follow-up to the review of this PR.

- CRITICAL: the call site passes extract_mcp_config(tool_name=...) for all OSes,
  but the Linux GitHub Copilot MCP extractor's signature was still
  extract_mcp_config(self) -> TypeError (swallowed) -> Linux Copilot MCP servers
  returned empty (a regression vs main). Linux now accepts tool_name and applies
  the same VS Code / JetBrains surface gating as macOS/Windows.
- add_user_rules now depth-gates its globs (~/.copilot/instructions/**,
  ~/.claude/rules/**) with MAX_SEARCH_DEPTH, matching _extract_claude_rules
  (macOS + Windows).
- Windows user-rules debug log relabeled "Found VS Code Copilot user rule" (it
  now also covers ~/.copilot/instructions and ~/.claude/rules).

Note: the MCP JetBrains gate stays the simple `not is_vscode` (correct for every
github_copilot surface the detector emits today); a speculative positive
predicate was considered and dropped as over-engineering per principal review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (#183) (#184)

* WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (fixes #1, #3)

Fix #1: the GitHub Copilot (VS Code) MCP extractor's _read_mcp_config called
json.loads() directly despite a docstring claiming it stripped comments, so any
hand-edited mcp.json with // or /* */ comments or a trailing comma threw
JSONDecodeError and silently yielded 0 servers. Now strips comments + trailing
commas before parsing, reusing the existing string-aware helpers.

Fix #3: removed the dead globalStorage/ms-vscode.vscode-github-copilot/mcp.json
fallback. That publisher/extension id does not exist (real ids are
GitHub.copilot / GitHub.copilot-chat) and VS Code never stores MCP config in
extension globalStorage, so the branch could never match a real install.

DRY: relocated _strip_jsonc_comments / _strip_trailing_commas into the shared
mcp_extraction_helpers.py as the single source of truth; macos/copilot_cli
re-exports them for back-compat. Applied across all three OS variants.

Fix #2 (profiles / Insiders / legacy settings.json MCP locations) is
intentionally deferred to a follow-up PR.



* Trim verbose comments on JSONC strippers

Condense the explanatory block comments in mcp_extraction_helpers.py to
concise one-liners; no code change.



* Drop ticket/task-specific references from test comments

Remove WEB-4703 / "fix #1" / "fix #3" labels from the JSONC test module
docstrings and section comments; keep the behavioral descriptions. No test
change.



---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (#185) (#188)

* WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (fix #2, scoped A+B)

The GitHub Copilot (VS Code) MCP extractor read only the default-profile
Code/User/mcp.json. Since VS Code 1.102 MCP is per-profile, so any user on a
named profile (Code/User/profiles/<id>/mcp.json) had their servers missed; VS
Code Insiders users were also skipped even though detection already counts them
(detect_copilot._VSCODE_USER_DATA_DIRS).

Adds a shared, bounded, crash-safe enumerator enumerate_vscode_mcp_files() that
yields the default mcp.json plus sorted profiles/*/mcp.json for one Code/User
base. Each OS extractor now iterates [Code/User, Code - Insiders/User] through
it and attributes each config to its own dir (str(mcp_file.parent)), so distinct
profiles/variants surface as distinct sources.

Customer-agnostic and additive: a machine with only the default-profile mcp.json
produces byte-identical output. Reuses the JSONC strippers from #183.

Scoped to A+B. Fix C (legacy settings.json mcp key) is intentionally deferred:
1.102 auto-migrates settings.json -> mcp.json, so the remaining population is
small and shrinking, and it would add dedup/precedence complexity that profiles
+ Insiders (disjoint locations) do not need.

Predecessor: #183 (fixes #1 + #3).



* Log at debug when enumerate_vscode_mcp_files skips on FS error

Addresses Greptile P2: the two except blocks swallowed PermissionError/
OSError silently; log at debug to match the rest of the extraction layer
(still never raises). No behavior change to returned files.



---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable (#186) (#189)

* WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable

On a shared-HOME macOS host where discovery runs under two uids -- classically a
root MDM agent and the login user on the same machine (the failing host is an EC2
Mac) -- discovery-cache.json is written 0600 by whichever uid runs first. The other
non-root uid then gets PermissionError [Errno 13] reading it. Surfaced as Sentry
DISCOVERY-TOOL-SCRIPT-11.

_ensure_state_dir() already falls back to a per-uid /var/tmp/unbound-{uid} dir, but
_try_state_dir() only probed the *directory's* writability, never whether an existing
discovery-cache.json was *readable*. A home whose dir is writable but whose cache file
is a foreign-owned 0600 was accepted as healthy, then read_cache() raised EACCES on it.

Probe the cache file's readability too: a candidate holding a cache file this uid
cannot read is rejected, so the resolver falls through to the per-uid temp dir -- the
same uid-namespacing utils._get_queue_file_path() already uses for the queue file.
os.access() uses the real uid, so root (which can read any file) keeps using its own
home cache unchanged, leaving root/all-users scans intact.

Scope: this fixes the cross-uid collision and the file-readability probe gap.
Downgrading the *expected* permission warning that read_cache/atomic_write_cache emit
to Sentry is tracked separately and intentionally not included here.

Fixes DISCOVERY-TOOL-SCRIPT-11



* WEB-4702: surface fallback reason in state-dir warning; condense probe comment

Address PR review: _ensure_state_dir's fallback warning now includes
last_lock_error, so a fall-back to the per-uid temp dir logs *why* the home dir
was rejected (unreadable cache vs. unwritable dir vs. OSError) before it is
cleared. Also condense the readability-probe comment to a single line.



* WEB-4702: trim verbose comments in the cache-fallback test

Condense the readability-fallback test's comments to match the source cleanup.
No behavior change.



---------


(cherry picked from commit 87114e80bc9ffae25c6644ebb654eda8e9c2f9a8)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* WEB-4718: use raw docstrings to silence invalid-escape SyntaxWarnings

Windows-path docstrings (AppData\Roaming, %APPDATA%\JetBrains,
%USERPROFILE%\.claude, etc.) used non-raw string literals, so \R \J \.
\) parsed as invalid escape sequences. Python 3.12+ surfaces these as
visible SyntaxWarnings at compile time; because install.sh git-clones a
fresh copy per run (no cached .pyc) they print on every discovery run.
The factory imports all platform modules unconditionally, so they fire
on macOS/Linux too, not just Windows.

Prefix the 11 affected docstrings with r"""; __doc__ content is
byte-identical (these paths held no intended escape sequences), so the
change is cosmetic with no behavioral effect.

Verify: python3.12 -W error::SyntaxWarning -m compileall scripts/

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Aakash Velusamy <aakashvpsgtech@gmail.com>
Co-authored-by: pugazhendhi-m <132246623+pugazhendhi-m@users.noreply.github.com>
Co-authored-by: Pugazhendhi <pugazhendhi@unboundsecurity.ai>
Co-authored-by: MohamedAklamaash <aklamaash78@gmail.com>
Co-authored-by: vishnu <vishnuvinod072@gmail.com>
Co-authored-by: Vishnu <79318686+zeus-12@users.noreply.github.com>
Co-authored-by: Mohamed Aklamaash M.R <111295679+MohamedAklamaash@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Vignesh Subbiah <51325334+vigneshsubbiah16@users.noreply.github.com>

* [WEB-4725] Isolate the test retry-queue + CLI home so QUEUED-DEVICE can't leak to prod (#192)

* Isolate the test retry-queue + CLI home so QUEUED-DEVICE can't leak to prod

A phantom device "QUEUED-DEVICE" reached the production discovery dashboard.
Root cause: integration tests wrote report fixtures to the REAL per-UID retry
queue (/var/tmp/ai-discovery-queue-<uid>.json) via the import-time-cached
utils.QUEUE_FILE constant. An interrupted test left the fixture on disk, and a
later real agent run on the same machine drained it and POSTed it to prod. The
string exists only in the test suite.

Fix (harness-level test isolation; no behavior change for real runs):
- utils._get_queue_file_path() honors an AI_DISCOVERY_QUEUE_FILE env override
  (strip + expanduser/expandvars, matching the sibling _resolve_copilot_dir
  convention). Remove the import-time QUEUE_FILE constant so save/load/cleanup
  resolve the path per-call, including the value-imported cleanup in
  ai_tools_discovery.py.
- tests/__init__.py points AI_DISCOVERY_QUEUE_FILE at a throwaway temp file for
  the whole session. It lives in __init__.py, not conftest.py, so it arms under
  BOTH pytest and the `unittest discover` runner used in CI.
- test_discovery_flow.py: the CLI subprocess gets an isolated HOME so its
  ~/.unbound state + single-flight lock go to a throwaway dir (stops tests
  touching real home state and the lock-contention flakiness). Migrate the two
  queue tests off the old QUEUE_FILE monkeypatch; add a regression test that the
  env override is honored and the real /var/tmp path is never used.
- Remove the now-redundant tests/conftest.py.

Deferred as separate follow-ups (intentionally out of scope): backend device_id
sentinel denylist, DeviceDiscovery pruning, and the one-off delete of the
existing QUEUED-DEVICE row (this change is preventative, not retroactive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Trim verbose test comments to concise necessary ones

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Address PR review: clean up test temp dir + mkdir queue parent before write

- tests/__init__.py: register atexit cleanup for the session temp queue dir so
  it doesn't accumulate in /tmp on long-lived CI machines (greptile r3380308482).
- utils._write_file_secure: mkdir(parents=True, exist_ok=True) before write so a
  queue-path override with a missing parent can't silently drop failed reports
  (greptile r3380308569).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Accumulate all users' global MCP configs under root (fix multi-user first-match drop) (#195)

* Accumulate all users' global MCP configs under root (fix first-match drop)

On a multi-user macOS/Windows machine scanned as root, the global-MCP config
helpers returned only the FIRST user's config and silently dropped every other
user's. Three helper copies had this `if config: return config`-inside-the-loop
bug:
- extract_global_mcp_config_with_root_support  (Cursor, Cursor CLI, Gemini CLI,
  Windsurf, Antigravity)
- extract_codex_global_mcp_config_with_admin_support  (Codex; TOML)
- opencode's own copy  (macOS + Windows)
Claude Code / Cline / Roo Code already accumulated correctly and are untouched.

Each helper now accumulates every user's config into a list, de-duped by the
config's "path" (guards the Windows admin-own-home double-count), and keeps the
admin's own home as a FALLBACK ONLY (`if not configs:`) to preserve today's
single-user-root semantics exactly. The 14 thin callers change
`projects.append(global_config)` -> `projects.extend(self._extract_global_config())`;
`extract_mcp_config` still returns {"projects": [...]}.

Strictly additive: single-user and non-admin machines are byte-for-byte
unaffected (a 1-element list, identical content/ordering); output only grows for
multi-user-root, recovering the previously-dropped users. Nothing reported today
is dropped.

Adds tests/test_multiuser_global_mcp.py (19 cases) covering Cursor (JSON), Codex
(TOML), and opencode (macOS + the divergent Windows variant): both-users-recovered
regression, single-user-unchanged, dedup, and fallback-only/fallback-suppressed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Hoist the accumulate+dedup+fallback loop into one helper (no behavior change)

The first-match-drop fix landed the same accumulate + de-dup-by-path +
fallback-only inner loop in four places: the JSON helper
(extract_global_mcp_config_with_root_support), the Codex/TOML helper
(extract_codex_global_mcp_config_with_admin_support), and OpenCode's own
copy on macOS and Windows. Four copies of the same subtle loop is exactly
how the bug got duplicated in the first place.

Extract that loop into a single private helper,
_accumulate_per_user_with_fallback(user_homes, global_config_path,
reader_fn, tool_name, parent_levels), and have all four call sites defer to
it. The helper has exactly two seams: the already-resolved user_homes list
(empty when not admin) and the tool's reader_fn.

Detection deliberately stays at each call site, NOT unified:
- the JSON helper keeps its platform/is_admin block + _iter_admin_user_homes
- Codex reproduces its own user-dir filter verbatim
  ([d for d in users_dir.iterdir() if d.is_dir() and not d.name.startswith('.')])
- each OpenCode extractor keeps its own admin check
The helper's docstring and a call-site comment record WHY it must never grow
a Linux branch: Linux uses a separate per-user extractor, and an
accumulated-or-first-match walk here would silently drop users there.

Behavior-preserving: the extracted loop is byte-for-byte the prior inline
logic, so single-user and non-admin machines are unaffected and the
multi-user-root recovery is identical. Dead
extract_global_mcp_config_with_root_support imports are dropped from both
OpenCode extractors.

The three OpenCode test methods that whole-symbol-rebind a module-local
Path shim now also patch helpers.Path, since relative_to(Path.home())
resolution moved into the shared helper's module (the Codex/Cursor tests
mutate pathlib.Path.home directly, so they need no change). No test
behavior or assertions change; no new tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Keep per-user exists() inside the try (restore byte-identical skip-one-user)

The loop hoist in the previous commit narrowed the per-user try to cover only
relative_to(), leaving Path.exists() outside the guard. All four prior inline
loops wrapped exists() inside the try, so a single user's filesystem error
(ValueError, or an OSError such as EACCES from Path.exists() on a Python 3.9
locked / NFS-mounted home -- exists() re-raises anything not ENOENT/ENOTDIR/
EBADF/ELOOP) skipped only that user. Outside the guard the error instead
propagates out of the helper; the orchestrator catches it but drops the WHOLE
tool's MCP config, losing every other user on the machine -- a regression in
exactly the multi-user-root case this PR hardens.

Move exists() back inside the per-user try so the helper is byte-identical to
all four originals: a per-user path/permission error skips only that user.

Adds a deterministic lock-in test that drives the helper with one user whose
exists() raises PermissionError (iterated first) and asserts the other user
still survives -- it errors against the pre-fix exists()-outside-try shape.

Found by principal-engineer review of the refactor commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Merge pull request #196 from websentry-ai/WEB-4755

WEB-4755: Discovery self-timeout, signal cleanup, and dead-PID lock recovery

---------

Co-authored-by: Sumit Badsara <sumit@unboundsecurity.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Sumit Badsara <sumitbadsara.dev@gmail.com>
Co-authored-by: NandaPranesh <106886030+anonpran@users.noreply.github.com>
Co-authored-by: Aakash Velusamy <aakashvpsgtech@gmail.com>
Co-authored-by: MohamedAklamaash <aklamaash78@gmail.com>
Co-authored-by: vishnu <vishnuvinod072@gmail.com>
Co-authored-by: Vishnu <79318686+zeus-12@users.noreply.github.com>
Co-authored-by: Mohamed Aklamaash M.R <111295679+MohamedAklamaash@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Vignesh Subbiah <51325334+vigneshsubbiah16@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants