Container-aware device identity: inject host device_id + namespace home_user per container by sumit-badsara · Pull Request #165 · websentry-ai/coding-discovery-tool

sumit-badsara · 2026-06-04T18:52:38Z

What

Makes containers report a stable, non-fragmenting identity to the discovery backend — no backend change. Three commits:

Inject host device_id. get_device_id() prefers an injected UNBOUND_DEVICE_SERIAL (validated via is_valid_serial()) so a container reports the host device instead of its ephemeral hostname.
Log the decision branch. logger.info when the injected serial is accepted, logger.warning when present-but-invalid — so the attribution path is reconstructable from logs.
Container-aware gating + per-container profiles (this update):
- Gate the override on container detection. The injected serial is only honored when in_container() is true, so a stray UNBOUND_DEVICE_SERIAL on a native host can't mask the real auto-detected serial. in_container() combines /.dockerenv, /run/.containerenv, an overlay root mount, and a cgroup-v1 fallback (cgroup v2 shows 0::/ from inside, so the classic /proc/1/cgroup grep alone is unreliable).
- Namespace home_user per container. All containers on a host share the one injected device_id, and the backend's report-ingest replace key is (device, tool_name, home_user). A plain home_user (ubuntu/root, fixed per image) therefore makes concurrent containers clobber each other's tool data. Inside a container, the report now sends home_user = "<container_id>_<home_user>", giving each container a distinct profile under the shared device. Native hosts are unaffected (home_user stays the real username).

Why

A container is ephemeral. get_device_id() auto-detects the host serial with a hostname fallback; inside a Linux container the Mac serial is unreadable, so it falls back to the per-launch container hostname → a new "device" every launch. Injecting the host serial collapses them onto one device. But once collapsed, multiple containers collide on (device, tool_name, home_user) and overwrite each other — so home_user is namespaced per container to keep distinct profiles without re-fragmenting the device.

`container_id` resolution & the stability trade-off

get_container_id() resolves to:

Injected UNBOUND_CONTAINER_ID — the stable path. The host can derive this from the workspace / devcontainer name so a restarted workspace keeps a single profile (clean upsert).
Container hostname — the ephemeral fallback. Unique per docker run, so the same workspace restarted under a fresh hostname creates a new profile (old ones linger). Acceptable for one-shot containers; set UNBOUND_CONTAINER_ID for long-lived / restarted workspaces.

Safety / scope

All behavior is container-only; native hosts auto-detect their serial and keep a plain home_user.
No backend change — device_id is already the attribution key, and home_user is passed through verbatim by the ingest path.

Tests

tests/test_device_id_override.py (12 cases): override honored only in-container, ignored on native host, invalid/empty fall through; home_user namespaced in-container and untouched on host; two containers on one device get distinct profiles; in_container() / get_container_id() primitives.

Full suite: 833 passed, 1 skipped. Validated end-to-end in a real OrbStack Ubuntu container against staging (injection honored + container detected; no-injection baseline falls back to hostname).

🤖 Generated with Claude Code

Greptile Summary

This PR makes the discovery tool container-aware: get_device_id() now prefers an injected host serial (UNBOUND_DEVICE_SERIAL) only when running inside a container, and generate_single_tool_report() namespaces home_user with the container ID so concurrent containers on the same host don't clobber each other's backend profile (the ingest replace key is (device, tool_name, home_user)).

utils.py adds in_container() (multi-signal: dockerenv, overlay mount, cgroup-v1 markers; process-lifetime lru_cache) and get_container_id() (injected env var with hostname fallback).
ai_tools_discovery.py wires both helpers into get_device_id() (with info/warning logging for each branch) and into generate_single_tool_report() for home_user namespacing.
12 new tests cover the override gating, namespacing, distinct-profile guarantee, and helper primitives.

Confidence Score: 5/5

Safe to merge — all new behaviour is gated behind container detection, native hosts are entirely unaffected, and the logic is well-tested.

The container-gating logic is correct and well-covered by tests. Native host code paths are unchanged. The only gap is that the home_user namespacing branch produces no log line when it fires, making post-hoc attribution debugging harder — but this does not affect correctness.

No files require special attention; the single observation is in ai_tools_discovery.py around the home_user namespacing block.

Important Files Changed

Filename	Overview
scripts/coding_discovery_tools/ai_tools_discovery.py	Added container-aware `get_device_id()` (honors `UNBOUND_DEVICE_SERIAL` only inside a container, with info/warning logging) and `home_user` namespacing in `generate_single_tool_report()` (prepends container ID to prevent profile collisions). The namespacing branch is silent — unlike the device_id path, it emits no log when activated.
scripts/coding_discovery_tools/utils.py	Added `in_container()` (multi-signal: dockerenv, containerenv, overlay root mount, cgroup-v1 markers; `lru_cache`-d) and `get_container_id()` (`UNBOUND_CONTAINER_ID` env var with hostname fallback; `lru_cache`-d). Logic is well-documented and correctly handles macOS/Windows by catching all OSError.
tests/test_device_id_override.py	New test file with 12 cases covering serial override (in-container vs native host, valid vs invalid vs missing), home_user namespacing (native unchanged, container namespaced, two containers get distinct profiles), and `in_container()`/`get_container_id()` primitives. Cache teardown is correctly handled in setUp/tearDown.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_device_id called] --> B{UNBOUND_DEVICE_SERIAL set?}
    B -- No --> Z[auto-detect via extractor]
    B -- Yes --> C{in_container?}
    C -- No --> D[logger.info: env var present but not in container\nauto-detect instead]
    D --> Z
    C -- Yes --> E{is_valid_serial?}
    E -- No --> F[logger.warning: invalid serial\nfalling back]
    F --> Z
    E -- Yes --> G[logger.info: using injected serial\nreturn injected value]

    H[generate_single_tool_report called] --> I{home_user set AND in_container?}
    I -- No --> J[report_home_user = home_user unchanged]
    I -- Yes --> K[container_id = get_container_id\nreport_home_user = container_id + _ + home_user]
    J --> L[build report dict]
    K --> L

_{Reviews (4): Last reviewed commit: "Gate device override on container detect..." | Re-trigger Greptile}

A container is ephemeral: get_device_id() auto-detects via host serial with a hostname fallback, so inside a container the Mac serial can't be read and it falls back to the ephemeral container hostname — a new "device" every launch, fragmenting users/devices on the backend. get_device_id() now prefers an injected UNBOUND_DEVICE_SERIAL (set by the container launch / socket proxy) when it passes is_valid_serial(), so all activity from containers on a host syncs under the one host device. The env var is only set when injected, so native hosts are unaffected and still auto-detect their own serial. No backend change — device_id is already the attribution key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address Greptile review: emit logger.info when an injected UNBOUND_DEVICE_SERIAL is accepted and logger.warning when it is present but rejected, so the container attribution path can be reconstructed from logs alone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…container Two container-aware refinements on top of the UNBOUND_DEVICE_SERIAL override: 1. Gate the serial override on container detection. get_device_id() now only honors an injected UNBOUND_DEVICE_SERIAL when in_container() is true, so a stray env var on a native host can't mask the real auto-detected serial. in_container() combines /.dockerenv, /run/.containerenv, an overlay root mount, and a cgroup-v1 fallback (cgroup v2 shows "0::/" from inside, so the classic /proc/1/cgroup grep alone is unreliable). 2. Namespace home_user per container. All containers on a host share one injected device_id, and the report-ingest replace key is (device, tool_name, home_user) — so a plain home_user ("ubuntu"/"root", fixed per image) makes concurrent containers clobber each other's tool data. generate_single_tool_report() now sends home_user="<container_id>_<home_user>" inside a container, giving each container a distinct profile under the shared device. Native hosts are unaffected (home_user stays the real username). container_id resolves to an injected UNBOUND_CONTAINER_ID (stable, host-derived from the workspace so a restarted workspace keeps one profile) and falls back to the container hostname (ephemeral — distinct per launch; see get_container_id docstring for the trade-off). Tool-only change; no backend change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r flag Implements step ③ of Approach B (user-centric device attribution). Each device reports its OWN device_id and a plain home_user; supersedes PR #165's device_id injection + home_user namespacing (not carried forward). - Salvage the in_container() helper into utils.py (checks /.dockerenv, /run/.containerenv, overlay root mount, cgroup v1 fallback; lru_cached). Only this helper is brought over from feat/device-id-host-override — not the injection override or namespacing. - Add is_container (bool) as a top-level field in the report payload built by generate_single_tool_report, set from in_container(). - Stable container device_id: when /etc/machine-id is empty/absent, fall back to a UUID persisted in the home-user's ~/.unbound/ (read if present, else generate uuid4 + write) instead of the ephemeral hostname, so a restarted container keeps one device row. Reuses cache._ensure_state_dir() for robust state-dir resolution (writable fallback) rather than a bare Path.home(). Machine-id-first behavior unchanged; never raises. - Tests: is_container present/top-level in report; persisted-UUID fallback (generate+write, read-existing, unwritable dir, write-failure); in_container detection with lru_cache clearing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(discovery): Approach B tool change — own device_id + is_container flag Implements step ③ of Approach B (user-centric device attribution). Each device reports its OWN device_id and a plain home_user; supersedes PR #165's device_id injection + home_user namespacing (not carried forward). - Salvage the in_container() helper into utils.py (checks /.dockerenv, /run/.containerenv, overlay root mount, cgroup v1 fallback; lru_cached). Only this helper is brought over from feat/device-id-host-override — not the injection override or namespacing. - Add is_container (bool) as a top-level field in the report payload built by generate_single_tool_report, set from in_container(). - Stable container device_id: when /etc/machine-id is empty/absent, fall back to a UUID persisted in the home-user's ~/.unbound/ (read if present, else generate uuid4 + write) instead of the ephemeral hostname, so a restarted container keeps one device row. Reuses cache._ensure_state_dir() for robust state-dir resolution (writable fallback) rather than a bare Path.home(). Machine-id-first behavior unchanged; never raises. - Tests: is_container present/top-level in report; persisted-UUID fallback (generate+write, read-existing, unwritable dir, write-failure); in_container detection with lru_cache clearing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Harden persisted device-id: atomic write, UUID validation, warning logs Addresses Greptile review on PR #170: - Atomic write (P1): persist the device-id UUID via tempfile.mkstemp + os.replace in the same dir, mirroring cache.atomic_write_cache(). A kill mid-write can no longer leave a partial UUID on disk. - UUID validation on read (P1): validate the persisted value with uuid.UUID() and treat a corrupt/truncated/non-UUID value as absent, regenerating a fresh UUID. Pairs with the atomic write. - Log levels (P2): failures that cost the container its stable identity (no state dir, unreadable/corrupt file, failed write, outer catch) now log at warning; the benign absent-machine-id read stays at debug. Extends tests/test_container_attribution.py to use a real temp dir so the atomic write path is exercised for real, and adds corrupt-partial and garbage-non-UUID regeneration cases plus a write-failure case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (#183) * WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (fixes #1, #3) Fix #1: the GitHub Copilot (VS Code) MCP extractor's _read_mcp_config called json.loads() directly despite a docstring claiming it stripped comments, so any hand-edited mcp.json with // or /* */ comments or a trailing comma threw JSONDecodeError and silently yielded 0 servers. Now strips comments + trailing commas before parsing, reusing the existing string-aware helpers. Fix #3: removed the dead globalStorage/ms-vscode.vscode-github-copilot/mcp.json fallback. That publisher/extension id does not exist (real ids are GitHub.copilot / GitHub.copilot-chat) and VS Code never stores MCP config in extension globalStorage, so the branch could never match a real install. DRY: relocated _strip_jsonc_comments / _strip_trailing_commas into the shared mcp_extraction_helpers.py as the single source of truth; macos/copilot_cli re-exports them for back-compat. Applied across all three OS variants. Fix #2 (profiles / Insiders / legacy settings.json MCP locations) is intentionally deferred to a follow-up PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Trim verbose comments on JSONC strippers Condense the explanatory block comments in mcp_extraction_helpers.py to concise one-liners; no code change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Drop ticket/task-specific references from test comments Remove WEB-4703 / "fix #1" / "fix #3" labels from the JSONC test module docstrings and section comments; keep the behavioral descriptions. No test change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (#185) * WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (fix #2, scoped A+B) The GitHub Copilot (VS Code) MCP extractor read only the default-profile Code/User/mcp.json. Since VS Code 1.102 MCP is per-profile, so any user on a named profile (Code/User/profiles/<id>/mcp.json) had their servers missed; VS Code Insiders users were also skipped even though detection already counts them (detect_copilot._VSCODE_USER_DATA_DIRS). Adds a shared, bounded, crash-safe enumerator enumerate_vscode_mcp_files() that yields the default mcp.json plus sorted profiles/*/mcp.json for one Code/User base. Each OS extractor now iterates [Code/User, Code - Insiders/User] through it and attributes each config to its own dir (str(mcp_file.parent)), so distinct profiles/variants surface as distinct sources. Customer-agnostic and additive: a machine with only the default-profile mcp.json produces byte-identical output. Reuses the JSONC strippers from #183. Scoped to A+B. Fix C (legacy settings.json mcp key) is intentionally deferred: 1.102 auto-migrates settings.json -> mcp.json, so the remaining population is small and shrinking, and it would add dedup/precedence complexity that profiles + Insiders (disjoint locations) do not need. Predecessor: #183 (fixes #1 + #3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Log at debug when enumerate_vscode_mcp_files skips on FS error Addresses Greptile P2: the two except blocks swallowed PermissionError/ OSError silently; log at debug to match the rest of the extraction layer (still never raises). No behavior change to returned files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable (#186) * WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable On a shared-HOME macOS host where discovery runs under two uids -- classically a root MDM agent and the login user on the same machine (the failing host is an EC2 Mac) -- discovery-cache.json is written 0600 by whichever uid runs first. The other non-root uid then gets PermissionError [Errno 13] reading it. Surfaced as Sentry DISCOVERY-TOOL-SCRIPT-11. _ensure_state_dir() already falls back to a per-uid /var/tmp/unbound-{uid} dir, but _try_state_dir() only probed the *directory's* writability, never whether an existing discovery-cache.json was *readable*. A home whose dir is writable but whose cache file is a foreign-owned 0600 was accepted as healthy, then read_cache() raised EACCES on it. Probe the cache file's readability too: a candidate holding a cache file this uid cannot read is rejected, so the resolver falls through to the per-uid temp dir -- the same uid-namespacing utils._get_queue_file_path() already uses for the queue file. os.access() uses the real uid, so root (which can read any file) keeps using its own home cache unchanged, leaving root/all-users scans intact. Scope: this fixes the cross-uid collision and the file-readability probe gap. Downgrading the *expected* permission warning that read_cache/atomic_write_cache emit to Sentry is tracked separately and intentionally not included here. Fixes DISCOVERY-TOOL-SCRIPT-11 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4702: surface fallback reason in state-dir warning; condense probe comment Address PR review: _ensure_state_dir's fallback warning now includes last_lock_error, so a fall-back to the per-uid temp dir logs *why* the home dir was rejected (unreadable cache vs. unwritable dir vs. OSError) before it is cleared. Also condense the readability-probe comment to a single line. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4702: trim verbose comments in the cache-fallback test Condense the readability-fallback test's comments to match the source cleanup. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4707: layer Cursor permissions.json as a per-field override on the state.vscdb read (#191) * WEB-4707: layer Cursor permissions.json as per-field override on SQLite read Cursor added a file-based permissions.json (~/.cursor + <workspace>/.cursor) that overrides the in-app/SQLite allowlists. Layer it on top of the existing state.vscdb (composerState) read as a per-field override, collapsing into one effective backend record. - Per-field override: a file-defined field replaces the SQLite-derived value; fields are independent; SQLite stays the sole source for useYoloMode / run mode / modes4 / enabledMcpServers (no permissions.json equivalent). - User + workspace arrays concatenate within a field (autoRun nested arrays merged independently). - JSONC-tolerant parse; any read/parse failure falls back to SQLite-only. - File-absent path is byte-identical (_parse_composer_state untouched; override is a post-processing layer in _apply_permissions_json_override). mcpAllowlist -> mcp_tool_allowlist; terminalAllowlist -> Bash(cmd*) prefix rules (distinct from the SQLite Bash(cmd *) command rules); autoRun -> raw_settings verbatim for the permissions classifier. No gateway-data / unbound-fe changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4707: address elite-PR-review findings - C1 (critical): scope macOS workspace walk to user_home (was rooting at /), matching the Linux/Windows siblings. Prevents cross-user workspace permission leakage under root/MDM and avoids a full-filesystem walk. - W3: preserve unknown autoRun sub-keys (shallow-merge across files, last-file-wins) while still concatenating the documented allow_instructions/block_instructions arrays. - I4: skip symlinked intermediate dirs in the Windows walk (junction-loop hardening; a .cursor dir that is itself a symlink still matches). - Tests: M5/T5 pin the intentional empty-array override-to-empty; A7 pins autoRun unknown-subkey preservation. _parse_composer_state still untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4707: close test-coverage gaps from 2nd-pass review Test-only additions (no production change): - D2: _dedupe_preserve_order unhashable (dict) fallback branch - T6: terminalAllowlist drops non-string/empty entries (no Bash(*) , no raise) - W3/W4: Windows _walk_for_permissions real-walk coverage (workspace file applied; global ~/.cursor not double-counted) — the one new path not backed by a pre-tested shared helper Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4707: log info when permissions.json override is applied Addresses greptile P2 (success-path diagnosability): emit an info log naming which fields (mcp/terminal/autorun) the override replaced. Placed after the no-file early-return, so the byte-identical file-absent path stays silent and unchanged. Stdlib logging only (no new deps). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4718: use raw docstrings to silence invalid-escape SyntaxWarnings (#190) * fix(discovery): detect home-rooted project-scope .mcp.json files (#166) * fix(discovery): detect home-rooted project-scope .mcp.json files The Claude Code project-scope MCP walk skipped a `.mcp.json` located directly in a user's home directory (project root == home, e.g. C:\Users\<u>\.mcp.json, /Users/<u>/.mcp.json, /home/<u>/.mcp.json). `is_home_dotdir_descendant` — intended to skip the *contents of* hidden home tool dirs (~/.cursor/, ~/.codex/) — also matched a home-rooted leaf `.mcp.json`, so its servers were never reported. On a real device a user-scope server in ~/.claude.json was detected while project-scope servers in ~/.mcp.json (e.g. policycenter, playwright) were missed. Fix: the walk's file branch now tests the file's parent directory (`is_home_dotdir_descendant(entry.parent)`). A home-rooted `.mcp.json` is read, while a `.mcp.json` inside a hidden home tool dir (~/.cursor/.mcp.json) stays skipped. The directory-recursion branch and the sibling directory-based walks (generic + skills) are intentionally unchanged. Purely additive: for any path with >=5 segments old and new are identical; only the 4-segment home-rooted leaf flips skip->read. No path flips read->skip, so no previously-detected config can disappear. Add tests/test_mcp_home_rooted_config.py: predicate regression tests (POSIX + Windows-drive shapes), a walk-level test asserting the file branch consults entry.parent (not the file), and a smoke test for normal projects. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(discovery): union MCP servers on path collision + cover Copilot CLI Addresses principal-engineer review of the home-rooted .mcp.json fix. H1: un-skipping a home-rooted ~/.mcp.json makes it emit a project entry whose path == the home dir, which can collide with the ~/.claude.json projects[<home>] (local-scope) entry. The merge functions overwrote mcpServers by path (last-writer-wins), silently dropping one source's servers. Add AIToolsDetector._union_mcp_servers (dedupe by name, first/higher-precedence wins) and apply it at all three merge sites: _merge_claude_mcp_configs_into_projects, _merge_mcp_configs_into_projects, and the Copilot CLI inline merge. Also closes the pre-existing sub-folder collision and stops additionalMcpData from clobbering an existing entry. H2: Copilot CLI's Workspace .mcp.json uses the same project-scope walk, so the home-rooted fix applies to it too; its inline merge now unions as well. Tests: union helper, two-source home collision through the real Claude merge, default-merge union, single-source-unchanged, Copilot shared-walk. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: detect built-in VS Code Copilot (all OS) + macOS Codex rules crash — direct to main (#169) * fix(macos/github-copilot): detect built-in VS Code Copilot so its MCPs surface VS Code now ships GitHub Copilot / Copilot Chat as BUILT-IN extensions in the app bundle, which never appear in the per-user ~/.vscode/extensions/extensions.json the detector reads. So users on built-in Copilot were never detected, and their VS Code MCP servers (Code/User/mcp.json) were silently skipped. _detect_vscode_for_user now falls back to scanning the VS Code app bundle's built-in `copilot`/`copilot-chat` extension when no marketplace extension is present, reading the version from package.json. Gated on the user actually having a Code/User data dir so a machine-wide install isn't attributed to unrelated users in a root scan. Detected "GitHub Copilot (VS Code)" then routes through the existing MCP extractor, surfacing Code/User/mcp.json servers. Verified on a real macOS box: built-in Copilot 0.51.0 detected -> 3 user MCP servers (github-mcp-server, context7, playwright-mcp) extracted. Also adds a regression test for the Codex should_process_directory('/' branch) arg-count crash (the code fix already landed on staging; this guards it). Full suite 835 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * review: log built-in Copilot detection outcomes + document single-result return Addresses PR #169 review (4/5): - _detect_vscode_builtin_copilot now logs all three outcomes (no VS Code data dir, built-in found w/ version+path, VS Code-but-no-built-in) at debug, so an admin scan resolving from a non-obvious app bundle is reconstructable. - Documents the intentional at-most-one-result contract: one detection suffices to trigger downstream rules/MCP extraction; built-in Copilot bundles chat in the same `copilot` extension, so a second `copilot-chat` row would only double-process and duplicate MCP servers (unlike the marketplace path where the two are separate installs). No behavior change. Detection tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(win/linux github-copilot): detect built-in VS Code Copilot (parity with macOS) Extends the macOS built-in Copilot detection (this PR) to Windows and Linux, so users on built-in VS Code Copilot — and their VS Code MCP servers (%APPDATA%\Code\User\mcp.json / ~/.config/Code/User/mcp.json) — are no longer missed on those OSes either. - linux/github_copilot: _detect_vscode_for_user falls back to the VS Code install tree (deb/rpm, /opt, snap; stable + Insiders) when no marketplace extension is present, gated on a ~/.config/Code/User data dir. - windows/github_copilot: same fallback over per-user (LocalAppData\Programs) and system (Program Files) installs, gated on %APPDATA%\Code\User. Also fixes an early-return that skipped detection entirely when .vscode\extensions was absent (the exact built-in-only case). - Both log detection outcomes at debug and return at most one entry (a 2nd copilot-chat row would only duplicate the same MCP servers). Adds Linux + Windows detection tests. Full suite 839 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * harden: guard non-dict package.json in built-in Copilot version read (audit P2) A bundled copilot/package.json that parses as valid JSON but isn't an object (array/string/number) made data.get("version") raise AttributeError, which escapes the per-user loop and aborts detect_copilot() mid-iteration — silently dropping ALL Copilot results (marketplace + built-in, every user) for the run. Add an isinstance(data, dict) guard in _read_extension_version (macOS/Linux) and the Windows built-in version read. Regression test with a non-dict package.json. Full suite 840 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(macos/codex): pass root_path to should_process_directory (silent 0-rules crash) main lacks the staging-only fix (#163): should_process_directory(dir_path) was called with one arg but the helper requires (directory, root_path), raising a TypeError on every '/' scan that was swallowed into 0 Codex project rules. Included here so this main-targeted release carries both the fix and its regression test (test_codex_rules_extraction.py). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(codex): exercise the real should_process_directory (not a stub) Audit note: the regression test stubbed should_process_directory and only asserted call arity, so it never exercised the real TypeError. Call THROUGH to the real helper instead — a future signature/arity regression now raises the actual production TypeError in the test (verified: fails with "missing 1 required positional argument: 'root_path'" when reverted to the one-arg call), while still pinning the (directory, root_path) contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(github-copilot vscode): prove built-in fallback is purely additive Add per-OS regression tests asserting that when a MARKETPLACE Copilot extension is present, _detect_vscode_for_user returns it and the new built-in fallback is never invoked (spy.assert_not_called) — locking in that existing detection behavior is unchanged and the built-in path only runs when nothing was found. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(codex): make '/'-branch test cross-platform (fix Windows CI) The codex regression test drove the macOS extractor's root_path==Path('/') branch and asserted AGENTS.md was found. On Windows CI a C:\ temp path can't be made relative to '/' (the walk's relative_to raises ValueError, skipping items), so the file-finding assertion failed there — a test-only POSIX assumption, not a product bug (the macOS extractor's '/' scan is POSIX-only). - Keep the cross-platform contract check (should_process_directory called with (directory, root_path)); guard the AGENTS.md assertion to non-Windows. - Add a cross-platform test exercising the REAL helper: one-arg call raises TypeError (the bug), two-arg returns bool — no '/' walk involved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(github-copilot vscode): label built-in as "GitHub Copilot Chat (VS Code)" (#172) * fix(github-copilot vscode): label built-in as "GitHub Copilot Chat (VS Code)" Follow-up to the built-in detection (#169). VS Code consolidates Copilot into the `copilot` extension folder, whose manifest is name="copilot-chat", displayName="GitHub Copilot Chat" ("AI chat features powered by Copilot") — it's the Copilot Chat extension, and the one that consumes mcp.json. The built-in detection was hardcoding "GitHub Copilot (VS Code)", mislabeling Chat as the inline-completions product. Derive the name from the bundle's package.json: name contains "copilot-chat" (or displayName contains "chat") -> "GitHub Copilot Chat (VS Code)" (matching the marketplace github.copilot-chat mapping); a plain "copilot" stays "GitHub Copilot (VS Code)". Marketplace paths unchanged. Per-OS tests updated + a plain-copilot generic case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * review: tighten chat heuristic + add Linux/Windows plain-copilot tests Addresses Greptile (4/5) on #172: - Narrow the displayName check from "chat" to "copilot chat" in all 3 detectors, so a hypothetical future variant like "GitHub Copilot (chat enabled)" isn't mislabeled as Chat. Still matches the real displayName "GitHub Copilot Chat". - Add the plain-copilot generic-label test to the Linux and Windows classes (was macOS-only), so all 3 platforms cover both the chat and plain branches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(github-copilot): attach shared ~/.copilot skills to the VS Code Copilot surface (#173) * feat(github-copilot): attach shared ~/.copilot skills to the VS Code Copilot surface An IDE-only GitHub Copilot user's shared ~/.copilot/skills were read by nobody: the CLI skills extractor owns ~/.copilot but the CLI isn't detected (skills/ is a SHARED marker excluded from CLI detection by #164), and the IDE Copilot branch read nothing from ~/.copilot + has no skills extractor. This enriches the detected VS Code Copilot row with the shared skills it actually consumes (VS Code Agent Skills docs) — EXTRACTION-ONLY; detection and the #164 markers are untouched, so it cannot re-introduce the CLI false positive. - S6: memoized _get_copilot_cli_skills() — one filesystem walk per scan, shared by the CLI + VS Code branches; CLI output byte-identical. - S2: _set_canonical_vscode_copilot() picks ONE VS Code row (prefer Chat), computed from the full detected list; only that row carries skills. - S5: user-scope skills keyed by each skill's own owner-home (from file_path) so multi-user/MDM scans don't leak skills across users. - JetBrains excluded; Linux a graceful no-op (no CLI skills extractor). - copilot-instructions.md + mcp-config.json stay CLI-only; ~/.copilot instructions deferred to a follow-up. Scope: skills only. detect_copilot.py / copilot_cli.py / backend untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot): make skill owner-home derivation OS-independent _copilot_skill_owner_home ran the skill file_path through pathlib and returned str(parent.parent). On a Windows interpreter, pathlib re-emits a POSIX-style path with backslashes (str(WindowsPath('/Users/a')) -> '\Users\a'), so the projects_dict key no longer matched the raw forward-slash keys used everywhere else — failing the per-user keying assertions on Windows CI (test_github_copilot_vscode_skills, 3 cases). Derive the owner home by string-slicing at the .copilot/.agents marker instead, preserving the input's separator style on any interpreter. macOS and native-Windows output are byte-identical to before; only the cross-OS (POSIX-path-on-Windows) case is corrected. Detection, the CLI branch, copilot_cli.py and detect_copilot.py are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot): address PR review — surface swallowed skills failure + dedup user skills Two greptile review findings on the VS Code Copilot skills enrichment: - P2: _get_copilot_cli_skills swallowed extraction failures at DEBUG, but the accessor is memoized so the CLI branch's existing WARNING never fired for a walk failure — invisible at the prod INFO floor. Upgraded to WARNING, matching the sibling skills-extraction error log. - P1: user-scope skills were appended without dedup while project-scope skills 10 lines above call _deduplicate_project_items. Dedup the owner-home bucket to match (by file_path). IDE branch only; CLI branch output unchanged. Adds test_duplicate_user_skills_deduped_on_owner_home. The third finding (per-sub-flow observability metric) was declined on the PR — inconsistent with the sibling rules/MCP/permissions sub-steps, out of scope. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot-cli): demote hooks/ to SHARED — Unbound's own MDM hook caused fleet-wide phantom CLI (#174) * fix(copilot-cli): demote hooks/ to a SHARED marker (Unbound's own MDM hook creates it) #164 split ~/.copilot markers into STRONG (CLI-exclusive) vs SHARED, but kept `hooks/` as STRONG. It is NOT CLI-exclusive: Unbound's OWN MDM onboarding (websentry-ai/setup copilot/hooks/mdm/setup.py) runs for EVERY onboarded device and does `(~/.copilot/hooks).mkdir(parents=True)` then writes unbound.json + unbound.py — creating ~/.copilot/hooks/ from scratch on machines that never had the CLI. So `hooks/` alone triggered a phantom "GitHub Copilot CLI" install fleet-wide. Confirmed in prod: device D2FJV74J5Q / user gowshik — ~/.copilot held only hooks/unbound.json, `copilot --version` = not installed, no config/permissions. Fix mirrors #164's ide/ demotion: move `hooks` STRONG->SHARED so it can never alone declare a CLI install (it still enriches a real install). False-negative- safe: a genuine CLI always also has a strong marker (config.json / session-store.db / logs/). Detection of real installs is unchanged. Adds regression tests: hooks/-only ~/.copilot is suppressed (gowshik repro), and a real install with hooks/ + a strong marker is still detected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * address greptile review: complete the SHARED-marker docstring + Windows hooks guard Two non-blocking gaps flagged on the PR: 1. `_copilot_dir_has_shared_artifact` docstring enumerated a stale shared-marker list that omitted both `ide/` (since #164) and the newly demoted `hooks/`, and wrongly attributed all shared markers to the IDE. Rewrote it to list all six (skills/agents/instructions/copilot-instructions.md/ide/hooks) with their true origins: IDE-read, IDE-written (ide/), and Unbound-MDM-written (hooks/). 2. The Windows detection test class had no hooks-specific assertion. Added `test_hooks_dir_alone_not_detected` to the existing TestWindowsCopilotCliDetection so the SHARED demotion is guarded on Windows too — catches a future regression if the MacOSCopilotCliDetector inheritance is ever broken. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot-cli): stop per-user over-attribution + extractor over-collection (#175) Two independent extraction/attribution bugs found alongside the Copilot CLI phantom-detection work (#164/#174). Detection and marker sets are untouched. FIX #2 — per-user over-attribution. The CLI's install_path is a per-user ~/.copilot owned by exactly one user, but main()'s per-user loop re-emits every detected tool for every OS user. filter_tool_projects_by_user scopes a tool's projects/permissions to the user but never rewrites install_path, so a second user (e.g. gowshik_2) who never had ~/.copilot still got a phantom "GitHub Copilot CLI" row pointing at gowshik's home with 0 projects. Add an ownership gate at the per-user emit site (CLI only): emit iff the user owns the detected install OR the filter produced data for them. Extract the path-normaliser to a module-level _normalise_path shared with filter_tool_projects_by_user (DRY). Scoped to the CLI — IDE tools legitimately share a machine-wide install_path. FIX #3 — extractor over-collection. The rules/skills project walks descended into OTHER tools' per-user config dirs and their installed-extension packages (e.g. ~/.antigravity/extensions/<pkg>/.github/instructions/*), mis-attributing those bundled files to Copilot CLI. Add traverses_other_tool_config_dir() and skip those dirs in both the rules walk (macOS + Windows _should_skip) and the skills walk — while still allowing the shared .claude/.agents skill dirs a real repo root legitimately carries (Agent Skills convention). Also dedupe repo-root rule files by realpath + content so an AGENTS.md symlinked to / copied as CLAUDE.md emits once. NOTE: CLAUDE.md/GEMINI.md remain collected — the official GitHub Copilot CLI custom-instructions reference confirms the CLI reads them at the repo root; only the symlink/copy double-count is removed. Tests: tests/test_copilot_cli_per_user_attribution.py (12) and tests/test_copilot_cli_overcollection.py (10). Full suite 902 green. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot): per-user version probe + documented VS Code instructions/prompts dirs (H2/H4/H5) (#176) * fix(copilot): per-user version probe + documented VS Code instructions/prompts dirs Three doc-verified GitHub Copilot discovery gaps (H2, H4, H5). H3 from the same audit was REFUTED against official docs and is intentionally excluded — see PR body. Detection/markers and MCP extraction are untouched. H2 (CLI) — version="unknown" on root/MDM scans. get_version() probed a bare `copilot` on the scanner's PATH; root's PATH lacks the per-user install, so the version always read "unknown" on MDM all-users scans (an existing TODO flagged this). Resolve the per-user binary from self.user_home first — ~/.local/bin, ~/.bun/bin, ~/.nvm/versions/node/*/bin (macOS, X_OK-checked like find_claude_binary_for_user); AppData/Roaming/npm/copilot.cmd, .local/bin and .bun/bin .exe (Windows, shell=True for the .cmd shim) — then fall back to the bare-PATH probe (zero regression for the running-user case). Best-effort; still degrades to "unknown". H4 (VS Code) — wrong instructions dir. Read the documented .github/instructions/**/*.instructions.md (recursive, depth-gated) instead of the undocumented, over-broad .github/copilot/*.md. Removing the legacy path is a deliberate collection-scope reduction (consistent with #175's anti-over-collection direction); pinned by a negative test. H5 (VS Code) — prompt files never collected. Collect *.prompt.md from project .github/prompts/ and the user Code/User/prompts/ dir (already opened but globbed only *.instructions.md). Prompt files are emitted as ordinary project/user-scoped rule dicts with NO extra fields, because the backend silently discards any rule carrying a non-allowlisted key — locked down by test_prompt_rule_has_only_allowed_fields. find_github_copilot_project_root generalized to walk to the nearest .github ancestor (serves nested instructions + prompts) without regressing the prompts/intellij/AGENTS.md branches. Windows mirrors macOS. Tests: new tests/test_github_copilot_instructions_prompts.py + extended test_copilot_cli_discovery.py and test_scanning_enhancements.py. Full suite 916 pass; the 1 failing test (test_main_cli_with_queue_drain) is a pre-existing environmental flake that also fails on clean main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(copilot): gate POSIX-only version stub tests to fix Windows CI test_version_resolved_from_local_bin_stub / _from_nvm_stub create a #!/bin/sh executable stub and assert the macOS detector probes it. Windows can't exec a shebang script, so get_version() returned None there (Windows CI red). The exec path is inherently POSIX; the Windows per-user-binary path is already covered portably by test_version_probed_from_per_user_npm_shim (which mocks subprocess.run). Gate the two stub-exec tests with skipUnless(os.name=="posix"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot): resolve newest nvm Node version for CLI binary (greptile P2) _resolve_copilot_binary iterated nvm version dirs in arbitrary glob order, so a user with multiple nvm-managed Node versions (each with a copilot install) could resolve a stale one. Sort by NUMERIC (major,minor,patch) parsed from the dir name, newest first — note a plain string sort (greptile's suggestion) orders "v9" after "v10"; the numeric key fixes that. Also makes a re-scan deterministic. Adds a POSIX-gated test (macOS resolver) asserting v18 wins over v10/v9. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add Sync Staging with Main workflow (#179) Keeps staging in sync with main after each release to main. Triggers on push to main (and workflow_dispatch). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot-vscode): read .claude/rules + user ~/.copilot|.claude instructions; scope MCP per IDE surface (#178) * fix(copilot-vscode): read .claude/rules + ~/.copilot|.claude user instructions; scope MCP per IDE surface Two related GitHub Copilot VS Code Chat discovery fixes (macOS + Windows; Linux tracked separately). Verified against the official VS Code Copilot custom-instructions docs. (1) Rules completeness — the VS Code rules extractor now reads the three documented "Default file location" custom-instruction sources it was missing: - workspace .claude/rules/**/*.md (Claude format) - user ~/.copilot/instructions/**/*.instructions.md - user ~/.claude/rules/**/*.md find_github_copilot_project_root now resolves the nearest .github/.claude/.copilot ancestor (keys these to the right repo root / user home) without regressing the prompts/intellij/AGENTS.md branches. New _extract_claude_rules helper + add_user_rules refactor. Guards: skip the user-home ~/.claude (collected as user scope, no double-count since add_rule_to_project doesn't dedupe) and skip other-tool config dirs / installed extension packages (traverses_other_tool_config_dir). (2) MCP identity scoping — extract_mcp_config was identity-blind: it unioned VS Code global + JetBrains global + workspace .vscode/mcp.json and returned that to EVERY Copilot row, so on a machine with both IDE Copilots each row showed the other's servers. Now gated by tool_name: a VS Code row gets VS Code global + workspace; a JetBrains row gets JetBrains global only; tool_name=None keeps the legacy union (back-compat). Pure narrowing — single-IDE users unaffected. Call site passes the detected row name. Mirrors the already identity-aware rules extractor. Tests: all regression coverage lives in the one focused file tests/test_github_copilot_instructions_prompts.py (.claude/rules project + allowlist guard + extension-dir-skipped + user-home-not-double-collected; user ~/.copilot/instructions + ~/.claude/rules; MCP identity scoping). Full suite green; the lone failure (test_main_cli_with_queue_drain) is a pre-existing environmental flake. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(copilot-vscode): Linux MCP tool_name param (regression) + user-rules depth guard Follow-up to the review of this PR. - CRITICAL: the call site passes extract_mcp_config(tool_name=...) for all OSes, but the Linux GitHub Copilot MCP extractor's signature was still extract_mcp_config(self) -> TypeError (swallowed) -> Linux Copilot MCP servers returned empty (a regression vs main). Linux now accepts tool_name and applies the same VS Code / JetBrains surface gating as macOS/Windows. - add_user_rules now depth-gates its globs (~/.copilot/instructions/**, ~/.claude/rules/**) with MAX_SEARCH_DEPTH, matching _extract_claude_rules (macOS + Windows). - Windows user-rules debug log relabeled "Found VS Code Copilot user rule" (it now also covers ~/.copilot/instructions and ~/.claude/rules). Note: the MCP JetBrains gate stays the simple `not is_vscode` (correct for every github_copilot surface the detector emits today); a speculative positive predicate was considered and dropped as over-engineering per principal review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (#183) (#184) * WEB-4703: VS Code Copilot MCP — JSONC-tolerant parse + remove dead globalStorage fallback (fixes #1, #3) Fix #1: the GitHub Copilot (VS Code) MCP extractor's _read_mcp_config called json.loads() directly despite a docstring claiming it stripped comments, so any hand-edited mcp.json with // or /* */ comments or a trailing comma threw JSONDecodeError and silently yielded 0 servers. Now strips comments + trailing commas before parsing, reusing the existing string-aware helpers. Fix #3: removed the dead globalStorage/ms-vscode.vscode-github-copilot/mcp.json fallback. That publisher/extension id does not exist (real ids are GitHub.copilot / GitHub.copilot-chat) and VS Code never stores MCP config in extension globalStorage, so the branch could never match a real install. DRY: relocated _strip_jsonc_comments / _strip_trailing_commas into the shared mcp_extraction_helpers.py as the single source of truth; macos/copilot_cli re-exports them for back-compat. Applied across all three OS variants. Fix #2 (profiles / Insiders / legacy settings.json MCP locations) is intentionally deferred to a follow-up PR. * Trim verbose comments on JSONC strippers Condense the explanatory block comments in mcp_extraction_helpers.py to concise one-liners; no code change. * Drop ticket/task-specific references from test comments Remove WEB-4703 / "fix #1" / "fix #3" labels from the JSONC test module docstrings and section comments; keep the behavioral descriptions. No test change. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (#185) (#188) * WEB-4703: VS Code Copilot MCP — read named profiles + Insiders (fix #2, scoped A+B) The GitHub Copilot (VS Code) MCP extractor read only the default-profile Code/User/mcp.json. Since VS Code 1.102 MCP is per-profile, so any user on a named profile (Code/User/profiles/<id>/mcp.json) had their servers missed; VS Code Insiders users were also skipped even though detection already counts them (detect_copilot._VSCODE_USER_DATA_DIRS). Adds a shared, bounded, crash-safe enumerator enumerate_vscode_mcp_files() that yields the default mcp.json plus sorted profiles/*/mcp.json for one Code/User base. Each OS extractor now iterates [Code/User, Code - Insiders/User] through it and attributes each config to its own dir (str(mcp_file.parent)), so distinct profiles/variants surface as distinct sources. Customer-agnostic and additive: a machine with only the default-profile mcp.json produces byte-identical output. Reuses the JSONC strippers from #183. Scoped to A+B. Fix C (legacy settings.json mcp key) is intentionally deferred: 1.102 auto-migrates settings.json -> mcp.json, so the remaining population is small and shrinking, and it would add dedup/precedence complexity that profiles + Insiders (disjoint locations) do not need. Predecessor: #183 (fixes #1 + #3). * Log at debug when enumerate_vscode_mcp_files skips on FS error Addresses Greptile P2: the two except blocks swallowed PermissionError/ OSError silently; log at debug to match the rest of the extraction layer (still never raises). No behavior change to returned files. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable (#186) (#189) * WEB-4702: fall back to per-uid state dir when home discovery-cache.json is unreadable On a shared-HOME macOS host where discovery runs under two uids -- classically a root MDM agent and the login user on the same machine (the failing host is an EC2 Mac) -- discovery-cache.json is written 0600 by whichever uid runs first. The other non-root uid then gets PermissionError [Errno 13] reading it. Surfaced as Sentry DISCOVERY-TOOL-SCRIPT-11. _ensure_state_dir() already falls back to a per-uid /var/tmp/unbound-{uid} dir, but _try_state_dir() only probed the *directory's* writability, never whether an existing discovery-cache.json was *readable*. A home whose dir is writable but whose cache file is a foreign-owned 0600 was accepted as healthy, then read_cache() raised EACCES on it. Probe the cache file's readability too: a candidate holding a cache file this uid cannot read is rejected, so the resolver falls through to the per-uid temp dir -- the same uid-namespacing utils._get_queue_file_path() already uses for the queue file. os.access() uses the real uid, so root (which can read any file) keeps using its own home cache unchanged, leaving root/all-users scans intact. Scope: this fixes the cross-uid collision and the file-readability probe gap. Downgrading the *expected* permission warning that read_cache/atomic_write_cache emit to Sentry is tracked separately and intentionally not included here. Fixes DISCOVERY-TOOL-SCRIPT-11 * WEB-4702: surface fallback reason in state-dir warning; condense probe comment Address PR review: _ensure_state_dir's fallback warning now includes last_lock_error, so a fall-back to the per-uid temp dir logs *why* the home dir was rejected (unreadable cache vs. unwritable dir vs. OSError) before it is cleared. Also condense the readability-probe comment to a single line. * WEB-4702: trim verbose comments in the cache-fallback test Condense the readability-fallback test's comments to match the source cleanup. No behavior change. --------- (cherry picked from commit 87114e80bc9ffae25c6644ebb654eda8e9c2f9a8) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * WEB-4718: use raw docstrings to silence invalid-escape SyntaxWarnings Windows-path docstrings (AppData\Roaming, %APPDATA%\JetBrains, %USERPROFILE%\.claude, etc.) used non-raw string literals, so \R \J \. \) parsed as invalid escape sequences. Python 3.12+ surfaces these as visible SyntaxWarnings at compile time; because install.sh git-clones a fresh copy per run (no cached .pyc) they print on every discovery run. The factory imports all platform modules unconditionally, so they fire on macOS/Linux too, not just Windows. Prefix the 11 affected docstrings with r"""; __doc__ content is byte-identical (these paths held no intended escape sequences), so the change is cosmetic with no behavioral effect. Verify: python3.12 -W error::SyntaxWarning -m compileall scripts/ Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Aakash Velusamy <aakashvpsgtech@gmail.com> Co-authored-by: pugazhendhi-m <132246623+pugazhendhi-m@users.noreply.github.com> Co-authored-by: Pugazhendhi <pugazhendhi@unboundsecurity.ai> Co-authored-by: MohamedAklamaash <aklamaash78@gmail.com> Co-authored-by: vishnu <vishnuvinod072@gmail.com> Co-authored-by: Vishnu <79318686+zeus-12@users.noreply.github.com> Co-authored-by: Mohamed Aklamaash M.R <111295679+MohamedAklamaash@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Vignesh Subbiah <51325334+vigneshsubbiah16@users.noreply.github.com> * [WEB-4725] Isolate the test retry-queue + CLI home so QUEUED-DEVICE can't leak to prod (#192) * Isolate the test retry-queue + CLI home so QUEUED-DEVICE can't leak to prod A phantom device "QUEUED-DEVICE" reached the production discovery dashboard. Root cause: integration tests wrote report fixtures to the REAL per-UID retry queue (/var/tmp/ai-discovery-queue-<uid>.json) via the import-time-cached utils.QUEUE_FILE constant. An interrupted test left the fixture on disk, and a later real agent run on the same machine drained it and POSTed it to prod. The string exists only in the test suite. Fix (harness-level test isolation; no behavior change for real runs): - utils._get_queue_file_path() honors an AI_DISCOVERY_QUEUE_FILE env override (strip + expanduser/expandvars, matching the sibling _resolve_copilot_dir convention). Remove the import-time QUEUE_FILE constant so save/load/cleanup resolve the path per-call, including the value-imported cleanup in ai_tools_discovery.py. - tests/__init__.py points AI_DISCOVERY_QUEUE_FILE at a throwaway temp file for the whole session. It lives in __init__.py, not conftest.py, so it arms under BOTH pytest and the `unittest discover` runner used in CI. - test_discovery_flow.py: the CLI subprocess gets an isolated HOME so its ~/.unbound state + single-flight lock go to a throwaway dir (stops tests touching real home state and the lock-contention flakiness). Migrate the two queue tests off the old QUEUE_FILE monkeypatch; add a regression test that the env override is honored and the real /var/tmp path is never used. - Remove the now-redundant tests/conftest.py. Deferred as separate follow-ups (intentionally out of scope): backend device_id sentinel denylist, DeviceDiscovery pruning, and the one-off delete of the existing QUEUED-DEVICE row (this change is preventative, not retroactive). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Trim verbose test comments to concise necessary ones Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Address PR review: clean up test temp dir + mkdir queue parent before write - tests/__init__.py: register atexit cleanup for the session temp queue dir so it doesn't accumulate in /tmp on long-lived CI machines (greptile r3380308482). - utils._write_file_secure: mkdir(parents=True, exist_ok=True) before write so a queue-path override with a missing parent can't silently drop failed reports (greptile r3380308569). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Accumulate all users' global MCP configs under root (fix multi-user first-match drop) (#195) * Accumulate all users' global MCP configs under root (fix first-match drop) On a multi-user macOS/Windows machine scanned as root, the global-MCP config helpers returned only the FIRST user's config and silently dropped every other user's. Three helper copies had this `if config: return config`-inside-the-loop bug: - extract_global_mcp_config_with_root_support (Cursor, Cursor CLI, Gemini CLI, Windsurf, Antigravity) - extract_codex_global_mcp_config_with_admin_support (Codex; TOML) - opencode's own copy (macOS + Windows) Claude Code / Cline / Roo Code already accumulated correctly and are untouched. Each helper now accumulates every user's config into a list, de-duped by the config's "path" (guards the Windows admin-own-home double-count), and keeps the admin's own home as a FALLBACK ONLY (`if not configs:`) to preserve today's single-user-root semantics exactly. The 14 thin callers change `projects.append(global_config)` -> `projects.extend(self._extract_global_config())`; `extract_mcp_config` still returns {"projects": [...]}. Strictly additive: single-user and non-admin machines are byte-for-byte unaffected (a 1-element list, identical content/ordering); output only grows for multi-user-root, recovering the previously-dropped users. Nothing reported today is dropped. Adds tests/test_multiuser_global_mcp.py (19 cases) covering Cursor (JSON), Codex (TOML), and opencode (macOS + the divergent Windows variant): both-users-recovered regression, single-user-unchanged, dedup, and fallback-only/fallback-suppressed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Hoist the accumulate+dedup+fallback loop into one helper (no behavior change) The first-match-drop fix landed the same accumulate + de-dup-by-path + fallback-only inner loop in four places: the JSON helper (extract_global_mcp_config_with_root_support), the Codex/TOML helper (extract_codex_global_mcp_config_with_admin_support), and OpenCode's own copy on macOS and Windows. Four copies of the same subtle loop is exactly how the bug got duplicated in the first place. Extract that loop into a single private helper, _accumulate_per_user_with_fallback(user_homes, global_config_path, reader_fn, tool_name, parent_levels), and have all four call sites defer to it. The helper has exactly two seams: the already-resolved user_homes list (empty when not admin) and the tool's reader_fn. Detection deliberately stays at each call site, NOT unified: - the JSON helper keeps its platform/is_admin block + _iter_admin_user_homes - Codex reproduces its own user-dir filter verbatim ([d for d in users_dir.iterdir() if d.is_dir() and not d.name.startswith('.')]) - each OpenCode extractor keeps its own admin check The helper's docstring and a call-site comment record WHY it must never grow a Linux branch: Linux uses a separate per-user extractor, and an accumulated-or-first-match walk here would silently drop users there. Behavior-preserving: the extracted loop is byte-for-byte the prior inline logic, so single-user and non-admin machines are unaffected and the multi-user-root recovery is identical. Dead extract_global_mcp_config_with_root_support imports are dropped from both OpenCode extractors. The three OpenCode test methods that whole-symbol-rebind a module-local Path shim now also patch helpers.Path, since relative_to(Path.home()) resolution moved into the shared helper's module (the Codex/Cursor tests mutate pathlib.Path.home directly, so they need no change). No test behavior or assertions change; no new tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Keep per-user exists() inside the try (restore byte-identical skip-one-user) The loop hoist in the previous commit narrowed the per-user try to cover only relative_to(), leaving Path.exists() outside the guard. All four prior inline loops wrapped exists() inside the try, so a single user's filesystem error (ValueError, or an OSError such as EACCES from Path.exists() on a Python 3.9 locked / NFS-mounted home -- exists() re-raises anything not ENOENT/ENOTDIR/ EBADF/ELOOP) skipped only that user. Outside the guard the error instead propagates out of the helper; the orchestrator catches it but drops the WHOLE tool's MCP config, losing every other user on the machine -- a regression in exactly the multi-user-root case this PR hardens. Move exists() back inside the per-user try so the helper is byte-identical to all four originals: a per-user path/permission error skips only that user. Adds a deterministic lock-in test that drives the helper with one user whose exists() raises PermissionError (iterated first) and asserts the other user still survives -- it errors against the pre-fix exists()-outside-try shape. Found by principal-engineer review of the refactor commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Merge pull request #196 from websentry-ai/WEB-4755 WEB-4755: Discovery self-timeout, signal cleanup, and dead-PID lock recovery --------- Co-authored-by: Sumit Badsara <sumit@unboundsecurity.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Sumit Badsara <sumitbadsara.dev@gmail.com> Co-authored-by: NandaPranesh <106886030+anonpran@users.noreply.github.com> Co-authored-by: Aakash Velusamy <aakashvpsgtech@gmail.com> Co-authored-by: MohamedAklamaash <aklamaash78@gmail.com> Co-authored-by: vishnu <vishnuvinod072@gmail.com> Co-authored-by: Vishnu <79318686+zeus-12@users.noreply.github.com> Co-authored-by: Mohamed Aklamaash M.R <111295679+MohamedAklamaash@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Vignesh Subbiah <51325334+vigneshsubbiah16@users.noreply.github.com>

greptile-apps Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py

Sumit Badsara and others added 2 commits June 5, 2026 00:34

sumit-badsara changed the title ~~Report host device_id from containers via injected UNBOUND_DEVICE_SERIAL~~ Container-aware device identity: inject host device_id + namespace home_user per container Jun 4, 2026

sumit-badsara mentioned this pull request Jun 4, 2026

Approach B: own device_id + is_container flag + stable container id #170

Merged

anonpran approved these changes Jun 5, 2026

View reviewed changes

sumit-badsara changed the base branch from main to staging June 5, 2026 15:45

sumit-badsara requested a review from a team June 5, 2026 15:45

sumit-badsara closed this Jun 5, 2026

sumit-badsara reopened this Jun 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Container-aware device identity: inject host device_id + namespace home_user per container#165

Container-aware device identity: inject host device_id + namespace home_user per container#165
sumit-badsara wants to merge 3 commits into
stagingfrom
feat/device-id-host-override

sumit-badsara commented Jun 4, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sumit-badsara commented Jun 4, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

container_id resolution & the stability trade-off

Safety / scope

Tests

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sumit-badsara commented Jun 4, 2026 •

edited by greptile-apps Bot

Loading

`container_id` resolution & the stability trade-off