WEB-4679: declare a scan manifest so the backend can prune uninstalled tools by anonpran · Pull Request #187 · websentry-ai/coding-discovery-tool

anonpran · 2026-06-08T13:08:19Z

Summary

The discovery agent never told the backend which tools were still installed, so an uninstalled tool's row persisted on the dashboard forever. The completed scan event now carries a manifest of the (home_user, tool_name) pairs successfully scanned this run, plus covered_home_users, so the backend can reconcile by set-difference and soft-delete what's gone.

Changes

ai_tools_discovery.py: accumulate scanned_manifest on the send-success and dedup hash-match paths only — a tool that errored on read is deliberately left out so it's never mistaken for uninstalled. Pass manifest + covered_home_users (= all enumerated OS users, so a user who removed their last tool is still in scope) to the completed send_scan_event.
utils.py: send_scan_event gains optional manifest / covered_home_users, inserted into the payload only when present (backward compatible).

Stdlib-only; the accumulation is pure in-memory and cannot raise (runs on customer machines).

Cross-repo

Pairs with the ai-gateway-data WEB-4679 PR (reconcile + removed_at soft-delete). Forward/backward compatible — an old backend simply ignores the new fields, deploy order independent.

Test plan

tests/test_scan_completed_manifest.py (8 tests): manifest carried on the completed event; success + hash-match included, errored read excluded; covered_home_users includes a zero-tool user; legacy call omits both keys; non-completed events carry no manifest.

🤖 Generated with Claude Code

Note

Medium Risk
Changes affect inventory reconciliation (false manifest entries could soft-delete live tools); mitigations include incomplete-scan fail-closed, detection-time presence, and per-user detector scoping, but backend pruning behavior must stay aligned.

Overview
WEB-4679 fixes stale dashboard rows by telling the backend which (home_user, tool_name) pairs are still present when a scan finishes.

The completed lifecycle event can now include manifest and covered_home_users. send_scan_event adds those fields only when provided, so older callers stay compatible. Discovery builds scanned_manifest during per-user detection (not from upload success), keeps tools on read/upload failures, and omits the manifest entirely when any detector errors mark the run incomplete so the backend does not prune from partial data. Reporting is gated so globally deduped tools are only attributed to users who actually detected them; Copilot CLI / Augment ownership skips discard manifest entries. Metrics gain manifest_size and scan_incomplete.

Cline, Kilo Code, and Roo Code on macOS, Linux, and Windows now respect detector.user_home from detect_tool_for_user instead of re-enumerating all homes under elevated scans.

New tests/test_scan_completed_manifest.py covers payload shape, detection vs extraction failures, detector errors, phantom ownership, and per-user detector scoping.

^{Reviewed by Cursor Bugbot for commit 05f400c. Bugbot is set up for automated code reviews on this repo. Configure here.}

Greptile Summary

This PR adds a scan manifest so the backend can prune tools that are no longer installed. The main changes are:

Completed scan events can include detected (home_user, tool_name) pairs.
Completed scan events can include the covered home-user scope.
Detector failures suppress prune-bearing completed payload fields.
Per-user extension scans now avoid cross-user attribution for Cline, Kilo Code, and Roo Code.
Tests cover manifest payloads, read and upload failures, detector failures, ownership scoping, and stable prune keys.

Confidence Score: 5/5

This looks safe to merge.

No blocking issues found in the changed code.

Important Files Changed

Filename	Overview
scripts/coding_discovery_tools/ai_tools_discovery.py	Builds the completed-scan manifest from detected per-user presence and suppresses prune scope when detector errors make the scan incomplete.
scripts/coding_discovery_tools/utils.py	Adds optional manifest and covered-home-user fields to scan lifecycle payloads.
scripts/coding_discovery_tools/linux/cline/cline.py	Scopes Linux Cline detection to the active per-user home when available.
scripts/coding_discovery_tools/linux/kilocode/kilocode.py	Scopes Linux Kilo Code detection to the active per-user home when available.
scripts/coding_discovery_tools/linux/roo_code/roo_code.py	Scopes Linux Roo Code detection to the active per-user home when available.
scripts/coding_discovery_tools/macos/cline/cline.py	Scopes macOS Cline detection to the active per-user home when available.
scripts/coding_discovery_tools/macos/kilocode/kilocode.py	Scopes macOS Kilo Code detection to the active per-user home when available.
scripts/coding_discovery_tools/macos/roo_code/roo_code.py	Scopes macOS Roo Code detection to the active per-user home when available.
scripts/coding_discovery_tools/windows/cline/cline.py	Scopes Windows Cline detection to the active per-user home when available.
scripts/coding_discovery_tools/windows/kilocode/kilocode.py	Scopes Windows Kilo Code detection to the active per-user home when available.
scripts/coding_discovery_tools/windows/roo_code/roo_code.py	Scopes Windows Roo Code detection to the active per-user home when available.
tests/test_scan_completed_manifest.py	Adds coverage for manifest emission, failure handling, per-user ownership, and stable tool names.

Comments Outside Diff (2)

scripts/coding_discovery_tools/ai_tools_discovery.py, line 2646-2649 (link)

Upload failure silently drops tool from manifest — may trigger false soft-delete

When send_report_to_backend returns (False, retryable=True), the report is queued for the next run but the tool is never appended to scanned_manifest. The backend then receives a covered_home_users entry for the user but no manifest entry for this tool, which it may interpret as "tool uninstalled" and issue a soft-delete — even though the tool is still present; the upload just failed transiently.

The PR description states the exclusion criterion as "a tool that errored on read", and the comment on line 2644 echoes "Successfully read and uploaded." Upload failures are not read failures, so by the stated design intent the tool should still be manifested. The fix is to record the tool in scanned_manifest as soon as filter_tool_projects_by_user and generate_single_tool_report succeed (i.e., after the report is built), independent of whether the HTTP upload succeeds.
scripts/coding_discovery_tools/ai_tools_discovery.py, line 2712-2736 (link)

No metrics instrumented for the new manifest/pruning flow

The existing sentry_metrics_payload (sent via send_discovery_metrics) tracks tool_count and user_count, but adds nothing for the new manifest. Without manifest-specific metrics it will be impossible to diagnose backend pruning anomalies in production. Concrete metrics to add to the metadata block:
- manifest_size: len(scanned_manifest) — how many (tool, user) pairs were recorded this run.
- manifest_excluded_reads: tools that errored on read (the key correctness signal for the errored-read exclusion logic).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (11): Last reviewed commit: "fix(web-4679): scope Linux/Windows exten..." | Re-trigger Greptile}

…d tools The discovery agent never told the backend which tools were still installed, so an uninstalled tool's row persisted on the dashboard forever. The completed scan event now carries a manifest of the (home_user, tool_name) pairs successfully scanned this run plus covered_home_users, letting the backend reconcile by set-difference and soft-delete what's gone. - ai_tools_discovery.py: accumulate scanned_manifest on the send-success and dedup hash-match paths only (a tool that errored on read is left out so it is never mistaken for uninstalled); pass manifest + covered_home_users (= all enumerated OS users, so a user who removed their last tool is still in scope) to the completed send_scan_event. - utils.py: send_scan_event gains optional manifest / covered_home_users, inserted into the payload only when present (backward compatible). Stdlib-only; the accumulation is pure in-memory and cannot raise. Pairs with the ai-gateway-data WEB-4679 reconcile change (forward/backward compatible: an old backend simply ignores the new fields). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…covery-never-prunes-uninstalled-tools-removed-tools

Addresses Greptile P2: accumulate the manifest as a set of tuples (a pair can never be double-recorded) and serialize to a sorted list of dicts at the send site. Functionally equivalent today (the success and hash-match branches are mutually exclusive, one entry per (tool, user)), but removes the latent duplicate risk and makes the output deterministic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tighten comments Greptile flag: a tool whose READ succeeded but whose upload failed transiently was dropped from the manifest, so the backend could mistake a network blip for an uninstall and prune a live tool (enforce mode). Record it in the send-failure branch too — the manifest tracks what was SEEN, not what uploaded. Adds a regression test (read-success + upload-failure stays in the manifest). Also condense the WEB-4679 comments to concise one-liners. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ve tool A tool whose read errors without a scan_event=failed (the generic per-user except, the per-tool except, or a PermissionError whose failed-event send itself fails) leaves the manifest possibly missing an installed tool — the backend would then set-diff it as uninstalled and wrongly prune it. Track scanned_manifest_complete and send manifest=None (backend treats as legacy = no prune) when the run wasn't fully read, deferring cleanup to a clean run. Adds a test forcing a generic (non-Permission) read error -> manifest=None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… success A read/extraction error no longer drops a live tool from the manifest (presence is recorded before extraction) and never fail-closes the whole manifest to None — so one tool's failure can't block pruning every other tool on the device. A detector that errors is kept in the manifest (presence unknown, not an uninstall). Removes the global scanned_manifest_complete fail-close. Tests rewritten to the new contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16 · 2026-06-26T18:00:50Z

🛡️ Automated Security Review (consensus)

1 finding — 0 high-confidence, 1 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

Findings

🟡 TRIAGE: Device-wide error over-expands manifest across all users

scripts/coding_discovery_tools/ai_tools_discovery.py:2696

Impact: On a device-wide processing exception, the handler adds (u, tool_name) for every enumerated user, even if only one user had the tool detected — stale inventory rows may persist and backend pruning is blocked for unaffected users.

Fix: On device-wide failure, add manifest entries only for users where the tool was actually detected this run (e.g., track per-user detection before the outer try, or re-use per-user failure sets), rather than iterating all_users.

Flagged by: Cursor

Notes

Claude, Semgrep, Gitleaks: no traditional security issues (no secrets, injection, auth bypass, or new sensitive data exposure).
Greptile upload-failure / dedup comments: superseded by the current diff — manifest is recorded at detection time (before read/upload) and scanned_manifest is a set; covered by test_upload_failure_keeps_tool_in_manifest.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head bcb482e2 · 2026-06-26T18:00Z

…talled-tools-removed-tools Conflicts were purely additive — union both sides: - utils.send_scan_event: keep staging's system_user param alongside WEB-4679's manifest + covered_home_users (and their docstrings); body already adds all three. - ai_tools_discovery completed event: pass system_user + manifest + covered_home_users. Manifest-from-presence fix auto-merged cleanly. Manifest tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16 · 2026-06-26T18:20:57Z

🛡️ Automated Security Review (consensus)

1 finding — 0 high-confidence, 1 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

Findings

🟡 TRIAGE: Device-wide `process_single_tool` failure expands manifest to all users

scripts/coding_discovery_tools/ai_tools_discovery.py:3217

Impact: On a device-wide extraction exception, the manifest adds (user, tool_name) for every entry in covered_home_users, which can block backend pruning for users who no longer have that tool (stale inventory persists on the dashboard).

Fix: Scope manifest entries to users where per-user detection actually found the tool, or send a separate “presence unknown” signal for device-wide failures instead of manifesting the tool for all covered users.

Reviewers: Cursor (Bugbot), Lead

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head e7e7a9e5 · 2026-06-26T18:20Z

…n detector error Addresses two Greptile P1s on the scan-manifest: - Phantom ownership: the manifest was added for every (user x globally-deduped tool), so a user-scoped tool one user has was attributed to co-resident users who never detected it -> backend could never prune their stale rows. Build the manifest from per-user detection, and only emit a report for a (user, tool) the user actually detected (gate on manifest membership). Add the missing Copilot CLI ownership discard (only Augment had it). - Umbrella names: a detector failure recorded detector.tool_name (e.g. 'GitHub Copilot'), not the concrete row names ('GitHub Copilot (VS Code)'), so the set-diff would prune the real surface rows. Instead mark the scan unclean (a 'failed' event before 'completed') so the backend skips pruning. Also add manifest_size + scan_incomplete to discovery metrics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16 · 2026-06-27T06:01:22Z

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks)

Previously acknowledged (not re-flagged)

Cross-user / phantom manifest entries — Addressed in this revision: manifest is built from per-user detection plus an ownership guard/discard; PR design intent is “only users who detected the tool get an entry.”
Upload failure drops tool from manifest — False positive on an earlier revision; manifest is recorded at detection/presence time (before read/upload), so transient upload failures do not look like uninstalls.
Detector failure uses umbrella tool_name — Code design choice: detector errors mark the run ScanIncomplete (failed event) instead of adding umbrella labels that could mismatch concrete backend row names.
Incomplete scan still sends completed with manifest — Accepted design: emit ScanIncomplete/failed before completed so the backend skips pruning when presence is unknown; inventory-reconciliation correctness, not a new exploit surface.
Missing manifest observability metrics — Addressed: manifest_size and scan_incomplete added to discovery metrics metadata.
scanned_manifest deduplication — Addressed: manifest accumulator is a set of (home_user, tool_name) tuples.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 83a19b37 · 2026-06-27T06:01Z

Regression guard for the discovery prune key: the JetBrains tool name must stay the bare display_name (version + license/plan kept in separate fields), so a version bump or Free<->Licensed change never orphans the install row against the manifest. Fails CI if a future change re-embeds version/plan into the name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16 · 2026-06-27T14:10:59Z

🛡️ Automated Security Review (consensus)

1 finding — 0 high-confidence, 1 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks.

Findings

🟡 TRIAGE: Incomplete scan still ships prune manifest on `completed`

scripts/coding_discovery_tools/ai_tools_discovery.py:3286-3307
Impact: When detector/extraction errors mark the run unclean, a failed/ScanIncomplete event is sent but its success is ignored and completed still carries manifest + covered_home_users; if the failed event is lost, backend reconcile may soft-delete tools that were skipped due to errors.
Fix: Skip manifest fields on completed when incomplete_reasons is non-empty, or abort/retry the completed send unless the failed event was acknowledged.
Flagged by: Cursor

Previously acknowledged (not re-flagged)

Upload failure drops tool from manifest — Addressed by design: manifest is recorded at per-user detection time, not upload success (test_upload_failure_keeps_tool_in_manifest).
Phantom cross-user manifest entries from global all_tools dedup — Addressed by design: manifest built per-user at detection with a (user_name, tool_name) not in scanned_manifest guard before reporting.
Detector failure uses umbrella tool_name and can mis-target prune keys — Accepted design choice: detector errors mark the scan unclean (ScanIncomplete) and skip adding umbrella names; backend skips pruning for that run.
scanned_manifest list deduplication — Addressed in code: uses a set of (home_user, tool_name) tuples.
Missing manifest/pruning metrics — Addressed in code: manifest_size and scan_incomplete added to discovery metrics.

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head a62873de · 2026-06-27T14:10Z

…gnal Addresses the re-review on the prior commit: - Phantom ownership (Greptile P1): Roo/Cline/Kilo detect() now scope to the per-user home set by detect_tool_for_user, so a root multi-user scan no longer self-enumerates /Users and attributes every user's extension to the outer-loop user. - Incomplete-scan false-delete (Cursor + Greptile P1): an incomplete scan now sends NO manifest on the completed event itself (atomic), instead of a separate 'failed' event whose loss could leave the backend pruning a partial inventory. Removed the fragile separate event. - Trimmed verbose comments; folded the standalone JetBrains determinism test into test_scan_completed_manifest.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit a7799b7. Configure here.}

vigneshsubbiah16 · 2026-06-27T15:01:12Z

🛡️ Automated Security Review (consensus)

1 finding — 0 high-confidence, 1 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks, Lead.

Findings

🟡 TRIAGE: Incomplete scan sends prune scope without inventory

scripts/coding_discovery_tools/ai_tools_discovery.py:3286-3293
Impact: When detector errors occur, manifest is omitted (None) but covered_home_users is still sent; if the backend reconciles from user scope without requiring a manifest, live tool rows could be soft-deleted and disappear from security monitoring.
Fix: Omit covered_home_users whenever manifest is None, or add an explicit skip-prune signal on the completed event (e.g. scan_incomplete: true) so reconciliation cannot run on a partial scan.
Reviewers: Greptile, Cursor (prior), Lead

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head a7799b7d · 2026-06-27T15:01Z

Comment-only: concise module/class/test docstrings; fix stale references; no test logic change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks, Lead.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks, Lead)

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 75ac3747 · 2026-06-27T15:11Z

… covered scope on incomplete scan Greptile re-review residuals (both P1, on the latest head): - Phantom ownership: the per-user scoping fix was macOS-only. Linux + Windows Cline/Kilo/Roo detect() now also scope to the per-user home set by detect_tool_for_user, so an elevated scan can't attribute one user's extension to another. - Incomplete scope: on a detector failure the completed event now omits covered_home_users too (not just manifest), so the backend never sees a prune scope without an inventory. Tests: extension-detector-scopes-to-user-home + detector-error-omits-covered. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vigneshsubbiah16 · 2026-06-27T18:11:16Z

🛡️ Automated Security Review (consensus)

0 findings — 0 high-confidence, 0 to triage. Reviewers: Cursor, Claude, Semgrep, Gitleaks, Lead.

✅ Security consensus: no issues found. (reviewers: Cursor, Claude, Semgrep, Gitleaks, Lead)

🤖 consensus review · reviewers: Cursor,Claude,Semgrep,Gitleaks · head 05f400c0 · 2026-06-27T18:11Z

anonpran and others added 2 commits June 8, 2026 18:35

Merge remote-tracking branch 'origin/staging' into nanda/web-4679-dis…

6402e54

…covery-never-prunes-uninstalled-tools-removed-tools

anonpran requested a review from a team June 8, 2026 13:08

greptile-apps Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

anonpran and others added 4 commits June 8, 2026 19:08

cursor Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

greptile-apps Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

greptile-apps Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py

cursor Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

greptile-apps Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

anonpran force-pushed the nanda/web-4679-discovery-never-prunes-uninstalled-tools-removed-tools branch from 7fddbe9 to a7799b7 Compare June 27, 2026 14:50

greptile-apps Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py

cursor Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py

chore(web-4679): trim verbose docstrings/comments in manifest tests

75ac374

Comment-only: concise module/class/test docstrings; fix stale references; no test logic change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py

Comment thread scripts/coding_discovery_tools/ai_tools_discovery.py Outdated

vigneshsubbiah16 reviewed Jun 27, 2026

View reviewed changes

Uh oh!

Conversation

anonpran commented Jun 8, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Cross-repo

Test plan

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Comments Outside Diff (2)

Uh oh!

Uh oh!

Uh oh!

vigneshsubbiah16 commented Jun 26, 2026

🛡️ Automated Security Review (consensus)

Findings

🟡 TRIAGE: Device-wide error over-expands manifest across all users

Notes

Uh oh!

vigneshsubbiah16 commented Jun 26, 2026

🛡️ Automated Security Review (consensus)

Findings

🟡 TRIAGE: Device-wide process_single_tool failure expands manifest to all users

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vigneshsubbiah16 commented Jun 27, 2026

🛡️ Automated Security Review (consensus)

Previously acknowledged (not re-flagged)

Uh oh!

vigneshsubbiah16 commented Jun 27, 2026

🛡️ Automated Security Review (consensus)

Findings

🟡 TRIAGE: Incomplete scan still ships prune manifest on completed

Previously acknowledged (not re-flagged)

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vigneshsubbiah16 commented Jun 27, 2026

🛡️ Automated Security Review (consensus)

Findings

🟡 TRIAGE: Incomplete scan sends prune scope without inventory

Uh oh!

Uh oh!

Uh oh!

vigneshsubbiah16 left a comment

Choose a reason for hiding this comment

🛡️ Automated Security Review (consensus)

Uh oh!

vigneshsubbiah16 commented Jun 27, 2026

🛡️ Automated Security Review (consensus)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

anonpran commented Jun 8, 2026 •

edited by greptile-apps Bot

Loading

🟡 TRIAGE: Device-wide `process_single_tool` failure expands manifest to all users

🟡 TRIAGE: Incomplete scan still ships prune manifest on `completed`