Staging by zeus-12 · Pull Request #221 · websentry-ai/coding-discovery-tool

zeus-12 · 2026-07-01T05:45:12Z

Note

Medium Risk
Changes inventory accuracy on Linux and adds authenticated backend calls driven by local Claude session data; sweep failures are isolated from scan success, but incorrect detection still affects the reported tools.

Overview
Adds a best-effort connector UUID sweep at the end of each discovery run: the agent fetches unresolved UUIDs from the backend, maps them from local Claude Code / CoWork session JSON, and POSTs display names and tools via sweep_connectors.run_sweep (also runnable standalone with curl).

Linux detection is tightened to cut false positives and match macOS/Windows: Cursor and Windsurf no longer treat leftover ~/.cursor / ~/.windsurf directories as installs; only real binaries count. Replit Desktop detection drops the `which replit` and `replit --version` paths, which collided with the PyPI replit package; detection stays on the Electron resource tree (app.asar or package.json).
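The binary-only gate for Cursor and Windsurf can be sketched as follows. This is a minimal sketch that assumes detection reduces to a PATH lookup. `detect_by_binary` and the binary names are hypothetical, and the real detectors may also check fixed install locations.

```python
import shutil
from typing import Iterable, Optional


def detect_by_binary(names: Iterable[str], search_path: Optional[str] = None) -> Optional[str]:
    """Return the first executable named in `names` on the search path, else None.

    `search_path` uses PATH syntax; None means the process PATH. Leftover
    directories such as ~/.cursor or ~/.windsurf are never consulted, so a
    residue-only home is reported as "not installed".
    """
    for name in names:
        # shutil.which only returns files that exist and are executable.
        found = shutil.which(name, path=search_path)
        if found:
            return found
    return None
```

A residue-only home therefore yields `None`. Dropping an executable into the search path is the only thing that flips the result.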

Tests cover residue-only homes, real binaries, and the Replit PyPI collision case.

Reviewed by Cursor Bugbot for commit 99e8c54. Bugbot is set up for automated code reviews on this repo.

Greptile Summary

This PR tightens Linux AI-tool discovery and adds a Claude connector sweep. The main changes are:

- Removed residue-only Linux detection for Cursor and Windsurf.
- Removed Replit PATH/version fallbacks that collide with the PyPI replit command.
- Added tests for residue and version-detection behavior.
- Added a best-effort sweep that resolves Claude connector UUIDs from local session files.

Confidence Score: 1/5

The connector sweep can silently skip valid discovery runs that use a scheme-less domain.

- The Linux detector changes are focused and covered by targeted tests.
- The new sweep runs after the main scan, so it should not break normal discovery reporting.
- The sweep builds endpoint URLs differently from the existing reporting path.
- Opaque connector IDs are normalized before being sent back to the backend.

scripts/coding_discovery_tools/sweep_connectors.py

Important Files Changed

Filename	Overview
scripts/coding_discovery_tools/sweep_connectors.py	Adds the connector UUID sweep; URL normalization and opaque UUID casing need follow-up.
scripts/coding_discovery_tools/ai_tools_discovery.py	Runs the connector sweep after the main discovery flow and keeps it best-effort.
scripts/coding_discovery_tools/linux/cursor/cursor.py	Removes the `~/.cursor` residue fallback while preserving binary-based detection.
scripts/coding_discovery_tools/linux/replit/replit.py	Requires a real Replit resource tree and avoids the colliding `replit` command.
scripts/coding_discovery_tools/linux/windsurf/windsurf.py	Removes the `~/.windsurf` residue fallback while preserving binary-based detection.

Reviews (1): Last reviewed commit: "Merge pull request #220 from websentry-a..."

Greptile also left 2 inline comments on this PR.

Context used:

Rule used: Ensure that the confidence score is always within ...

Learned From
websentry-ai/ai-gateway-data#448

…ndsurf, replit (WEB-4771) (#205)

* fix(linux/discovery): drop residue/collision fallbacks for cursor, windsurf, replit (WEB-4771)

Three Linux-only detectors reported a tool that wasn't really installed:

- Cursor: fell back to ~/.cursor dir existence (survives uninstall, shared with Cursor CLI/rules) -> phantom row. Drop it; binary gates only.
- Windsurf: fell back to ~/.windsurf dir existence (~475 MB, survives uninstall) -> phantom row. Drop it; binary gates only.
- Replit: `which replit` backstop name-collided with the PyPI `replit` package's console script -> phantom Replit Desktop for any dev who `pip install replit`'d. Drop the backstop; the /usr/lib/replit resource-tree gate is authoritative. Remove the now-orphaned _check_replit_command helper.

The macOS/Windows variants already gate on the app/binary only, so this just brings Linux to parity. Pure detection-accuracy fix.

Tests: new test_cursor_residue_detection + test_windsurf_residue_detection (residue-only -> not detected; real binary -> still detected); reworked the replit backstop test into test_which_replit_pypi_collision_not_detected (`which replit` resolves but no resource tree -> not detected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(linux/replit): drop replit --version fallback; scrub tracking ids from comments

Address review on the version side of the same name-collision: get_version's `replit --version` fallback (_version_via_command) is subject to the identical PyPI `replit` collision. An asar-only Desktop install on a machine with the PyPI package would report the PyPI version instead of "Unknown", contradicting the docstring. Drop the fallback (and the now-unused run_command/VERSION_TIMEOUT imports); an asar-only install yields None -> detect() reports "Unknown" as documented. Update the affected version/residue tests (their run_command stubs are now obsolete).

Also remove ticket/PR identifiers from the in-code comments and docstrings added by this change, keeping the explanations professional.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
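The Replit resource-tree gate and the version lookup described in the commit message can be sketched as follows. This sketch assumes the version is read from a package.json inside the tree. Only the /usr/lib/replit root, app.asar, package.json, the get_version/detect names and the "Unknown" fallback come from the PR. The `resources/` sub-folder, the function signatures and the returned dict are illustrative assumptions.

```python
import json
from pathlib import Path
from typing import Optional

# Root named in the commit message; the layout beneath it is an assumption.
DEFAULT_ROOT = Path("/usr/lib/replit")


def _tree_dirs(root: Path):
    # Assumption: Electron resources live either at the root or in resources/.
    return (root, root / "resources")


def find_resource_tree(root: Path = DEFAULT_ROOT) -> Optional[Path]:
    """Return the first app.asar or package.json found, else None."""
    for folder in _tree_dirs(root):
        for name in ("app.asar", "package.json"):
            candidate = folder / name
            if candidate.is_file():
                return candidate
    return None


def get_version(root: Path = DEFAULT_ROOT) -> Optional[str]:
    """Version from package.json only; deliberately no `replit --version` fallback."""
    for folder in _tree_dirs(root):
        pkg = folder / "package.json"
        if not pkg.is_file():
            continue
        try:
            data = json.loads(pkg.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return None
        version = data.get("version") if isinstance(data, dict) else None
        return version if isinstance(version, str) and version else None
    return None  # asar-only install: nothing readable without unpacking


def detect(root: Path = DEFAULT_ROOT) -> Optional[dict]:
    """Report Replit Desktop only when the resource tree exists.

    A `replit` console script on PATH (for example from `pip install replit`)
    is never consulted, so it cannot produce a phantom install.
    """
    if find_resource_tree(root) is None:
        return None
    return {"name": "Replit Desktop", "version": get_version(root) or "Unknown"}
```

An asar-only tree yields version "Unknown". An empty root yields no row at all, whatever `which replit` would return.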

Fetch the backend's unresolved-UUID worklist, match each against the local Claude session files (Claude Code + CoWork folders), and report the resolved name + tools back via the single-server scan endpoint with the originating UUID. Only UUIDs the backend asked for are sent; HTTP via curl per the Zscaler constraint.
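The fetch, match and report flow can be sketched as follows, with HTTP going through curl. The endpoint URLs, the `X-Api-Key` header, and the `name`/`tools` payload fields are hypothetical placeholders, since the PR does not show them. `connector_uuid` is the field name mentioned in review. The curl flags used (`-sS`, `--fail`, `--max-time`, `-H`, `-X`, `--data-binary`) are standard curl options.

```python
import json
import subprocess
from typing import Dict, List, Sequence


def curl_get_argv(url: str, api_key: str, timeout_s: int = 30) -> List[str]:
    # Hypothetical auth header name; the real one is not shown in the PR.
    return ["curl", "-sS", "--fail", "--max-time", str(timeout_s),
            "-H", f"X-Api-Key: {api_key}", url]


def curl_post_json_argv(url: str, api_key: str, payload: dict, timeout_s: int = 30) -> List[str]:
    return ["curl", "-sS", "--fail", "--max-time", str(timeout_s),
            "-X", "POST",
            "-H", "Content-Type: application/json",
            "-H", f"X-Api-Key: {api_key}",
            "--data-binary", json.dumps(payload),
            url]


def run_curl(argv: Sequence[str], timeout_s: float) -> str:
    """Run curl and return stdout; raises CalledProcessError or TimeoutExpired on failure."""
    proc = subprocess.run(list(argv), capture_output=True, text=True,
                          timeout=timeout_s, check=True)
    return proc.stdout


def build_reports(worklist: Sequence[str], local: Dict[str, dict]) -> List[dict]:
    """One report per requested UUID found locally; unrequested entries are never sent."""
    reports = []
    for uuid in worklist:
        entry = local.get(uuid)
        if entry is None:
            continue  # not present in this device's session files
        reports.append({"connector_uuid": uuid,
                        "name": entry.get("name"),
                        "tools": entry.get("tools", [])})
    return reports
```

Local connectors that the backend did not ask for are dropped before any request is built. This matches the "only UUIDs the backend asked for are sent" rule.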

Call run_sweep at the end of the discovery run so connector UUIDs self-heal on the tool's existing periodic cadence. Best-effort: it never affects the discovery outcome.

Device sweep: resolve Claude connector UUIDs

greptile-apps · 2026-07-01T05:47:23Z

+def _normalize_url(url):
+    return (url or "").rstrip("/")


Scheme-Less Domains Skip Sweep

When discovery is run with a domain like api.example.com, the main reporting path accepts it by adding https://, but this new sweep builds api.example.com/api/v1/... directly. Curl then fails before fetching unresolved UUIDs, and the caller only logs that at debug level, so connector resolution silently never runs for that accepted input.

Suggested change

-def _normalize_url(url):
-    return (url or "").rstrip("/")
+def _normalize_url(url):
+    url = (url or "").strip()
+    if url and "://" not in url:
+        url = f"https://{url}"
+    return url.rstrip("/")

greptile-apps · 2026-07-01T05:47:24Z

+        for entry in (data.get("remoteMcpServersConfig") or []):
+            if not isinstance(entry, dict):
+                continue
+            uuid = (entry.get("uuid") or "").strip().lower()


Opaque UUIDs Are Lowercased

The backend worklist is an opaque list of connector UUIDs, but the sweep lowercases both the local key and the value later sent back as connector_uuid. If the backend stored a mixed-case opaque ID and compares it exactly, the local session entry matches but the POST uses a different ID, so the resolver can reject it or update the wrong lookup key.
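One possible fix can be sketched as follows, assuming case-insensitive matching is still wanted. Lowercase only the lookup key, and send back the UUID string exactly as the backend supplied it. `index_local_entries` and `match_worklist` are hypothetical helpers. The entries are assumed to be the dicts from `remoteMcpServersConfig`.

```python
from typing import Iterable, List, Tuple


def index_local_entries(entries: Iterable[object]) -> dict:
    """Map a lowercased UUID to its local session entry (used only for lookup)."""
    index = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        raw = entry.get("uuid")
        if not isinstance(raw, str) or not raw.strip():
            continue
        index.setdefault(raw.strip().lower(), entry)
    return index


def match_worklist(worklist: Iterable[str], entries: Iterable[object]) -> List[Tuple[str, dict]]:
    """Pair each backend UUID with its local entry, keeping the backend's exact string."""
    index = index_local_entries(entries)
    matches = []
    for backend_uuid in worklist:
        entry = index.get(backend_uuid.strip().lower())
        if entry is not None:
            # Echo the ID exactly as the backend sent it, not the lowercased key.
            matches.append((backend_uuid, entry))
    return matches
```

The value posted as `connector_uuid` is always the backend's original string. An exact-compare resolver then sees the same key it handed out.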

cursor

Cursor Bugbot has reviewed your changes using high effort and found 3 potential issues.



cursor · 2026-07-01T05:48:54Z

+
+
+def _normalize_url(url):
+    return (url or "").rstrip("/")


Sweep skips https scheme normalization

High Severity

The _normalize_url function in sweep_connectors.py only strips trailing slashes. By contrast, utils.normalize_url, which the other discovery components use, also adds an https:// scheme. As a result, the connector sweep can fail to reach the backend when given a bare domain, even though the rest of discovery succeeds.


cursor · 2026-07-01T05:48:54Z

+
+    files = []
+    for sub in SESSION_SUBDIRS:
+        folder = base / sub


Sweep reads one user’s Claude dir

Medium Severity

The read_local_connectors function only checks the current user's Claude session files. On Linux, when the sweep runs as root or a service account, this means it misses connectors located in other users' home directories, preventing their resolution. This contrasts with how other discovery processes find user-specific data.
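Scanning every user's session folders when the sweep runs as root might look like the following sketch. The UID >= 1000 cut-off is a common Linux convention, not something the PR states. `rel_base` and `subdirs` stand in for the real base folder and `SESSION_SUBDIRS`, whose values are not shown. Both helpers are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List


def candidate_homes() -> List[Path]:
    """Home directories to scan: every login account when root, else our own.

    Assumption: interactive users have UID >= 1000 and root is UID 0;
    other service accounts are skipped.
    """
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        return [Path.home()]
    import pwd  # Unix-only, so imported lazily

    homes: List[Path] = []
    for entry in pwd.getpwall():
        if entry.pw_uid == 0 or entry.pw_uid >= 1000:
            home = Path(entry.pw_dir)
            if home not in homes:
                homes.append(home)
    return homes


def session_dirs(homes: Iterable[Path], rel_base: str, subdirs: Iterable[str]) -> List[Path]:
    """Existing session folders under each home; unreadable paths are skipped."""
    subdirs = list(subdirs)  # iterated once per home
    found: List[Path] = []
    for home in homes:
        base = Path(home) / rel_base
        for sub in subdirs:
            folder = base / sub
            try:
                if folder.is_dir() and folder not in found:
                    found.append(folder)
            except OSError:
                # e.g. PermissionError on another user's home directory
                continue
    return found
```

When the process is not root, this degrades to the current behavior of reading only the invoking user's home.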


cursor · 2026-07-01T05:48:54Z

+            sent, failed, matched = run_sweep(args.domain, args.api_key)
+            logger.info(f"Connector UUID sweep: resolved {sent}, failed {failed}, matched {matched}")
+        except Exception as sweep_err:
+            logger.debug(f"Connector UUID sweep failed: {sweep_err}")


Sweep runs under watchdog

Medium Severity

run_sweep runs at the end of the main try block, after the scan-completed event is sent but before the finally block sets _finished and cancels the watchdog timer. Each unresolved connector can spend up to ~90s in subprocess timeouts. A long sweep can therefore still trigger _abort for exceeding args.timeout, which sends a failed-scan event and calls os._exit(1) after the backend has already recorded completion.

Additional Locations (1)

scripts/coding_discovery_tools/ai_tools_discovery.py#L2771-L2776
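One way to address this is to disarm the watchdog before the sweep runs and give the sweep its own time budget. The sketch below is simplified and hypothetical. `run_discovery`, `sweep_with_budget` and their parameters are not the tool's real control flow. The original `_finished` flag and timer are modeled with `threading.Event` and `threading.Timer`.

```python
import logging
import threading
import time
from typing import Callable, Iterable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def run_discovery(scan: Callable[[], T], sweep: Callable[[], object],
                  timeout_s: float, on_abort: Callable[[], None]) -> T:
    """Guard only the scan with the watchdog; run the sweep after disarming it."""
    finished = threading.Event()

    def _abort() -> None:
        if not finished.is_set():
            on_abort()

    watchdog = threading.Timer(timeout_s, _abort)
    watchdog.daemon = True
    watchdog.start()
    try:
        result = scan()
    finally:
        finished.set()
        watchdog.cancel()

    # Best-effort: the watchdog is already cancelled, so a slow or failing
    # sweep cannot turn a completed scan into an aborted one.
    try:
        sweep()
    except Exception as err:  # never let the sweep affect the scan result
        logger.debug("Connector UUID sweep failed: %s", err)
    return result


def sweep_with_budget(uuids: Iterable[str], resolve_one: Callable[[str, float], None],
                      budget_s: float, per_call_s: float) -> int:
    """Resolve UUIDs until the overall budget runs out; returns how many were attempted."""
    deadline = time.monotonic() + budget_s
    attempted = 0
    for uuid in uuids:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Each call's timeout is capped by what is left of the overall budget.
        resolve_one(uuid, min(per_call_s, remaining))
        attempted += 1
    return attempted
```

Capping each subprocess timeout by the remaining budget bounds the sweep's total run time. With the original loop, the total grows with the number of unresolved connectors.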


AakashVelusamy and others added 4 commits June 29, 2026 18:19

Run the connector sweep as part of discovery

c728c8d


Merge pull request #220 from websentry-ai/vv/mcp-connector-sweep

99e8c54

Device sweep: resolve Claude connector UUIDs

zeus-12 requested a review from a team July 1, 2026 05:45

MohamedAklamaash approved these changes Jul 1, 2026


zeus-12 merged commit 5ee75aa into main Jul 1, 2026
9 checks passed

greptile-apps Bot reviewed Jul 1, 2026


cursor Bot reviewed Jul 1, 2026


zeus-12 merged 4 commits into main from staging

zeus-12 commented Jul 1, 2026 • edited by greptile-apps Bot
