Staging #221

Merged
zeus-12 merged 4 commits into main from staging on Jul 1, 2026

zeus-12 commented Jul 1, 2026

Note

Medium Risk
Changes inventory accuracy on Linux and adds authenticated backend calls that report data read from local Claude session files. Sweep failures do not affect scan success, but incorrect detection still changes which tools are reported.

Overview
Adds a best-effort connector UUID sweep at the end of each discovery run: the agent fetches unresolved UUIDs from the backend, maps them from local Claude Code / CoWork session JSON, and POSTs display names and tools via sweep_connectors.run_sweep (also runnable standalone with curl).

Linux detection false positives are tightened to match macOS/Windows: Cursor and Windsurf no longer treat leftover ~/.cursor / ~/.windsurf as installs—only real binaries count. Replit Desktop drops the which replit and replit --version paths that collided with the PyPI replit package; detection stays on the Electron resource tree (app.asar or package.json).

Tests cover residue-only homes, real binaries, and the Replit PyPI collision case.
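
A minimal sketch of the binary-only gate described above. The helper names (`has_real_binary`, `has_replit_resource_tree`) and the flat layout of the resource directory are assumptions for illustration and are not taken from the repository:

```python
import os
from pathlib import Path
from typing import Iterable


def has_real_binary(candidates: Iterable[Path]) -> bool:
    """Return True only if some candidate is an executable regular file.

    A leftover config directory such as ~/.cursor is a directory, not a
    regular file, so it never satisfies this gate.
    """
    return any(p.is_file() and os.access(p, os.X_OK) for p in candidates)


def has_replit_resource_tree(resources_dir: Path) -> bool:
    """Return True if the resource directory holds app.asar or package.json.

    Placing both files directly under resources_dir is a hypothetical layout.
    """
    return (resources_dir / "app.asar").is_file() or (
        resources_dir / "package.json"
    ).is_file()
```

Neither check consults `PATH`, so a `replit` console script installed from PyPI cannot influence the result.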

Reviewed by Cursor Bugbot for commit 99e8c54. Bugbot is set up for automated code reviews on this repo.

Greptile Summary

This PR tightens Linux AI-tool discovery and adds a Claude connector sweep. The main changes are:

  • Removed residue-only Linux detection for Cursor and Windsurf.
  • Removed Replit PATH/version fallbacks that collide with the PyPI replit command.
  • Added tests for residue and version-detection behavior.
  • Added a best-effort sweep that resolves Claude connector UUIDs from local session files.

Confidence Score: 1/5

The connector sweep can silently skip valid discovery runs that use a scheme-less domain.

  • The Linux detector changes are focused and covered by targeted tests.
  • The new sweep runs after the main scan, so it should not break normal discovery reporting.
  • The sweep builds endpoint URLs differently from the existing reporting path.
  • Opaque connector IDs are normalized before being sent back to the backend.

scripts/coding_discovery_tools/sweep_connectors.py

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/coding_discovery_tools/sweep_connectors.py | Adds the connector UUID sweep; URL normalization and opaque UUID casing need follow-up. |
| scripts/coding_discovery_tools/ai_tools_discovery.py | Runs the connector sweep after the main discovery flow and keeps it best-effort. |
| scripts/coding_discovery_tools/linux/cursor/cursor.py | Removes the ~/.cursor residue fallback while preserving binary-based detection. |
| scripts/coding_discovery_tools/linux/replit/replit.py | Requires a real Replit resource tree and avoids the colliding replit command. |
| scripts/coding_discovery_tools/linux/windsurf/windsurf.py | Removes the ~/.windsurf residue fallback while preserving binary-based detection. |

Reviews (1): Last reviewed commit: "Merge pull request #220 from websentry-a..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Context used:

  • Rule used - Ensure that the confidence score is always within ... (source)

Learned From
websentry-ai/ai-gateway-data#448

AakashVelusamy and others added 4 commits June 29, 2026 18:19
…ndsurf, replit (WEB-4771) (#205)

* fix(linux/discovery): drop residue/collision fallbacks for cursor, windsurf, replit (WEB-4771)

Three Linux-only detectors reported a tool that wasn't really installed:

- Cursor:   fell back to ~/.cursor dir existence (survives uninstall, shared
            with Cursor CLI/rules) -> phantom row. Drop it; binary gates only.
- Windsurf: fell back to ~/.windsurf dir existence (~475 MB, survives
            uninstall) -> phantom row. Drop it; binary gates only.
- Replit:   `which replit` backstop name-collided with the PyPI `replit`
            package's console script -> phantom Replit Desktop for any dev who
            `pip install replit`'d. Drop the backstop; the /usr/lib/replit
            resource-tree gate is authoritative. Remove the now-orphaned
            _check_replit_command helper.

The macOS/Windows variants already gate on the app/binary only, so this just
brings Linux to parity. Pure detection-accuracy fix.

Tests: new test_cursor_residue_detection + test_windsurf_residue_detection
(residue-only -> not detected; real binary -> still detected); reworked the
replit backstop test into test_which_replit_pypi_collision_not_detected
(`which replit` resolves but no resource tree -> not detected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(linux/replit): drop replit --version fallback; scrub tracking ids from comments

Address review on the version side of the same name-collision: get_version's
`replit --version` fallback (_version_via_command) is subject to the identical
PyPI `replit` collision — an asar-only Desktop install on a machine with the
PyPI package would report the PyPI version instead of "Unknown", contradicting
the docstring. Drop the fallback (and the now-unused run_command/VERSION_TIMEOUT
imports); an asar-only install yields None -> detect() reports "Unknown" as
documented. Update the affected version/residue tests (their run_command stubs
are now obsolete).

Also remove ticket/PR identifiers from the in-code comments and docstrings added
by this change, keeping the explanations professional.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Fetch the backend's unresolved-UUID worklist, match each against the local Claude
session files (Claude Code + CoWork folders), and report the resolved name + tools
back via the single-server scan endpoint with the originating UUID. Only UUIDs the
backend asked for are sent; HTTP via curl per the Zscaler constraint.

Call run_sweep at the end of the discovery run so connector UUIDs self-heal on the
tool's existing periodic cadence. Best-effort: never affects the discovery outcome.

Device sweep: resolve Claude connector UUIDs
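
The worklist matching described in these commit messages can be sketched roughly as follows. Only the `remoteMcpServersConfig` key and the `uuid` field appear in this PR; the function names and everything else about the file shape are assumptions:

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def read_connectors(session_file: Path) -> Dict[str, dict]:
    """Map connector UUID -> raw entry for one session JSON file."""
    try:
        data = json.loads(session_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}  # best-effort: unreadable or malformed files are skipped
    if not isinstance(data, dict):
        return {}
    entries = data.get("remoteMcpServersConfig")
    if not isinstance(entries, list):
        return {}
    found = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        raw = entry.get("uuid")
        if not isinstance(raw, str) or not raw.strip():
            continue
        found[raw.strip()] = entry
    return found


def select_requested(local: Dict[str, dict], worklist: Iterable[str]) -> Dict[str, dict]:
    """Keep only the connectors whose UUID the backend asked for."""
    wanted = set(worklist)
    return {uuid: entry for uuid, entry in local.items() if uuid in wanted}
```

Filtering against the worklist before any POST keeps the agent from sending connector data the backend did not request.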
@zeus-12 zeus-12 requested a review from a team July 1, 2026 05:45
@zeus-12 zeus-12 merged commit 5ee75aa into main Jul 1, 2026
9 checks passed
Comment on lines +37 to +38
def _normalize_url(url):
    return (url or "").rstrip("/")

P1 Scheme-Less Domains Skip Sweep

When discovery is run with a domain like api.example.com, the main reporting path accepts it by adding https://, but this new sweep builds api.example.com/api/v1/... directly. Curl then fails before fetching unresolved UUIDs, and the caller only logs that at debug level, so connector resolution silently never runs for that accepted input.

Suggested change
-def _normalize_url(url):
-    return (url or "").rstrip("/")
+def _normalize_url(url):
+    url = (url or "").strip()
+    if url and "://" not in url:
+        url = f"https://{url}"
+    return url.rstrip("/")
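
To illustrate the effect of the suggested change, here is a standalone sketch with a few sample inputs. `build_endpoint` and the `api/v1/example` path are hypothetical; the real endpoint paths are not shown in this review:

```python
def normalize_url(url):
    """Trim whitespace, default to https:// when no scheme is present,
    and drop trailing slashes."""
    url = (url or "").strip()
    if url and "://" not in url:
        url = f"https://{url}"
    return url.rstrip("/")


def build_endpoint(domain, path):
    """Join a normalized base URL with an API path (placeholder path)."""
    return f"{normalize_url(domain)}/{path.lstrip('/')}"


print(normalize_url("api.example.com"))           # https://api.example.com
print(normalize_url("https://api.example.com/"))  # https://api.example.com
print(build_endpoint("api.example.com", "/api/v1/example"))
# https://api.example.com/api/v1/example
```

Reusing the shared normalizer that the main reporting path already calls would likely be preferable to keeping a second copy, since the two paths would then accept the same inputs.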

for entry in (data.get("remoteMcpServersConfig") or []):
    if not isinstance(entry, dict):
        continue
    uuid = (entry.get("uuid") or "").strip().lower()

P2 Opaque UUIDs Are Lowercased

The backend worklist is an opaque list of connector UUIDs, but the sweep lowercases both the local key and the value later sent back as connector_uuid. If the backend stored a mixed-case opaque ID and compares it exactly, the local session entry matches but the POST uses a different ID, so the resolver can reject it or update the wrong lookup key.
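
One possible way to address this, sketched under the assumption that case-insensitive matching is still wanted locally: compare on a lowercased key, but send back the backend's original spelling. `match_worklist` is a hypothetical helper, not code from the PR:

```python
def match_worklist(worklist, local_uuids):
    """Return {backend_id: local_id} for IDs that are equal when case and
    surrounding whitespace are ignored.

    The key is the backend's ID exactly as received, so it can be echoed
    back as connector_uuid without modification.
    """
    local_by_key = {u.strip().lower(): u for u in local_uuids}
    matches = {}
    for backend_id in worklist:
        local_id = local_by_key.get(backend_id.strip().lower())
        if local_id is not None:
            matches[backend_id] = local_id
    return matches


print(match_worklist(["AbC-123", "zzz"], ["abc-123", "other"]))
# {'AbC-123': 'abc-123'}
```

If the backend treats IDs as strictly opaque, dropping the lowercasing entirely and matching exactly would be the simpler option.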

cursor bot left a comment

Cursor Bugbot has reviewed your changes using high effort and found 3 potential issues.

def _normalize_url(url):
    return (url or "").rstrip("/")

Sweep skips https scheme normalization

High Severity

The _normalize_url function in sweep_connectors.py only strips trailing slashes, unlike the utils.normalize_url used by other discovery components which also adds an https:// scheme. This difference means the connector sweep can fail to reach the backend when provided with a bare domain, even if other discovery operations succeed.

files = []
for sub in SESSION_SUBDIRS:
    folder = base / sub

Sweep reads one user’s Claude dir

Medium Severity

The read_local_connectors function only checks the current user's Claude session files. On Linux, when the sweep runs as root or a service account, this means it misses connectors located in other users' home directories, preventing their resolution. This contrasts with how other discovery processes find user-specific data.
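
A rough sketch of scanning every user's home directory instead of only the current user's. The function name, the `/home` parent, and the `.claude` relative path in the usage note are assumptions; the real session directory layout is not shown in this review:

```python
from pathlib import Path
from typing import Iterable, List


def claude_dirs_for_all_users(home_parents: Iterable[Path], relative_dir: str) -> List[Path]:
    """List <home>/<relative_dir> for each home directory under the parents.

    Parents or homes that are missing or unreadable are skipped silently,
    in keeping with the sweep's best-effort behaviour.
    """
    found = []
    for parent in home_parents:
        try:
            homes = sorted(p for p in parent.iterdir() if p.is_dir())
        except OSError:
            continue
        for home in homes:
            candidate = home / relative_dir
            try:
                if candidate.is_dir():
                    found.append(candidate)
            except OSError:
                continue
    return found
```

A call such as `claude_dirs_for_all_users([Path("/home")], ".claude")` would cover regular users when the agent runs as root; the root account's own home sits outside `/home` and would need to be added separately.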

    sent, failed, matched = run_sweep(args.domain, args.api_key)
    logger.info(f"Connector UUID sweep: resolved {sent}, failed {failed}, matched {matched}")
except Exception as sweep_err:
    logger.debug(f"Connector UUID sweep failed: {sweep_err}")

Sweep runs under watchdog

Medium Severity

run_sweep runs at the end of the main try, after the scan completed event, but before finally sets _finished and cancels the watchdog timer. Each unresolved connector can spend up to ~90s in subprocess timeouts, so a long sweep can still trigger _abort for exceeding args.timeout, sending a failed scan event and os._exit(1) after the backend already recorded completion.
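
One way to keep the sweep from tripping the watchdog is to give it an overall time budget. The sketch below is an assumption about how that could look, not the project's code; `sweep_with_budget` and its parameters are hypothetical:

```python
import time


def sweep_with_budget(items, handle, budget_seconds, clock=time.monotonic):
    """Process items until they run out or the overall budget is spent.

    Returns (processed, skipped). The deadline is only checked between
    items, so each call to handle still needs its own bounded timeout.
    """
    items = list(items)
    deadline = clock() + budget_seconds
    processed = 0
    for index, item in enumerate(items):
        if clock() >= deadline:
            return processed, len(items) - index
        handle(item)
        processed += 1
    return processed, 0
```

An alternative with the same goal would be to mark the scan finished and cancel the watchdog timer before starting the sweep, so a slow sweep cannot turn an already-reported success into a failed scan event.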


3 participants