Distributed ux polish by DhirenMhatre · Pull Request #7 · test-org-codity/LocalAI

DhirenMhatre · 2026-05-20T11:46:11Z

Description

This PR fixes #

Notes for Reviewers

Yes, I signed my commits.

Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally — they live on workers — so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see *why* it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed.

The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and a "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view. - Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this the /api/backends/ upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in App.css so other pages can adopt them without duplicating styles.

The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. Action buttons gained labels — "Approve", "Resume", "Drain" — so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system — matches the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker.

Two connected problems handled together: 1) Backend delete/install/upgrade used to silently skip non-healthy nodes, so a delete during an outage left a zombie on the offline node once it returned. The fan-out now records intent in a new pending_backend_ops table before attempting the NATS round-trip. Currently-healthy nodes get an immediate attempt; everyone else is queued. Unique index on (node_id, backend, op) means reissuing the same operation refreshes next_retry_at instead of stacking duplicates. 2) Loaded-model state could drift from reality: a worker OOM'd, got killed, or restarted a backend process would leave a node_models row claiming the model was still loaded, feeding ghost entries into the /api/nodes/models listing and the router's scheduling decisions. The existing ReplicaReconciler gains two new passes that run under a fresh KeyStateReconciler advisory lock (non-blocking, so one wedged frontend doesn't freeze the cluster): - drainPendingBackendOps: retries queued ops whose next_retry_at has passed on currently-healthy nodes. Success deletes the row; failure bumps attempts and pushes next_retry_at out with exponential backoff (30s → 15m cap). ErrNoResponders also marks the node unhealthy. - probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are loaded but hasn't seen touched in the last probeStaleAfter (2m). Unreachable addresses are removed from the registry. A pluggable ModelProber lets tests substitute a fake without standing up gRPC. DistributedBackendManager exposes DeleteBackendDetailed so the HTTP handler can surface per-node outcomes ("2 succeeded, 1 queued") to the UI in a follow-up commit; the existing DeleteBackend still returns error-only for callers that don't care about node breakdown. Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx on a new key so N frontends coordinate — the same pattern the health monitor and replica reconciler already rely on. Single-node mode runs both passes inline (adapter is nil, state drain is a no-op). Tests cover the upsert semantics, backoff math, the probe removing an unreachable model but keeping a reachable one, and filtering by probeStaleAfter.

When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually — the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model. Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle.

Large clusters were going to break the Manage → Backends Nodes column: the old inline logic rendered every node as a badge and would shred the layout at >10 workers, plus the Manage → Models distribution cell had copy-pasted its own slightly-different version. NodeDistributionChip handles any cluster size with two render modes: - small (≤3 nodes): inline chips of node names, colored by health. - large: a single "on N nodes · M offline · K drift" summary chip; clicking opens a Popover with a per-node table (name, status, version, digest for backends; name, status, state for models). Drift counting mirrors the backend's summarizeNodeDrift so the UI number matches UpgradeInfo.NodeDrift. Digests are truncated to the docker-style 12-char form with the full value preserved in the title. Popover is a new general-purpose primitive: fixed positioning anchored to the trigger, flips above when there's no room below, closes on outside-click or Escape, returns focus to the trigger. Uses .card as its surface so theming is inherited. Also useful for a future labels-editor popup and the user menu. Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell and uses the shared chip with context="backends" / "models" respectively. Delete code removes ~40 lines of ad-hoc logic.

The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.

pending_backend_ops rows targeting agent-type workers looped forever: the reconciler fan-out hit a NATS subject the worker doesn't subscribe to, returned ErrNoResponders, we marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick, same row, same failure. Three related fixes: 1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing for them guarantees an infinite retry loop. Silent skip is correct — they aren't consumers of these ops. 2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders: mark the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps (filters by status=healthy) stops picking the row until the node actually recovers. Matches the synchronous fan-out path. 3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message; further retries just thrash NATS. Row is deleted and logged at ERROR so it stays visible without staying infinite. Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes, non-existent nodes, or carry an empty backend name. Guarded by the same schema-migration advisory lock so only one instance performs it. The guards above prevent new rows of this shape; this closes the migration gap for existing ones. Tests: the prune migration (valid row stays, agent + empty-name rows drop) on top of existing upsert / backoff coverage.

codity-dm · 2026-05-20T11:46:22Z

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 20 code file(s) changed but no test files added

To merge this PR:

Address the failed checks listed above
Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

codity-dm · 2026-05-20T11:46:30Z

PR Summary

What Changed

Fixed distributed-mode upgrade detection by routing checks through the active backend manager instead of empty local filesystem.
Added durable PendingBackendOp queue for backend lifecycle operations with automatic retry and exponential backoff.
Built new React UI components for cluster visibility: node distribution chips, drift detection badges, and persistent URL-based filtering.

Key Changes by Area

Distributed Backend Management

core/services/nodes/managers_distributed.go: Replaced fire-and-forget operations with enqueueAndDrainBackendOp using persistent queue.
core/services/nodes/reconciler.go: Added reconcileState with model health probing and stale probe detection.

Upgrade Detection

core/gallery/upgrade.go: New CheckUpgradesAgainst function using aggregated SystemBackends from workers.
core/application/upgrade_checker.go: Added backendManagerFn lazy getter for proper distributed routing.

React UI

core/http/react-ui/src/components/NodeDistributionChip.jsx: Visualizes model/backend distribution with inline chips or popover detail.
core/http/react-ui/src/pages/Manage.jsx: Added drift warnings, upgrade banners, URL-persisted filters, and 10s auto-refresh.

Data Model

core/gallery/backends.go: Extended SystemBackend with Nodes []NodeBackendRef for per-node attribution.
core/services/nodes/registry.go: New PendingBackendOp table with composite unique index and retry management.

Files Changed

File	Changes Summary
core/application/distributed.go	Added `Adapter`, `RegistrationToken`, `ProbeStaleAfter` to `ReplicaReconcilerOptions`
core/application/startup.go	Passes lazy getter to `NewUpgradeChecker` for distributed mode
core/application/upgrade_checker.go	Added `backendManagerFn` lazy getter for proper backend routing
core/cli/worker.go	Worker reports `Version`, `URI`, `Digest` in lifecycle events
core/gallery/backends.go	Added `Nodes []NodeBackendRef` and `NodeDriftInfo` types
core/gallery/upgrade.go	New `CheckUpgradesAgainst` and `summarizeNodeDrift` functions
core/gallery/upgrade_test.go	Added drift detection test suite
core/http/react-ui/src/App.css	Table utilities, stat cards, popover system, status indicators
core/http/react-ui/src/components/FilterBar.jsx	Shared search/filter/toggle control strip
core/http/react-ui/src/components/NodeDistributionChip.jsx	Node distribution visualization with drift detection
core/http/react-ui/src/components/Popover.jsx	Accessible anchored popover with focus management
core/http/react-ui/src/pages/Manage.jsx	Distributed mode badges, filters, auto-refresh, upgrade banners
core/http/react-ui/src/pages/Nodes.jsx	Stat cards, loaded models table, improved accessibility
core/http/routes/ui_api.go	Added `LoadedOn` field and `Source="registry-only"` for ghost models
core/services/advisorylock/keys.go	Added `KeyStateReconciler` constant
core/services/galleryop/service.go	Extended `SystemBackend` metadata fields
core/services/messaging/subjects.go	New messaging subjects for backend operations
core/services/nodes/managers_distributed.go	`enqueueAndDrainBackendOp` with durable intent queue
core/services/nodes/reconciler.go	`reconcileState`, `drainPendingBackendOps`, model health probing
core/services/nodes/reconciler_test.go	`fakeProber` helper and `probeLoadedModels` tests
core/services/nodes/registry.go	`PendingBackendOp` CRUD, exponential backoff, startup pruning

Review Focus Areas

Queue durability: Verify PendingBackendOp retry logic handles node restarts without duplicate operations.
Drift detection accuracy: Check summarizeNodeDrift handles mixed-version clusters correctly.
Reconciler lock splitting: Confirm non-blocking state lock prevents frontend stampede while allowing scaling operations.

Architecture

Design Decisions

Lazy getter pattern in upgrade_checker.go defers backend manager resolution until check time, avoiding circular dependencies during startup.
Composite unique index (node_id, backend, op) prevents duplicate intents but allows same operation on different nodes.
Split advisory locks: replica scaling blocks, state reconciliation tries non-blocking to prioritize user-facing operations over cleanup.

Scalability & Extensibility

Popover component uses fixed positioning with scroll handling for large node lists.
NodeDistributionChip switches from inline to summary mode at 3 nodes, tunable threshold for future density needs.

Risks

Intentional: Exponential backoff caps at 15 minutes, operations may lag behind actual cluster state during recovery.
Unintentional: probeStaleAfter default (2m) may mask genuinely slow model loads on resource-constrained nodes.

codity-dm · 2026-05-20T11:49:35Z

+          if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
+            // One-shot reconciliation — the old state becomes the new chip.
+            setBackendsFilter('upgradable')
+            setShowOnlyUpgradable(false)


Calling state setters during render here can trigger React re-render loops, so move this reconciliation into a useEffect or derive the filter without mutating state in render.

Suggested fix

useEffect(() => { if (showOnlyUpgradable && backendsFilter !== 'upgradable') { setBackendsFilter('upgradable') setShowOnlyUpgradable(false) } }, [showOnlyUpgradable, backendsFilter])

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/pages/Manage.jsx Lines: 642-645 Issue Type: functional-high Severity: high Issue Description: Calling state setters during render here can trigger React re-render loops, so move this reconciliation into a useEffect or derive the filter without mutating state in render. Current Code: if (showOnlyUpgradable && backendsFilter !== 'upgradable') { // One-shot reconciliation — the old state becomes the new chip. setBackendsFilter('upgradable') setShowOnlyUpgradable(false) } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T11:49:35Z

+func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool {
+	client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
+	probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
+	defer cancel()
+	ok, _ := client.HealthCheck(probeCtx)
+	return ok


The liveness probe trusts the persisted model address without validation, so a compromised or stale node_models row can trigger server-side requests to arbitrary internal endpoints; restrict probes to the owning node's registered host or validate the address before dialing.

Suggested fix

func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool { if !isAllowedProbeAddress(address) { return false } client := grpcclient.NewClientWithToken(address, false, nil, false, g.token) probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second) defer cancel() ok, _ := client.HealthCheck(probeCtx) return ok }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/reconciler.go Lines: 28-33 Issue Type: functional-high Severity: high Issue Description: The liveness probe trusts the persisted model address without validation, so a compromised or stale node_models row can trigger server-side requests to arbitrary internal endpoints; restrict probes to the owning node's registered host or validate the address before dialing. Current Code: func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool { client := grpcclient.NewClientWithToken(address, false, nil, false, g.token) probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second) defer cancel() ok, _ := client.HealthCheck(probeCtx) return ok } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T11:49:35Z

+                <button
+                  key={f.key}
+                  role="tab"
+                  aria-selected={activeFilter === f.key}
+                  className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
+                  onClick={() => onFilterChange(f.key)}
+                >


The tab buttons are missing a type="button" attribute, so embedding this component inside a form will submit the form on filter clicks; set an explicit button type.

Suggested fix

<button key={f.key} type="button" role="tab" aria-selected={activeFilter === f.key} className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} onClick={() => onFilterChange(f.key)} >

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/FilterBar.jsx Lines: 54-60 Issue Type: robustness-medium Severity: medium Issue Description: The tab buttons are missing a `type="button"` attribute, so embedding this component inside a form will submit the form on filter clicks; set an explicit button type. Current Code: <button key={f.key} role="tab" aria-selected={activeFilter === f.key} className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} onClick={() => onFilterChange(f.key)} > --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T11:49:36Z

+        ariaLabel={context === 'models' ? 'Model distribution' : 'Backend distribution'}
+      >
+        <div className="popover__header">
+          <strong>Installed on {total} node{total === 1 ? '' : 's'}</strong>


The popover header always says "Installed on" even for models where the component is documenting loaded state, so use a context-specific label.

Suggested change

<strong>Installed on {total} node{total === 1 ? '' : 's'}</strong>

<strong>{context === 'models' ? 'Loaded on ' : 'Installed on '}{total} node{total === 1 ? '' : 's'}</strong>

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/NodeDistributionChip.jsx Lines: 97-97 Issue Type: functional-medium Severity: medium Issue Description: The popover header always says "Installed on" even for models where the component is documenting loaded state, so use a context-specific label. Current Code: <strong>Installed on {total} node{total === 1 ? '' : 's'}</strong> --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T11:49:36Z

+		default:
+			xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID)
+			continue


Unknown pending op types are skipped forever without updating attempts or deleting the row, so a malformed queued operation can persist indefinitely; dead-letter or mark failure for unsupported op values.

Suggested fix

default: xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID) _ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, fmt.Sprintf("unsupported op: %s", op.Op)) if op.Attempts+1 >= maxPendingBackendOpAttempts { _ = rc.registry.DeletePendingBackendOp(ctx, op.ID) } continue

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/reconciler.go Lines: 197-199 Issue Type: security-medium Severity: medium Issue Description: Unknown pending op types are skipped forever without updating attempts or deleting the row, so a malformed queued operation can persist indefinitely; dead-letter or mark failure for unsupported op values. Current Code: default: xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID) continue --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T11:50:56Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 42.6s

Security scan powered by Codity.ai

codity-dm · 2026-05-20T11:52:26Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	6
Scanner	npm audit

View vulnerability details (6 items)

1. brace-expansion various

CVE: GHSA-f886-m6hf-6m8v
Severity: MODERATE
Fixed in: True

brace-expansion: Zero-step sequence causes process hang and memory exhaustion

2. express-rate-limit various

CVE: N/A
Severity: MODERATE
Fixed in: True

Vulnerability in express-rate-limit

3. fast-uri various

CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
Severity: HIGH
Fixed in: True

fast-uri vulnerable to path traversal via percent-encoded dot segments

4. hono various

CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
Severity: MODERATE
Fixed in: True

Hono has CSS Declaration Injection via Style Object Values in JSX SSR

5. ip-address various

CVE: GHSA-v2v4-37r5-5v8g
Severity: MODERATE
Fixed in: True

ip-address has XSS in Address6 HTML-emitting methods

6. postcss various

CVE: GHSA-qx2v-qp2m-jg93
Severity: MODERATE
Fixed in: True

PostCSS has XSS via Unescaped </style> in its CSS Stringify Output

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T11:52:27Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	4
Scanner	pip-audit

View vulnerability details (4 items)

1. pip 24.0

CVE: GHSA-4xh5-x5gv-qwph
Fixed in: 25.3

When extracting a tar archive pip may not check symbolic links point into the extraction directory if the tarfile module doesn't implement PEP 706. Note that upgrading pip to a "fixed" version for thi

2. pip 24.0

CVE: GHSA-6vgw-5pg2-w6jp
Fixed in: 26.0

When pip is installing and extracting a maliciously crafted wheel archive, files may be extracted outside the installation directory. The path traversal is limited to prefixes of the installation dire

3. pip 24.0

CVE: GHSA-58qw-9mgm-455v
Fixed in:

pip handles concatenated tar and ZIP files as ZIP files regardless of filename or whether a file is both a tar and ZIP file. This behavior could result in confusing installation behavior, such as inst

4. pip 24.0

CVE: GHSA-jp4c-xjxw-mgf9
Fixed in: 26.1

pip prior to version 26.1 would run self-update check functionality after installing wheel files which required importing well-known Python modules names. These module imports were intentionally defer

Powered by Codity.ai · Docs

greptile-apps · 2026-05-20T11:53:21Z

Greptile Summary

This PR polishes the distributed mode UX across the backend: it fixes the long-standing bug where upgrade detection always read the empty frontend filesystem instead of aggregating from workers, introduces a durable pending_backend_ops queue so backend lifecycle ops survive worker downtime, and adds cluster-drift detection to surface nodes running divergent backend versions. The frontend receives matching work — per-node distribution chips, upgrade banners, richer filter bars, and auto-refresh in distributed mode.

Backend (Go): CheckUpgradesAgainst replaces the single-node-only CheckBackendUpgrades for distributed callers; enqueueAndDrainBackendOp replaces fire-and-forget fan-outs with durable intent rows retried by the new reconcileState pass; gRPC health probes prune ghost model rows automatically.
Frontend (React): New NodeDistributionChip, Popover, and FilterBar components; Manage.jsx gains per-tab search/filter with URL persistence, an upgrade banner, and a "Upgrade all" action; Nodes.jsx moves from inline styles to design-system CSS classes.

Confidence Score: 3/5

The Go backend changes are solid and well-tested; the frontend Upgrade all action unconditionally shows a success notice even when every upgrade call fails, which will mislead operators on a broken cluster.

The core distributed infrastructure work is carefully implemented with good test coverage. The frontend misinforms the operator when all backends fail to upgrade — a user who clicks Upgrade all and sees only error toasts followed immediately by a success banner has no reliable signal about the true cluster state.

core/http/react-ui/src/pages/Manage.jsx — handleUpgradeAll success-toast logic needs fixing before this ships to operators.

Important Files Changed

Filename	Overview
core/http/react-ui/src/pages/Manage.jsx	Significant UI overhaul adding per-tab filtering, URL-synced filter state, auto-refresh in distributed mode, upgrade banner, and NodeDistributionChip; has a bug where the success toast fires unconditionally even when all individual upgrades fail, plus dead showOnlyUpgradable state.
core/services/nodes/managers_distributed.go	Major refactor extracting enqueueAndDrainBackendOp with durable intent recording and per-node result reporting; has a narrow race between the direct fan-out and the reconciler both applying the same op on a healthy node.
core/application/upgrade_checker.go	Adds lazy backendManagerFn indirection so the checker routes through the distributed backend manager in cluster mode; reduces initial delay from 30 s to 10 s which may be too short for workers to fully register.
core/services/nodes/registry.go	Adds PendingBackendOp GORM model with unique composite index, full CRUD, exponential-backoff failure recording, and startup pruning of stale agent-node rows.
core/services/nodes/reconciler.go	Adds reconcileState with pending-op drain and gRPC-health-based ghost-model pruning passes; uses separate advisory lock from replica scaling, overridable ModelProber for testing.
core/gallery/upgrade.go	Splits CheckBackendUpgrades into a single-node wrapper plus CheckUpgradesAgainst accepting pre-aggregated SystemBackends; adds cluster-drift detection via summarizeNodeDrift majority-vote logic.
core/http/routes/ui_api.go	Extends the models list API to populate loaded_on distribution data from the node registry and emits registry-only ghost entries for models with no local config file.

Sequence Diagram

sequenceDiagram
    participant UI as React UI
    participant API as ui_api.go
    participant DBM as DistributedBackendManager
    participant Reg as NodeRegistry (DB)
    participant Rec as ReplicaReconciler
    participant W as Worker Node (NATS)

    UI->>API: POST /api/backends/upgrade/:name
    API->>DBM: UpgradeBackend(ctx, name)
    DBM->>Reg: UpsertPendingBackendOp(nodeID, name, upgrade)
    alt node is healthy
        DBM->>W: InstallBackend (NATS RPC)
        W-->>DBM: reply.Success
        DBM->>Reg: deletePendingRow(nodeID, name, upgrade)
    else node is offline/draining
        DBM-->>API: "status=queued"
        Note over Rec,W: Reconciler picks up later
    end

    loop Every 30s
        Rec->>Reg: ListDuePendingBackendOps()
        Reg-->>Rec: due ops for healthy nodes
        Rec->>W: InstallBackend / DeleteBackend (NATS)
        alt success
            Rec->>Reg: DeletePendingBackendOp(id)
        else failure
            Rec->>Reg: RecordPendingBackendOpFailure(id, err)
            Note over Rec: exponential backoff, dead-letter at 10 attempts
        end
        Rec->>Reg: query stale loaded model rows
        Rec->>W: gRPC HealthCheck (probe)
        alt unreachable
            Rec->>Reg: RemoveNodeModel(nodeID, modelName)
        end
    end

    UI->>API: GET /api/system/models
    API->>Reg: ListAllLoadedModels + List nodes
    API-->>UI: modelCapability with LoadedOn and Source

_{Reviews (1): Last reviewed commit: "fix(distributed): stop queue loops on ag..." | Re-trigger Greptile}

greptile-apps · 2026-05-20T11:53:25Z

+      for (const name of names) {
+        try {
+          await backendsApi.upgrade(name)
+        } catch (err) {
+          addToast(`Upgrade failed for ${name}: ${err.message}`, 'error')
+        }
+      }
+      addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info')


Success toast always fires regardless of upgrade outcome. Every call to backendsApi.upgrade is individually caught, so the outer try block never throws. The "Upgrade started for N backends" toast fires even when every individual upgrade failed — an operator who sees N error toasts followed by a success notice will be misled into thinking work is in progress when none of it is.

Suggested change

for (const name of names) {

try {

await backendsApi.upgrade(name)

} catch (err) {

addToast(`Upgrade failed for ${name}: ${err.message}`, 'error')

}

}

addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info')

let succeeded = 0

for (const name of names) {

try {

await backendsApi.upgrade(name)

succeeded++

} catch (err) {

addToast(`Upgrade failed for ${name}: ${err.message}`, 'error')

}

}

if (succeeded > 0) {

addToast(`Upgrade started for ${succeeded} backend${succeeded === 1 ? '' : 's'}`, 'info')

}

greptile-apps · 2026-05-20T11:53:26Z

+          // Legacy "showOnlyUpgradable" toggle is now the 'upgradable' chip —
+          // keep backward-compat by mapping it onto the new filter.
+          if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
+            // One-shot reconciliation — the old state becomes the new chip.
+            setBackendsFilter('upgradable')
+            setShowOnlyUpgradable(false)
+          }


Dead state and unreachable migration block. showOnlyUpgradable is initialised to false and setShowOnlyUpgradable(true) is never called anywhere in the component, so this compatibility shim can never run. The state variable and this block can be removed entirely.

Suggested change

// Legacy "showOnlyUpgradable" toggle is now the 'upgradable' chip —

// keep backward-compat by mapping it onto the new filter.

if (showOnlyUpgradable && backendsFilter !== 'upgradable') {

// One-shot reconciliation — the old state becomes the new chip.

setBackendsFilter('upgradable')

setShowOnlyUpgradable(false)

}

// (showOnlyUpgradable legacy migration removed — state never set)

greptile-apps · 2026-05-20T11:53:27Z

+		if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
+			xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
+			result.Nodes = append(result.Nodes, NodeOpStatus{
+				NodeID: node.ID, NodeName: node.Name, Status: "error",
+				Error: fmt.Sprintf("enqueue failed: %v", err),
+			})
+			continue
+		}
+
 		if node.Status != StatusHealthy {
+			// Intent is recorded; reconciler will retry when the node recovers.
+			result.Nodes = append(result.Nodes, NodeOpStatus{
+				NodeID: node.ID, NodeName: node.Name, Status: "queued",
+				Error: fmt.Sprintf("node %s, will retry when healthy", node.Status),
+			})
 			continue
 		}
-		if _, delErr := d.adapter.DeleteBackend(node.ID, name); delErr != nil {
-			if errors.Is(delErr, nats.ErrNoResponders) {
-				// Node's NATS subscription is gone — likely restarted with a new ID.
-				// Mark it unhealthy so future fan-outs skip it.
-				xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
-				d.registry.MarkUnhealthy(context.Background(), node.ID)
-				continue
+
+		applyErr := apply(node)
+		if applyErr == nil {
+			// Find the row we just upserted and delete it; cheap but requires
+			// a lookup since UpsertPendingBackendOp doesn't return the ID.
+			if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil {
+				xlog.Debug("Failed to clear pending backend op after success", "error", err)
 			}
-			xlog.Warn("Failed to propagate backend deletion to worker", "node", node.Name, "backend", name, "error", delErr)
-			errs = append(errs, fmt.Errorf("node %s: %w", node.Name, delErr))
+			result.Nodes = append(result.Nodes, NodeOpStatus{
+				NodeID: node.ID, NodeName: node.Name, Status: "success",
+			})
+			continue


Reconciler can race the initial fan-out on healthy nodes. UpsertPendingBackendOp sets NextRetryAt = time.Now(), so the row is immediately eligible for ListDuePendingBackendOps. If a reconciler tick fires in the ≤30 s window between the upsert and the successful apply/deletePendingRow call, both the direct fan-out path and drainPendingBackendOps will call the same NATS subject for the same node concurrently. For delete the second call may see a "not found" error that triggers unnecessary retries. Setting NextRetryAt ahead (e.g., time.Now().Add(interval + 5*time.Second)) would close the window.

greptile-apps · 2026-05-20T11:53:28Z

+	// Initial delay: don't slow down startup. Short enough that operators
+	// don't stare at an empty upgrade banner for long; long enough that
+	// workers have registered and reported their installed backends.
+	initialDelay := 10 * time.Second


10 s initial delay is likely too short for a distributed cluster. Worker registration, NATS subscription setup, and the initial ListBackends fan-out typically take longer than 10 s on any real cluster. Running CheckUpgrades before workers are visible means the upgrade banner silently shows nothing until the next scheduled check (6 h later). The old 30 s value was closer to the right order of magnitude.

Suggested change

initialDelay := 10 * time.Second

initialDelay := 30 * time.Second

DhirenMhatre · 2026-05-20T12:04:22Z

@codity review

codity-dm · 2026-05-20T12:04:35Z

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 20 code file(s) changed but no test files added

To merge this PR:

Address the failed checks listed above
Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

codity-dm · 2026-05-20T12:04:45Z

PR Summary

What Changed

Added durable backend lifecycle operations with a pending-ops queue that retries delete/install/upgrade when nodes recover, preventing "zombie" backends.
Implemented distributed upgrade detection and cluster drift monitoring that flags version/digest mismatches across nodes.
Polished the React UI with new components for node distribution visibility, auto-refresh, and persistent filters.

Key Changes by Area

Backend Lifecycle

managers_distributed.go: Replaced fire-and-forget operations with enqueueAndDrainBackendOp() that persists intent and retries on node recovery.
reconciler.go: Added drain loop for pending ops with exponential backoff and ghost model cleanup via gRPC health probes.

Upgrade & Drift Detection

upgrade_checker.go: Added lazy backendManagerFn to route checks through active backend manager (local or distributed).
upgrade.go: New CheckUpgradesAgainst() function detects cluster drift when nodes disagree on version/digest.

UI Components

NodeDistributionChip.jsx: Shows model/backend distribution across nodes; inline for ≤3 nodes, popover summary for larger clusters with drift warnings.
FilterBar.jsx: Extracted search + filter + toggle controls for reuse across System tabs.
Popover.jsx: Minimal accessible popover with auto-positioning and focus management.

API & Registry

ui_api.go: Models endpoint now joins node registry to populate loaded_on field and surfaces registry-only ghost models.
registry.go: New PendingBackendOp table with queue management methods and startup cleanup migration.

Files Changed

File	Changes Summary
core/application/distributed.go	`ReplicaReconciler` receives `Adapter`, `RegistrationToken`, `ProbeStaleAfter: 2m`
core/application/startup.go	Startup cleanup migration for malformed pending ops rows
core/application/upgrade_checker.go	Lazy `backendManagerFn` for routing upgrade checks through active manager
core/cli/worker.go	Worker reports `Version`, `URI`, `Digest` in backend lifecycle events
core/gallery/backends.go	Added `Nodes []NodeBackendRef` and `NodeDriftInfo` for per-node attribution
core/gallery/upgrade.go	New `CheckUpgradesAgainst()` with cluster drift detection
core/gallery/upgrade_test.go	Tests for upgrade and drift detection logic
core/http/react-ui/src/App.css	New CSS utilities: badges, stat cards, popover system, filter layout, status indicators
core/http/react-ui/src/components/FilterBar.jsx	New shared search + filter + toggle control strip
core/http/react-ui/src/components/NodeDistributionChip.jsx	Smart cluster distribution display with inline/popover modes
core/http/react-ui/src/components/Popover.jsx	Minimal accessible popover with auto-positioning
core/http/react-ui/src/pages/Manage.jsx	Auto-refresh, URL-persisted filters, upgrade banner, drift warnings, `NodeDistributionChip` integration
core/http/react-ui/src/pages/Nodes.jsx	`StatCard` CSS classes, expandable rows, cluster stat cards
core/http/routes/ui_api.go	Models endpoint joins node registry for `loaded_on`, surfaces registry-only models
core/services/advisorylock/keys.go	New `KeyStateReconciler` (107) for reconciler coordination
core/services/galleryop/service.go	Integration with pending ops queue
core/services/messaging/subjects.go	Extended `NodeBackendInfo` with `Version`, `URI`, `Digest`
core/services/nodes/managers_distributed.go	`enqueueAndDrainBackendOp()` with durable queue, `DeleteBackendDetailed()` for UI visibility
core/services/nodes/reconciler.go	Pending ops drain loop, ghost model cleanup, advisory lock separation
core/services/nodes/reconciler_test.go	Tests for reconciler drain and cleanup logic
core/services/nodes/registry.go	`PendingBackendOp` table, queue management methods, startup cleanup

Review Focus Areas

Dead letter handling: Operations abandoned after 10 attempts in reconciler.go:171-242. Confirm this is acceptable vs. alerting.
Advisory lock contention: KeyStateReconciler vs replica-reconciler separation prevents blocking, but verify no deadlock scenarios.
Drift detection accuracy: CheckUpgradesAgainst() flags version/digest mismatches. Review if partial cluster consensus should trigger alerts.

Architecture

Design Decisions

Chose pending-ops table over in-memory queue for durability across restarts. Tradeoff: database writes on every operation.
Separated replica scaling and state reconciliation locks intentionally. Slow reconciler passes should not block urgent scaling decisions.
Lazy backendManagerFn in upgrade checker defers local vs. distributed decision to call time, avoiding startup ordering complexity.

Scalability & Extensibility

NodeDistributionChip scales to large clusters via popover summary. Out of scope: virtualization for 100+ node clusters.
Exponential backoff (30s-15min) caps retry storm risk. Dead lettering after 10 attempts prevents infinite retry on permanent failures.

Risks

Intentional: Ghost model cleanup via gRPC probes may remove temporarily unreachable models during network partitions. Acceptable tradeoff for stale data removal.
Unintentional: UpsertPendingBackendOp conflict resolution resets retry time on re-issue. Review if this allows retry starvation via rapid re-issues.

Merge Status

NOT MERGEABLE — PR Score 23/100, below threshold (50)

[H1] 1 CRITICAL security vulnerability found (SEC-001)
[H4] PR quality score (23) is below merge floor (50)
[H5] 3 HIGH-severity inline review findings need resolution (threshold: 3)
[H6] Code quality raw score (15) is below merge floor (40)

codity-dm · 2026-05-20T12:08:01Z

+		if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
+			xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
+			continue
+		}
+		xlog.Warn("Reconciler: model unreachable, removed from registry",


The code removes the registry row before unloading the worker, so unload failures leave an unmanaged loaded backend; unload first or restore the row on failure.

Suggested fix

if err := rc.unloader.UnloadModelOnNode(m.NodeID, m.ModelName); err != nil { xlog.Warn("Reconciler: failed to unload unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) continue } if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil { xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) continue } xlog.Warn("Reconciler: model unreachable, removed from registry", "node", m.NodeID, "model", m.ModelName, "address", m.Address)

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/reconciler.go Lines: 279-283 Issue Type: robustness-high Severity: high Issue Description: The code removes the registry row before unloading the worker, so unload failures leave an unmanaged loaded backend; unload first or restore the row on failure. Current Code: if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil { xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) continue } xlog.Warn("Reconciler: model unreachable, removed from registry", "node", m.NodeID, "model", m.ModelName, "address", m.Address) --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T12:08:01Z

+	ctx := context.Background()
+	_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error {
+		_, err := d.adapter.DeleteBackend(node.ID, name)
+		return err
+	})
+	return err


DeleteBackend uses context.Background instead of a request-scoped context, so cancellation and deadlines from the caller are ignored; pass a caller-provided context through this path.

Suggested fix

func (d *DistributedBackendManager) DeleteBackend(ctx context.Context, name string) error { if err := d.local.DeleteBackend(name); err != nil { if !errors.Is(err, gallery.ErrBackendNotFound) { return err } xlog.Debug("Backend not found locally, will attempt deletion on workers", "backend", name) } _, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error { _, err := d.adapter.DeleteBackend(node.ID, name) return err }) return err }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/managers_distributed.go Lines: 200-205 Issue Type: security-medium Severity: medium Issue Description: DeleteBackend uses context.Background instead of a request-scoped context, so cancellation and deadlines from the caller are ignored; pass a caller-provided context through this path. Current Code: ctx := context.Background() _, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error { _, err := d.adapter.DeleteBackend(node.ID, name) return err }) return err --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T12:09:20Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 46.8s

Security scan powered by Codity.ai

codity-dm · 2026-05-20T12:11:06Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	6
Scanner	npm audit

View vulnerability details (6 items)

1. brace-expansion various

CVE: GHSA-f886-m6hf-6m8v
Severity: MODERATE
Fixed in: True

brace-expansion: Zero-step sequence causes process hang and memory exhaustion

2. express-rate-limit various

CVE: N/A
Severity: MODERATE
Fixed in: True

Vulnerability in express-rate-limit

3. fast-uri various

CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
Severity: HIGH
Fixed in: True

fast-uri vulnerable to path traversal via percent-encoded dot segments

4. hono various

CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
Severity: MODERATE
Fixed in: True

Hono has CSS Declaration Injection via Style Object Values in JSX SSR

5. ip-address various

CVE: GHSA-v2v4-37r5-5v8g
Severity: MODERATE
Fixed in: True

ip-address has XSS in Address6 HTML-emitting methods

6. postcss various

CVE: GHSA-qx2v-qp2m-jg93
Severity: MODERATE
Fixed in: True

PostCSS has XSS via Unescaped </style> in its CSS Stringify Output

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T12:11:08Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	4
Scanner	pip-audit

View vulnerability details (4 items)

1. pip 24.0

CVE: GHSA-4xh5-x5gv-qwph
Fixed in: 25.3

When extracting a tar archive pip may not check symbolic links point into the extraction directory if the tarfile module doesn't implement PEP 706. Note that upgrading pip to a "fixed" version for thi

2. pip 24.0

CVE: GHSA-6vgw-5pg2-w6jp
Fixed in: 26.0

When pip is installing and extracting a maliciously crafted wheel archive, files may be extracted outside the installation directory. The path traversal is limited to prefixes of the installation dire

3. pip 24.0

CVE: GHSA-58qw-9mgm-455v
Fixed in:

pip handles concatenated tar and ZIP files as ZIP files regardless of filename or whether a file is both a tar and ZIP file. This behavior could result in confusing installation behavior, such as inst

4. pip 24.0

CVE: GHSA-jp4c-xjxw-mgf9
Fixed in: 26.1

pip prior to version 26.1 would run self-update check functionality after installing wheel files which required importing well-known Python modules names. These module imports were intentionally defer

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T12:13:32Z

License Compliance Scan

Metric	Value
Packages Scanned	778
High Risk (Strong Copyleft)	0
Medium Risk (Weak Copyleft)	6
Low Risk (Permissive)	646
Unknown License	126

Weak copyleft licenses found - verify compatibility

Some packages have unknown licenses - manual review required

Medium Risk Licenses - 6 packages

(MPL-2.0 OR Apache-2.0) (1 packages):

dompurify 3.4.0

MPL-2.0 (5 packages):

github.com/philippgille/chromem-go 0.7.0
github.com/libp2p/go-yamux/v5 5.0.1
github.com/hashicorp/golang-lru 1.0.2
github.com/hashicorp/golang-lru/v2 2.0.7
github.com/shoenig/go-m1cpu 0.1.6

Unknown Licenses - 126 packages

github.com/nats-io/nkeys 0.4.15
github.com/moby/moby/client 0.4.0
github.com/moby/moby/api 1.54.1
github.com/nats-io/nuid 1.0.1
github.com/wk8/go-ordered-map/v2 2.1.8
github.com/ProtonMail/go-crypto 1.1.6
github.com/PuerkitoBio/goquery 1.10.3
github.com/RoaringBitmap/roaring/v2 2.4.5
github.com/andybalholm/cascadia 1.3.3
github.com/antchfx/htmlquery 1.3.4
github.com/bits-and-blooms/bitset 1.24.0
github.com/antchfx/xmlquery 1.4.4
github.com/antchfx/xpath 1.3.4
github.com/blevesearch/bleve/v2 2.5.7
github.com/blevesearch/bleve_index_api 1.2.11
github.com/blevesearch/geo 0.2.4
github.com/blevesearch/gtreap 0.1.1
github.com/blevesearch/go-porterstemmer 1.0.3
github.com/blevesearch/mmap-go 1.0.4
github.com/blevesearch/scorch_segment_api/v2 2.3.13

...and 106 more

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T12:14:01Z

Code Quality Report — test-org-codity/LocalAI · PR #7

Scanned: 2026-05-20 12:14 UTC | Score: 15/100 | Provider: github

Executive Summary

Severity	Count
Critical	0
High	0
Medium	5
Low	211

Top Findings

[CQ-LLM-002] `core/application/distributed.go:242` (Documentation · MEDIUM)

Issue: Missing detailed documentation for the new parameters in the ReplicaReconcilerOptions struct.
Suggestion: Add JSDoc comments for the new fields in ReplicaReconcilerOptions.

// Create ReplicaReconciler for auto-scaling model replicas. Adapter + ...

[CQ-LLM-001] `core/application/distributed.go:243` (Complexity · MEDIUM)

Issue: The function initDistributed has increased cyclomatic complexity due to additional parameters and logic.
Suggestion: Consider breaking down the function into smaller, more manageable functions.

func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB) (*Distributed, error) { ... }

[CQ-LLM-005] `core/application/upgrade_checker.go:40` (Testability · MEDIUM)

Issue: The use of a global state (backendManagerFn) can make unit testing difficult.
Suggestion: Consider using dependency injection to pass the backend manager to the UpgradeChecker.

backendManagerFn func() galleryop.BackendManager

[CQ-LLM-003] `core/application/upgrade_checker.go:146` (Error_Handling · MEDIUM)

Issue: Potentially swallowing errors when checking upgrades without handling the case where both upgrades and error are nil.
Suggestion: Add error handling to ensure that if both upgrades and err are nil, it is logged or handled appropriately.

if upgrades == nil && err == nil { ... }

[CQ-LLM-004] `core/application/upgrade_checker.go:158` (Performance · MEDIUM)

Issue: The runCheck function may lead to performance issues due to the nested calls to backendManagerFn and CheckBackendUpgrades.
Suggestion: Consider optimizing the logic to avoid unnecessary calls and improve performance.

if uc.backendManagerFn != nil { if bm := uc.backendManagerFn(); bm != nil { ... } }

[CQ-008] `core/application/distributed.go:256` (Maintainability · LOW)

Issue: Magic number 30 in code
Suggestion: Extract to a named constant

Interval:          30 * time.Second,

[CQ-009] `core/application/upgrade_checker.go:53` (Style · LOW)

Issue: Line exceeds 120 characters (163 chars)
Suggestion: Break long lines into multiple lines for readability

func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB, backendManagerFn func() ...

[CQ-008] `core/application/upgrade_checker.go:81` (Maintainability · LOW)

Issue: Magic number 10 in code
Suggestion: Extract to a named constant

initialDelay := 10 * time.Second

[CQ-LLM-006] `core/gallery/upgrade.go:25` (Maintainability · LOW)

Issue: The NodeDrift field in UpgradeInfo lacks a clear explanation of its purpose.
Suggestion: Add a comment explaining the purpose of NodeDrift in the UpgradeInfo struct.

// NodeDrift lists nodes whose installed version or digest differs from ...

[CQ-009] `core/gallery/upgrade.go:59` (Style · LOW)

Issue: Line exceeds 120 characters (175 chars)
Suggestion: Break long lines into multiple lines for readability

func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBac...

Per-File Breakdown

File	Medium	Low	Total
`core/application/distributed.go`	2	1	3
`core/application/upgrade_checker.go`	3	2	5
`core/gallery/upgrade.go`	0	2	2
`core/http/react-ui/src/App.css`	0	14	14
`core/http/react-ui/src/components/FilterBar.jsx`	0	1	1
`core/http/react-ui/src/components/NodeDistributionChip.jsx`	0	11	11
`core/http/react-ui/src/components/Popover.jsx`	0	2	2
`core/http/react-ui/src/pages/Manage.jsx`	0	122	122
`core/http/react-ui/src/pages/Nodes.jsx`	0	36	36
`core/services/advisorylock/keys.go`	0	1	1
`core/services/nodes/managers_distributed.go`	0	1	1
`core/services/nodes/reconciler.go`	0	2	2
`core/services/nodes/reconciler_test.go`	0	9	9
`core/services/nodes/registry.go`	0	7	7

Recommendations

Run automated tests after applying fixes to verify no regressions.

DhirenMhatre · 2026-05-20T15:55:25Z

@codity review

codity-dm · 2026-05-20T15:55:39Z

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 20 code file(s) changed but no test files added

To merge this PR:

Address the failed checks listed above
Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

codity-dm · 2026-05-20T15:55:47Z

PR Summary

What Changed

Fixed distributed upgrade detection by routing checks through the active backend manager instead of empty local filesystem.
Added durable backend lifecycle operations with a pending-ops queue that retries when nodes recover.
Built cluster drift visibility into the UI with node distribution chips, version mismatch warnings, and per-backend upgrade controls.

Key Changes by Area

Distributed Backend Management

PendingBackendOp table with unique constraints prevents duplicate install/delete/upgrade intents across nodes.
Reconciler drains queue with exponential backoff (30s to 15m cap) and dead-letters after 10 attempts.
DeleteBackendDetailed returns per-node status for UI visibility.

Cluster Upgrade Detection

SystemBackend.Nodes tracks per-node version, digest, and URI for drift detection.
CheckUpgradesAgainst aggregates worker state instead of reading frontend filesystem.
Worker lifecycle events now report version metadata.

State Reconciliation

ModelProber interface with gRPC health checks removes orphaned model processes.
probeLoadedModels clears ghost entries for stale addresses (older than 2m).
Split advisory locks: replica scaling vs. state reconciliation use separate keys.

UI/UX

NodeDistributionChip shows inline chips (≤3 nodes) or summary popover with drift detection.
URL-persisted filters survive tab switches; 10s auto-refresh in distributed mode.
"Adopted" badge for registry-only models discovered on workers.

Files Changed

File	Changes Summary
core/application/distributed.go	Startup wiring for distributed mode
core/application/startup.go	Initialization sequence updates
core/application/upgrade_checker.go:47-62	`backendManagerFn` lazy getter for routing upgrade checks
core/cli/worker.go:738-743	Worker reports version, URI, digest in lifecycle events
core/gallery/backends.go:394-414	`SystemBackend.Nodes` field and `NodeBackendRef` struct
core/gallery/upgrade.go:23-129	`NodeDrift` field, `CheckUpgradesAgainst` function, `summarizeNodeDrift`
core/gallery/upgrade_test.go	Tests for cluster upgrade detection
core/http/react-ui/src/App.css:1529-1540	Badge utilities, popover animations, status indicators, filter layouts
core/http/react-ui/src/components/FilterBar.jsx	Reusable search + chip filter component
core/http/react-ui/src/components/NodeDistributionChip.jsx	Node distribution display with drift detection
core/http/react-ui/src/components/Popover.jsx	Accessible auto-positioning popover
core/http/react-ui/src/pages/Manage.jsx:32-821	Drift badges, URL-persisted filters, auto-refresh, backend upgrade controls
core/http/react-ui/src/pages/Nodes.jsx	Stat cards, tab classes, worker registration improvements
core/http/routes/ui_api.go	`loaded_on` array for model distribution; registry-only model surfacing
core/services/advisorylock/keys.go	`KeyStateReconciler` constant
core/services/galleryop/service.go	`BackendManager()` accessor
core/services/messaging/subjects.go	`NodeBackendInfo` version/digest/URI fields
core/services/nodes/managers_distributed.go:97-167	Pending-ops queue integration, agent node skipping
core/services/nodes/reconciler.go:171-242	`drainPendingBackendOps` and `probeLoadedModels` reconciliation passes
core/services/nodes/reconciler_test.go	Tests for state reconciliation and model probing
core/services/nodes/registry.go:107-128	`PendingBackendOp` table, queue operations, startup cleanup

Review Focus Areas

Dead-letter behavior after 10 failed attempts: verify this matches operational expectations for backend operations.
Advisory lock splitting: confirm KeyStateReconciler non-blocking behavior doesn't starve replica scaling under load.
gRPC health check timeout in ModelProber: check it handles slow-but-healthy model processes correctly.

Architecture

Design Decisions

Pending-ops queue over synchronous fan-out: intentional tradeoff for durability when nodes are transiently offline. Accepts complexity of retry bookkeeping.
Lazy backendManagerFn getter: minimal fix for upgrade checker without restructuring initialization order.
Separate advisory locks: state reconciliation can run without blocking replica scaling, but may race; non-blocking TryWithLockCtx skips rather than waits.

Scalability & Extensibility

NodeDistributionChip collapses to summary at >3 nodes: keeps DOM weight bounded for large clusters.
Exponential backoff caps at 15 minutes: prevents thundering herd on recovering nodes.

Risks

Intentional: Dead-lettering after 10 attempts leaves backends in indeterminate state; requires manual intervention or external monitoring.
Unintentional: probeLoadedModels uses 2m staleness threshold without configurability; may be too aggressive for slow-starting models.

Merge Status

NOT MERGEABLE — PR Score 8/100, below threshold (50)

[H1] 1 CRITICAL security vulnerability found (SEC-001)
[H4] PR quality score (8) is below merge floor (50)
[H5] 5 HIGH-severity inline review findings need resolution (threshold: 3)
[H6] Code quality raw score (14) is below merge floor (40)

codity-dm · 2026-05-20T15:59:22Z

+          if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
+            // One-shot reconciliation — the old state becomes the new chip.
+            setBackendsFilter('upgradable')
+            setShowOnlyUpgradable(false)


Calling state setters during render causes render-loop risk and should be moved into a useEffect that watches showOnlyUpgradable and backendsFilter.

Suggested fix

useEffect(() => { if (showOnlyUpgradable && backendsFilter !== 'upgradable') { setBackendsFilter('upgradable') setShowOnlyUpgradable(false) } }, [showOnlyUpgradable, backendsFilter])

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/pages/Manage.jsx Lines: 642-645 Issue Type: robustness-high Severity: high Issue Description: Calling state setters during render causes render-loop risk and should be moved into a useEffect that watches showOnlyUpgradable and backendsFilter. Current Code: if (showOnlyUpgradable && backendsFilter !== 'upgradable') { // One-shot reconciliation — the old state becomes the new chip. setBackendsFilter('upgradable') setShowOnlyUpgradable(false) } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T15:59:22Z

+	return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
+		var row PendingBackendOp
+		if err := tx.First(&row, id).Error; err != nil {
+			return err
+		}
+		row.Attempts++
+		row.LastError = errMsg
+		row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts))
+		return tx.Save(&row).Error


The row update in RecordPendingBackendOpFailure is vulnerable to lost updates under concurrent failures; use an atomic UPDATE that increments attempts in SQL.

Suggested fix

return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error { var row PendingBackendOp if err := tx.Clauses(clause.Locking{Strength: "UPDATE"}).First(&row, id).Error; err != nil { return err } attempts := row.Attempts + 1 return tx.Model(&PendingBackendOp{}). Where("id = ?", id). Updates(map[string]any{ "attempts": gorm.Expr("attempts + 1"), "last_error": errMsg, "next_retry_at": time.Now().Add(backoffForAttempt(attempts)), }).Error })

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/registry.go Lines: 1061-1069 Issue Type: robustness-high Severity: high Issue Description: The row update in RecordPendingBackendOpFailure is vulnerable to lost updates under concurrent failures; use an atomic UPDATE that increments attempts in SQL. Current Code: return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error { var row PendingBackendOp if err := tx.First(&row, id).Error; err != nil { return err } row.Attempts++ row.LastError = errMsg row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts)) return tx.Save(&row).Error }) --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T15:59:22Z

+	NodeStatus  string `json:"node_status"` // healthy | unhealthy | offline | draining | pending
+	Version     string `json:"version,omitempty"`
+	Digest      string `json:"digest,omitempty"`
+	URI         string `json:"uri,omitempty"`


Exposing raw backend URI values in the JSON response can leak internal filesystem paths or registry locations, so omit this field from the API model or redact it before serialization.

Suggested change

URI string `json:"uri,omitempty"`

URI string `json:"-"`

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/gallery/backends.go Lines: 412-412 Issue Type: security-medium Severity: medium Issue Description: Exposing raw backend URI values in the JSON response can leak internal filesystem paths or registry locations, so omit this field from the API model or redact it before serialization. Current Code: URI string `json:"uri,omitempty"` --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T15:59:22Z

+	// NodeDrift lists nodes whose installed version or digest differs from
+	// the cluster majority. Non-empty means the cluster has diverged and an
+	// upgrade will realign it. Empty in single-node mode.
+	NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"`
 }

-// CheckBackendUpgrades compares installed backends against gallery entries
-// and returns a map of backend names to UpgradeInfo for those that have
-// newer versions or different OCI digests available.
+// NodeDriftInfo describes one node that disagrees with the cluster majority
+// on which version/digest of a backend is installed.
+type NodeDriftInfo struct {
+	NodeID   string `json:"node_id"`
+	NodeName string `json:"node_name"`
+	Version  string `json:"version,omitempty"`
+	Digest   string `json:"digest,omitempty"`
+}


NodeDrift exposes internal node IDs, names, versions, and digests to any caller of this API, so redact these fields or gate them behind an authorization check before serializing.

Suggested fix

// NodeDrift lists nodes whose installed version or digest differs from // the cluster majority. Populate this only for privileged callers. NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"` } // NodeDriftInfo describes one node that disagrees with the cluster majority // on which version/digest of a backend is installed. type NodeDriftInfo struct { NodeName string `json:"node_name,omitempty"` }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/gallery/upgrade.go Lines: 26-39 Issue Type: security-medium Severity: medium Issue Description: NodeDrift exposes internal node IDs, names, versions, and digests to any caller of this API, so redact these fields or gate them behind an authorization check before serializing. Current Code: // NodeDrift lists nodes whose installed version or digest differs from // the cluster majority. Non-empty means the cluster has diverged and an // upgrade will realign it. Empty in single-node mode. NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"` } // NodeDriftInfo describes one node that disagrees with the cluster majority // on which version/digest of a backend is installed. type NodeDriftInfo struct { NodeID string `json:"node_id"` NodeName string `json:"node_name"` Version string `json:"version,omitempty"` Digest string `json:"digest,omitempty"` } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T15:59:22Z

+                <button
+                  key={f.key}
+                  role="tab"
+                  aria-selected={activeFilter === f.key}
+                  className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
+                  onClick={() => onFilterChange(f.key)}
+                >


The tab buttons are missing type="button", so using this component inside a form will submit the form when a filter is clicked; set an explicit button type.

Suggested fix

<button type="button" key={f.key} role="tab" aria-selected={activeFilter === f.key} className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} onClick={() => onFilterChange(f.key)} >

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/FilterBar.jsx Lines: 54-60 Issue Type: functional-medium Severity: medium Issue Description: The tab buttons are missing `type="button"`, so using this component inside a form will submit the form when a filter is clicked; set an explicit button type. Current Code: <button key={f.key} role="tab" aria-selected={activeFilter === f.key} className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} onClick={() => onFilterChange(f.key)} > --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T15:59:22Z

+              {list.map(n => (
+                <tr key={n.node_id ?? n.NodeID ?? getName(n)}>


Rows use a fallback key of getName(n), which can collide for duplicate or empty node names and cause React to reuse the wrong row state, so use a guaranteed-unique fallback such as the map index when no ID is present.

Also reported at: core/http/react-ui/src/components/NodeDistributionChip.jsx L46–L55

Suggested fix

{list.map((n, idx) => ( <tr key={n.node_id ?? n.NodeID ?? `${getName(n)}-${idx}`}>

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/NodeDistributionChip.jsx Lines: 114-115 Issue Type: functional-medium Severity: medium Issue Description: Rows use a fallback key of getName(n), which can collide for duplicate or empty node names and cause React to reuse the wrong row state, so use a guaranteed-unique fallback such as the map index when no ID is present. _Also reported at: `core/http/react-ui/src/components/NodeDistributionChip.jsx` L46–L55_ Current Code: {list.map(n => ( <tr key={n.node_id ?? n.NodeID ?? getName(n)}> --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T15:59:22Z


 	for _, node := range allNodes {
-		if node.Status != StatusHealthy {
+		if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining {


ListBackends now queries every non-pending, non-offline, non-draining node, including non-backend workers that do not subscribe to backend subjects; skip non-backend node types here the same way enqueueAndDrainBackendOp does.

Suggested fix

if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining { continue } if node.NodeType != "" && node.NodeType != NodeTypeBackend { continue }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/managers_distributed.go Lines: 237-237 Issue Type: functional-medium Severity: medium Issue Description: ListBackends now queries every non-pending, non-offline, non-draining node, including non-backend workers that do not subscribe to backend subjects; skip non-backend node types here the same way enqueueAndDrainBackendOp does. Current Code: if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining { --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T16:00:40Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 43.5s

Security scan powered by Codity.ai

codity-dm · 2026-05-20T17:39:28Z

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 20 code file(s) changed but no test files added

To merge this PR:

Address the failed checks listed above
Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

codity-dm · 2026-05-20T17:39:35Z

PR Summary

What Changed

Fixed distributed-mode backend upgrade detection by routing checks through the active backend manager instead of empty local filesystem
Added durable intent queue for backend lifecycle operations (install/delete/upgrade) with automatic retry and dead-lettering
Built cluster drift visibility into the React UI with node distribution chips, version mismatch detection, and per-node status tracking

Key Changes by Area

Distributed Backend Management

PendingBackendOp table persists operations across restarts with exponential backoff retry (core/services/nodes/registry.go:107-130)
enqueueAndDrainBackendOp applies immediately on healthy nodes, queues for offline/unhealthy ones (core/services/nodes/managers_distributed.go:97-167)
drainPendingBackendOps background process retries due operations and marks nodes unhealthy on ErrNoResponders (core/services/nodes/reconciler.go:171-242)

Upgrade Detection

CheckUpgradesAgainst() accepts pre-aggregated SystemBackends for cluster-wide digest comparison (core/gallery/upgrade.go)
NodeDriftInfo and NodeBackendRef types track per-node version/digest tuples for drift detection
Workers now report Version, URI, Digest in backend metadata (core/cli/worker.go)

UI/UX

NodeDistributionChip shows inline chips (≤3 nodes) or summary popover with health/drift/offline counts (core/http/react-ui/src/components/NodeDistributionChip.jsx)
URL-persisted filters (mq, mf, bq, bf) preserve state across tab switches (core/http/react-ui/src/pages/Manage.jsx)
Auto-refresh models every 10s in distributed mode with "last synced" indicator

State Reconciliation

Split reconcileOnce into advisory-locked passes: replica scaling and state reconciliation with try-lock semantics (core/services/nodes/reconciler.go)
ModelProber interface with gRPC health checks detects orphaned models; probes only stale records (>2min) (core/services/nodes/reconciler.go)

Files Changed

File	Changes Summary
core/application/distributed.go	`ReplicaReconciler` receives `Adapter` and `RegistrationToken` for state reconciliation and health probe auth
core/application/startup.go	Startup cleanup prunes malformed pending ops targeting agents or non-existent nodes
core/application/upgrade_checker.go	Added `backendManagerFn` lazy getter; reduced initial check delay from 30s to 10s
core/cli/worker.go	Worker reports `Version`, `URI`, `Digest` in backend metadata for drift detection
core/gallery/backends.go	Added `Nodes []NodeBackendRef` to `SystemBackend` for per-node attribution
core/gallery/upgrade.go	Added `CheckUpgradesAgainst()` with cluster drift detection; new `NodeDriftInfo`/`NodeBackendRef` types
core/gallery/upgrade_test.go	3 test cases for drift detection, unanimous cluster, empty-version path
core/http/react-ui/src/App.css	Added utilities: `.badge-accent`, `.cell-stack`, `.node-status`, `.stat-grid`, `.filter-bar-group`, popover animations
core/http/react-ui/src/components/FilterBar.jsx	Shared search/filter/toggle control strip with keyboard navigation and extensible slots
core/http/react-ui/src/components/NodeDistributionChip.jsx	Reusable chip with inline/summary modes, health/drift/offline display, popover details
core/http/react-ui/src/components/Popover.jsx	Accessible popover with auto-positioning, outside-click/Escape close, focus management
core/http/react-ui/src/pages/Manage.jsx	URL-persisted filters, auto-refresh, `NodeDistributionChip`, upgrade banner, drift indicators, "Adopted" badge
core/http/react-ui/src/pages/Nodes.jsx	`StatCard` with CSS variables, collapsible rows, "Register a new worker" button styling
core/http/routes/ui_api.go	`/api/models` returns `loaded_on` array; emits `registry-only` entries for ghost models
core/services/advisorylock/keys.go	Added `KeyStateReconciler` lock key for split reconciliation passes
core/services/galleryop/service.go	Exposed `BackendManager()` getter for distributed upgrade checking
core/services/messaging/subjects.go	Added `Version`, `URI`, `Digest` to `NodeBackendInfo` for cluster-wide metadata propagation
core/services/nodes/managers_distributed.go	`enqueueAndDrainBackendOp` with intent queue; skips agent nodes; health-aware fan-out
core/services/nodes/reconciler.go	Split reconciliation passes; `drainPendingBackendOps`; `ModelProber` with stale-record probing
core/services/nodes/reconciler_test.go	`fakeProber` and tests for model probing behavior; health monitor test updates
core/services/nodes/registry.go	`PendingBackendOp` table; upsert/list/record/count methods; startup pruning migration

Review Focus Areas

Retry logic in RecordPendingBackendOpFailure: Verify exponential backoff caps at 15m and max 10 attempts before dead-lettering
Ghost model detection: Confirm registry-only entries in /api/models don't break existing UI assumptions about model sources
Advisory lock splitting: Check that KeyStateReconciler try-lock prevents deadlock when replica scaling holds replica-reconciler lock

Architecture

Design Decisions

Intent queue over synchronous fan-out: Operations must survive node restarts and retry automatically. Tradeoff: increased database writes and complexity vs. reliability in partitioned clusters.
Split reconciliation locks: Prevents long-running state reconciliation from blocking urgent replica scaling. KeyStateReconciler uses try-lock to avoid queue buildup.
Stale-record probing only: Model health checks run only on records >2min old to avoid hammering workers with gRPC calls.

Scalability & Extensibility

NodeDistributionChip context prop (models vs backends) allows reuse across tabs with different data shapes
FilterBar extensible right-side slots support future tab-specific controls without component forks
Out of scope: Global operation ordering (install A before B); current queue is per-backend independent

Risks

Intentional: Dead-lettered operations after 10 attempts require manual intervention. Acceptable for v1; monitoring hook not implemented.
Unintentional: ErrNoResponders marking nodes unhealthy may over-mark during transient NATS hiccups. Review if health flapping occurs in production.
Intentional: Agent node filtering skips backend ops entirely. Agents must never subscribe to backend subjects by design.

Merge Status

NOT MERGEABLE — PR Score 8/100, below threshold (50)

[H1] 1 CRITICAL security vulnerability found (SEC-001)
[H4] PR quality score (8) is below merge floor (50)
[H5] 10 HIGH-severity inline review findings need resolution (threshold: 3)
[H6] Code quality raw score (14) is below merge floor (40)

codity-dm · 2026-05-20T17:43:25Z

+func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) {
+	galleryBackends, err := AvailableBackends(galleries, systemState)


The new ctx parameter is ignored, so upgrade checks can block indefinitely on gallery resolution or remote digest fetches; thread the context into the called functions or remove it until cancellation is supported.

Suggested fix

func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) { _ = ctx galleryBackends, err := AvailableBackends(galleries, systemState)

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/gallery/upgrade.go Lines: 59-60 Issue Type: functional-high Severity: high Issue Description: The new ctx parameter is ignored, so upgrade checks can block indefinitely on gallery resolution or remote digest fetches; thread the context into the called functions or remove it until cancellation is supported. Current Code: func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) { galleryBackends, err := AvailableBackends(galleries, systemState) --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) {
+	allNodes, err := d.registry.List(ctx)
+	if err != nil {
+		return BackendOpResult{}, err
 	}
-	var errs []error
+
+	result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))}
 	for _, node := range allNodes {
+		// Pending nodes haven't been approved yet — no intent to apply.
+		if node.Status == StatusPending {
+			continue
+		}
+		// Backend lifecycle ops only make sense on backend-type workers.
+		// Agent workers don't subscribe to backend.install/delete/list, so
+		// enqueueing for them guarantees a forever-retrying row that the
+		// reconciler can never drain. Silently skip — they aren't consumers.
+		if node.NodeType != "" && node.NodeType != NodeTypeBackend {
+			continue
+		}
+		if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
+			xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
+			result.Nodes = append(result.Nodes, NodeOpStatus{
+				NodeID: node.ID, NodeName: node.Name, Status: "error",
+				Error: fmt.Sprintf("enqueue failed: %v", err),
+			})
+			continue
+		}
+
 		if node.Status != StatusHealthy {
+			// Intent is recorded; reconciler will retry when the node recovers.
+			result.Nodes = append(result.Nodes, NodeOpStatus{
+				NodeID: node.ID, NodeName: node.Name, Status: "queued",
+				Error: fmt.Sprintf("node %s, will retry when healthy", node.Status),
+			})
 			continue
 		}
-		if _, delErr := d.adapter.DeleteBackend(node.ID, name); delErr != nil {
-			if errors.Is(delErr, nats.ErrNoResponders) {
-				// Node's NATS subscription is gone — likely restarted with a new ID.
-				// Mark it unhealthy so future fan-outs skip it.
-				xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
-				d.registry.MarkUnhealthy(context.Background(), node.ID)
-				continue
+
+		applyErr := apply(node)
+		if applyErr == nil {
+			// Find the row we just upserted and delete it; cheap but requires
+			// a lookup since UpsertPendingBackendOp doesn't return the ID.
+			if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil {
+				xlog.Debug("Failed to clear pending backend op after success", "error", err)
 			}
-			xlog.Warn("Failed to propagate backend deletion to worker", "node", node.Name, "backend", name, "error", delErr)
-			errs = append(errs, fmt.Errorf("node %s: %w", node.Name, delErr))
+			result.Nodes = append(result.Nodes, NodeOpStatus{
+				NodeID: node.ID, NodeName: node.Name, Status: "success",
+			})
+			continue
+		}
+
+		// Record failure for backoff. If it's an ErrNoResponders, the node's
+		// gone AWOL — mark unhealthy so the router stops picking it too.
+		errMsg := applyErr.Error()
+		if errors.Is(applyErr, nats.ErrNoResponders) {
+			xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
+			d.registry.MarkUnhealthy(ctx, node.ID)
+		}
+		if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil {
+			_ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg)
+		}
+		result.Nodes = append(result.Nodes, NodeOpStatus{
+			NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg,
+		})
+	}
+	return result, nil
+}


DeleteBackend now always returns nil when some or all nodes fail because enqueueAndDrainBackendOp records per-node errors but never surfaces an aggregated error; return a joined error when any node status is "error".

Suggested fix

func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) { allNodes, err := d.registry.List(ctx) if err != nil { return BackendOpResult{}, err } result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))} var errs []error for _, node := range allNodes { if node.Status == StatusPending { continue } if node.NodeType != "" && node.NodeType != NodeTypeBackend { continue } if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil { xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err) result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "error", Error: fmt.Sprintf("enqueue failed: %v", err), }) errs = append(errs, fmt.Errorf("node %s: enqueue failed: %w", node.Name, err)) continue } if node.Status != StatusHealthy { result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "queued", Error: fmt.Sprintf("node %s, will retry when healthy", node.Status), }) continue } applyErr := apply(node) if applyErr == nil { if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil { xlog.Debug("Failed to clear pending backend op after success", "error", err) } result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "success", }) continue } errMsg := applyErr.Error() if errors.Is(applyErr, nats.ErrNoResponders) { xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID) d.registry.MarkUnhealthy(ctx, node.ID) } if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil { _ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg) } result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg, }) errs = append(errs, fmt.Errorf("node %s: %w", node.Name, applyErr)) } return result, errors.Join(errs...) }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/managers_distributed.go Lines: 97-162 Issue Type: functional-high Severity: high Issue Description: DeleteBackend now always returns nil when some or all nodes fail because enqueueAndDrainBackendOp records per-node errors but never surfaces an aggregated error; return a joined error when any node status is "error". Current Code: func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) { allNodes, err := d.registry.List(ctx) if err != nil { return BackendOpResult{}, err } result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))} for _, node := range allNodes { // Pending nodes haven't been approved yet — no intent to apply. if node.Status == StatusPending { continue } // Backend lifecycle ops only make sense on backend-type workers. // Agent workers don't subscribe to backend.install/delete/list, so // enqueueing for them guarantees a forever-retrying row that the // reconciler can never drain. Silently skip — they aren't consumers. if node.NodeType != "" && node.NodeType != NodeTypeBackend { continue } if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil { xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err) result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "error", Error: fmt.Sprintf("enqueue failed: %v", err), }) continue } if node.Status != StatusHealthy { // Intent is recorded; reconciler will retry when the node recovers. result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "queued", Error: fmt.Sprintf("node %s, will retry when healthy", node.Status), }) continue } applyErr := apply(node) if applyErr == nil { // Find the row we just upserted and delete it; cheap but requires // a lookup since UpsertPendingBackendOp doesn't return the ID. if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil { xlog.Debug("Failed to clear pending backend op after success", "error", err) } result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "success", }) continue } // Record failure for backoff. If it's an ErrNoResponders, the node's // gone AWOL — mark unhealthy so the router stops picking it too. errMsg := applyErr.Error() if errors.Is(applyErr, nats.ErrNoResponders) { xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID) d.registry.MarkUnhealthy(ctx, node.ID) } if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil { _ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg) } result.Nodes = append(result.Nodes, NodeOpStatus{ NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg, }) } return result, nil } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+		_, _ = advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error {
+			rc.reconcileState(ctx)
+			return nil
+		})


Ignoring the state-reconciler advisory lock errors hides database or lock failures; log or handle the returned errors so reconciliation does not silently stop running.

Suggested fix

if acquired, err := advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error { rc.reconcileState(ctx) return nil }); err != nil { xlog.Warn("Reconciler: state reconciliation lock failed", "error", err) } else if !acquired { xlog.Debug("Reconciler: state reconciliation skipped because lock is held") }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/reconciler.go Lines: 147-150 Issue Type: robustness-high Severity: high Issue Description: Ignoring the state-reconciler advisory lock errors hides database or lock failures; log or handle the returned errors so reconciliation does not silently stop running. Current Code: _, _ = advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error { rc.reconcileState(ctx) return nil }) --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool {
+	client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
+	probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
+	defer cancel()
+	ok, _ := client.HealthCheck(probeCtx)
+	return ok


The prober treats every health-check error as a dead backend and deletes the model row; return and inspect the probe error so transient auth or network failures do not orphan healthy models.

Suggested fix

func (g grpcModelProber) IsAlive(ctx context.Context, address string) (bool, error) { client := grpcclient.NewClientWithToken(address, false, nil, false, g.token) probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second) defer cancel() return client.HealthCheck(probeCtx) }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/reconciler.go Lines: 28-33 Issue Type: robustness-high Severity: high Issue Description: The prober treats every health-check error as a dead backend and deletes the model row; return and inspect the probe error so transient auth or network failures do not orphan healthy models. Current Code: func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool { client := grpcclient.NewClientWithToken(address, false, nil, false, g.token) probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second) defer cancel() ok, _ := client.HealthCheck(probeCtx) return ok } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+		if rc.prober.IsAlive(ctx, m.Address) {
+			// Bump updated_at so we don't probe this row again immediately.
+			_ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}).
+				Where("id = ?", m.ID).Update("updated_at", time.Now()).Error
+			continue
+		}
+		if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
+			xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
+			continue
+		}
+		xlog.Warn("Reconciler: model unreachable, removed from registry",


Deleting the registry row on any failed probe can remove healthy models during temporary transport or auth failures; only remove after a definitive unhealthy result and log probe errors separately.

Suggested fix

alive, err := rc.prober.IsAlive(ctx, m.Address) if err != nil { xlog.Warn("Reconciler: probe failed", "node", m.NodeID, "model", m.ModelName, "address", m.Address, "error", err) continue } if alive { // Bump updated_at so we don't probe this row again immediately. _ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}). Where("id = ?", m.ID).Update("updated_at", time.Now()).Error continue } if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil { xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) continue } xlog.Warn("Reconciler: model unreachable, removed from registry", "node", m.NodeID, "model", m.ModelName, "address", m.Address)

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/reconciler.go Lines: 273-283 Issue Type: robustness-high Severity: high Issue Description: Deleting the registry row on any failed probe can remove healthy models during temporary transport or auth failures; only remove after a definitive unhealthy result and log probe errors separately. Current Code: if rc.prober.IsAlive(ctx, m.Address) { // Bump updated_at so we don't probe this row again immediately. _ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}). Where("id = ?", m.ID).Update("updated_at", time.Now()).Error continue } if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil { xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) continue } xlog.Warn("Reconciler: model unreachable, removed from registry", "node", m.NodeID, "model", m.ModelName, "address", m.Address) --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+	NodeStatus  string `json:"node_status"` // healthy | unhealthy | offline | draining | pending
+	Version     string `json:"version,omitempty"`
+	Digest      string `json:"digest,omitempty"`
+	URI         string `json:"uri,omitempty"`


URI is added to per-node API output and metadata without any redaction, so registry credentials or internal endpoints embedded in backend URIs should be stripped before serialization.

Suggested fix

// Store only a sanitized URI here to avoid exposing credentials or internal-only endpoints. URI string `json:"uri,omitempty"`

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/gallery/backends.go Lines: 412-412 Issue Type: security-medium Severity: medium Issue Description: URI is added to per-node API output and metadata without any redaction, so registry credentials or internal endpoints embedded in backend URIs should be stripped before serialization. Current Code: URI string `json:"uri,omitempty"` --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+                <tr key={n.node_id ?? n.NodeID ?? getName(n)}>
+                  <td className="cell-mono">{getName(n)}</td>
+                  <td>
+                    <span className={`badge ${getStatus(n) === 'healthy' ? 'badge-success' : 'badge-warning'}`}>


Rows with status 'draining' are rendered as warning instead of info, so map draining to its own badge variant here as in the inline chip view.

Suggested fix

<span className={`badge ${getStatus(n) === 'healthy' ? 'badge-success' : getStatus(n) === 'draining' ? 'badge-info' : 'badge-warning'}`}>

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/NodeDistributionChip.jsx Lines: 118-118 Issue Type: functional-medium Severity: medium Issue Description: Rows with status 'draining' are rendered as warning instead of info, so map draining to its own badge variant here as in the inline chip view. Current Code: <span className={`badge ${getStatus(n) === 'healthy' ? 'badge-success' : 'badge-warning'}`}> --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:43:25Z

+func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error {
+	return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
+		var row PendingBackendOp
+		if err := tx.First(&row, id).Error; err != nil {
+			return err
+		}
+		row.Attempts++
+		row.LastError = errMsg
+		row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts))
+		return tx.Save(&row).Error


LastError stores arbitrary backend error text without any length limit, so a malicious or noisy peer can persist unbounded log-sized payloads; truncate or cap errMsg before saving.

Suggested fix

func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error { const maxErrLen = 4096 if len(errMsg) > maxErrLen { errMsg = errMsg[:maxErrLen] } return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error { var row PendingBackendOp if err := tx.First(&row, id).Error; err != nil { return err } row.Attempts++ row.LastError = errMsg row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts)) return tx.Save(&row).Error }) }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices. ### Context File: core/services/nodes/registry.go Lines: 1060-1069 Issue Type: security-medium Severity: medium Issue Description: LastError stores arbitrary backend error text without any length limit, so a malicious or noisy peer can persist unbounded log-sized payloads; truncate or cap errMsg before saving. Current Code: func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error { return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error { var row PendingBackendOp if err := tx.First(&row, id).Error; err != nil { return err } row.Attempts++ row.LastError = errMsg row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts)) return tx.Save(&row).Error }) } --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow go best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-20T17:44:51Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 47.3s

Security scan powered by Codity.ai

codity-dm · 2026-05-20T17:46:34Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	6
Scanner	npm audit

View vulnerability details (6 items)

1. brace-expansion various

CVE: GHSA-f886-m6hf-6m8v
Severity: MODERATE
Fixed in: True

brace-expansion: Zero-step sequence causes process hang and memory exhaustion

2. express-rate-limit various

CVE: N/A
Severity: MODERATE
Fixed in: True

Vulnerability in express-rate-limit

3. fast-uri various

CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
Severity: HIGH
Fixed in: True

fast-uri vulnerable to path traversal via percent-encoded dot segments

4. hono various

CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
Severity: MODERATE
Fixed in: True

Hono has CSS Declaration Injection via Style Object Values in JSX SSR

5. ip-address various

CVE: GHSA-v2v4-37r5-5v8g
Severity: MODERATE
Fixed in: True

ip-address has XSS in Address6 HTML-emitting methods

6. postcss various

CVE: GHSA-qx2v-qp2m-jg93
Severity: MODERATE
Fixed in: True

PostCSS has XSS via Unescaped </style> in its CSS Stringify Output

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T17:46:36Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	4
Scanner	pip-audit

View vulnerability details (4 items)

1. pip 24.0

CVE: GHSA-4xh5-x5gv-qwph
Fixed in: 25.3

When extracting a tar archive pip may not check symbolic links point into the extraction directory if the tarfile module doesn't implement PEP 706. Note that upgrading pip to a "fixed" version for thi

2. pip 24.0

CVE: GHSA-6vgw-5pg2-w6jp
Fixed in: 26.0

When pip is installing and extracting a maliciously crafted wheel archive, files may be extracted outside the installation directory. The path traversal is limited to prefixes of the installation dire

3. pip 24.0

CVE: GHSA-58qw-9mgm-455v
Fixed in:

pip handles concatenated tar and ZIP files as ZIP files regardless of filename or whether a file is both a tar and ZIP file. This behavior could result in confusing installation behavior, such as inst

4. pip 24.0

CVE: GHSA-jp4c-xjxw-mgf9
Fixed in: 26.1

pip prior to version 26.1 would run self-update check functionality after installing wheel files which required importing well-known Python modules names. These module imports were intentionally defer

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T17:48:51Z

License Compliance Scan

Metric	Value
Packages Scanned	778
High Risk (Strong Copyleft)	0
Medium Risk (Weak Copyleft)	6
Low Risk (Permissive)	677
Unknown License	95

Weak copyleft licenses found - verify compatibility

Some packages have unknown licenses - manual review required

Medium Risk Licenses - 6 packages

(MPL-2.0 OR Apache-2.0) (1 packages):

dompurify 3.4.0

MPL-2.0 (5 packages):

github.com/philippgille/chromem-go 0.7.0
github.com/libp2p/go-yamux/v5 5.0.1
github.com/hashicorp/golang-lru 1.0.2
github.com/hashicorp/golang-lru/v2 2.0.7
github.com/shoenig/go-m1cpu 0.1.6

Unknown Licenses - 95 packages

github.com/cloudflare/circl 1.6.1
github.com/bwmarrin/discordgo 0.29.0
github.com/dslipak/pdf 0.0.2
github.com/cyphar/filepath-securejoin 0.5.1
github.com/eritikass/githubmarkdownconvertergo 0.1.10
github.com/go-git/go-git/v5 5.16.4
github.com/go-git/go-billy/v5 5.6.2
github.com/go-git/gcfg 1.5.1-0.20230307220236-3a3c6141e376
github.com/gocolly/colly 1.2.0
github.com/golang/protobuf 1.5.4
github.com/google/go-querystring 1.1.0
github.com/google/go-github/v69 69.2.0
github.com/kennygrant/sanitize 1.2.4
github.com/mschoch/smat 0.2.0
github.com/mudler/skillserver 0.0.6
github.com/olekukonko/tablewriter 0.0.5
github.com/pjbgf/sha1cd 0.3.2
github.com/segmentio/asm 1.1.3
github.com/saintfish/chardet 0.0.0-20230101081208-5e3ef4b5456d
github.com/sergi/go-diff 1.4.0

...and 75 more

Powered by Codity.ai · Docs

codity-dm · 2026-05-20T17:49:24Z

Code Quality Report — test-org-codity/LocalAI · PR #7

Scanned: 2026-05-20 17:49 UTC | Score: 14/100 | Provider: github

Executive Summary

Severity	Count
Critical	0
High	1
Medium	4
Low	211

Top Findings

[CQ-LLM-003] `core/application/upgrade_checker.go:146` (Error_Handling · HIGH)

Issue: Potentially swallowing errors when backendManagerFn is nil.
Suggestion: Ensure that errors are logged or handled appropriately when backendManagerFn is nil.

if uc.backendManagerFn != nil {

[CQ-LLM-001] `core/application/distributed.go:243` (Complexity · MEDIUM)

Issue: The function initDistributed has increased cyclomatic complexity due to additional parameters and logic.
Suggestion: Consider breaking down the function into smaller, more manageable functions.

func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB) (*Distribut

[CQ-LLM-002] `core/application/distributed.go:243` (Documentation · MEDIUM)

Issue: Missing detailed documentation for the new parameters in the ReplicaReconciler.
Suggestion: Add JSDoc comments explaining the purpose of each new parameter.

// Create ReplicaReconciler for auto-scaling model replicas. Adapter +

[CQ-LLM-004] `core/application/upgrade_checker.go:146` (Performance · MEDIUM)

Issue: The runCheck function may lead to performance issues due to unnecessary checks if backendManagerFn is nil.
Suggestion: Optimize the logic to avoid unnecessary calls when backendManagerFn is not set.

if uc.backendManagerFn != nil {

[CQ-LLM-006] `core/gallery/upgrade.go:24` (Documentation · MEDIUM)

Issue: Missing documentation for the new NodeDrift field in UpgradeInfo.
Suggestion: Add a comment explaining the purpose of NodeDrift.

// NodeDrift lists nodes whose installed version or digest differs from

[CQ-008] `core/application/distributed.go:256` (Maintainability · LOW)

Issue: Magic number 30 in code
Suggestion: Extract to a named constant

Interval:          30 * time.Second,

[CQ-009] `core/application/upgrade_checker.go:53` (Style · LOW)

Issue: Line exceeds 120 characters (163 chars)
Suggestion: Break long lines into multiple lines for readability

func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB, backendManagerFn func() ...

[CQ-008] `core/application/upgrade_checker.go:81` (Maintainability · LOW)

Issue: Magic number 10 in code
Suggestion: Extract to a named constant

initialDelay := 10 * time.Second

[CQ-LLM-005] `core/gallery/backends.go:395` (Maintainability · LOW)

Issue: The use of magic strings for NodeStatus can lead to maintainability issues.
Suggestion: Consider using constants for the different node statuses.

NodeStatus  string `json:"node_status"` // healthy | unhealthy | offline | draining | pending

[CQ-009] `core/gallery/upgrade.go:59` (Style · LOW)

Issue: Line exceeds 120 characters (175 chars)
Suggestion: Break long lines into multiple lines for readability

func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBac...

Per-File Breakdown

File	High	Medium	Low	Total
`core/application/distributed.go`	0	2	1	3
`core/application/upgrade_checker.go`	1	1	2	4
`core/gallery/backends.go`	0	0	1	1
`core/gallery/upgrade.go`	0	1	1	2
`core/http/react-ui/src/App.css`	0	0	14	14
`core/http/react-ui/src/components/FilterBar.jsx`	0	0	1	1
`core/http/react-ui/src/components/NodeDistributionChip.jsx`	0	0	11	11
`core/http/react-ui/src/components/Popover.jsx`	0	0	2	2
`core/http/react-ui/src/pages/Manage.jsx`	0	0	122	122
`core/http/react-ui/src/pages/Nodes.jsx`	0	0	36	36
`core/services/advisorylock/keys.go`	0	0	1	1
`core/services/nodes/managers_distributed.go`	0	0	1	1
`core/services/nodes/reconciler.go`	0	0	2	2
`core/services/nodes/reconciler_test.go`	0	0	9	9
`core/services/nodes/registry.go`	0	0	7	7

Recommendations

Resolve High severity issues, especially error handling gaps and performance bottlenecks.

Run automated tests after applying fixes to verify no regressions.

DhirenMhatre · 2026-05-24T20:08:01Z

@codity review

codity-dm · 2026-05-24T20:08:14Z

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 20 code file(s) changed but no test files added

To merge this PR:

Address the failed checks listed above
Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

codity-dm · 2026-05-24T20:08:22Z

PR Summary

What Changed

Fixed distributed-mode upgrade detection by routing checks through the active backend manager instead of empty local filesystem.
Added durable pending_backend_ops queue for backend lifecycle operations (install/upgrade/delete) with retry logic and dead-lettering.
Built cluster drift visibility into the UI: per-node version comparison, ghost model cleanup, and node distribution chips.

Key Changes by Area

Backend Lifecycle

Replaced fire-and-forget operations with persistent queue (pending_backend_ops table) and exponential backoff retry.
Added CheckUpgradesAgainst() for cluster-wide upgrade detection with NodeDrift outlier reporting.

Distributed Coordination

ReplicaReconciler now health-probes stale models and drains pending backend ops under advisory lock KeyStateReconciler.

Frontend

Added NodeDistributionChip component showing model/backend distribution across nodes with drift indicators.
Added URL-persisted filters, auto-refresh, and upgrade banners in Manage/Nodes pages.

Files Changed

File	Changes Summary
core/application/distributed.go	`ReplicaReconciler` receives auth config for worker health probes
core/application/startup.go	Startup pruning of malformed pending backend ops
core/application/upgrade_checker.go	Added `backendManagerFn` lazy getter for distributed routing
core/cli/worker.go	Worker reports `Version`, `URI`, `Digest` in lifecycle events
core/gallery/backends.go	Extended `SystemBackend` with `Nodes []NodeBackendRef` for attribution
core/gallery/upgrade.go	Added `CheckUpgradesAgainst()` with drift detection
core/gallery/upgrade_test.go	Tests for drift detection, alignment, empty versions
core/http/react-ui/src/App.css	Utility classes for badges, tables, popovers, status indicators
core/http/react-ui/src/components/FilterBar.jsx	Reusable search + filter + toggle control strip
core/http/react-ui/src/components/NodeDistributionChip.jsx	Smart distribution display with popover for large clusters
core/http/react-ui/src/components/Popover.jsx	Accessible popover with auto-positioning and focus management
core/http/react-ui/src/pages/Manage.jsx	Drift badges, filters, auto-refresh, upgrade actions
core/http/react-ui/src/pages/Nodes.jsx	Stat cards, backend distribution column, worker registration UX
core/http/routes/ui_api.go	Added `LoadedOn` and `Source=registry-only` for ghost models
core/services/advisorylock/keys.go	Added `KeyStateReconciler` advisory lock
core/services/galleryop/service.go	Exposed `BackendManager()` getter for upgrade checker
core/services/messaging/subjects.go	Extended `NodeBackendInfo` with version/digest/URI
core/services/nodes/managers_distributed.go	`enqueueAndDrainBackendOp` with durable queue integration
core/services/nodes/reconciler.go	`probeLoadedModels` and `drainPendingBackendOps` with stale probing
core/services/nodes/reconciler_test.go	Tests for ghost detection and health probe behavior
core/services/nodes/registry.go	`PendingBackendOp` model with upsert/list/retry/dead-letter methods

Review Focus Areas

Dead-letter logic in RecordPendingBackendOpFailure: verify 10-attempt cap and 15m backoff ceiling.
Advisory lock splitting: confirm KeyStateReconciler prevents stampede without deadlocking replica scaling.
probeStaleAfter default (2m): check this balances ghost cleanup vs. probe overhead for your cluster size.

Architecture

Design Decisions

Queue-based operations trade immediate feedback for durability. UI shows "pending on N nodes" to compensate.
Separate advisory locks for state reconciliation vs. replica scaling: intentional to allow concurrent scaling during queue drainage.
Lazy backendManagerFn in upgrade checker: avoids circular dependency between gallery service and distributed manager initialization.

Scalability & Extensibility

NodeDistributionChip collapses at >3 nodes: threshold is hardcoded, may need tuning for dense clusters.
Out of scope: cross-region backend replication, pluggable drift policies.

Risks

Intentional: Ghost model probing adds gRPC load; mitigated by 2m staleness threshold but no rate limiting between reconciler runs.
Unintentional: UpsertPendingBackendOp composite index on (node_id, backend, op) may hotspot on popular backends during mass upgrades.

Merge Status

NOT MERGEABLE — PR Score 9/100, below threshold (50)

[H1] 1 CRITICAL security vulnerability found (SEC-001)
[H4] PR quality score (9) is below merge floor (50)
[H5] 10 HIGH-severity inline review findings need resolution (threshold: 3)
[H6] Code quality raw score (15) is below merge floor (40)

codity-dm · 2026-05-24T20:16:58Z

+  useEffect(() => {
+    if (!open && anchor?.current) {
+      // requestAnimationFrame so the close is painted before focus jumps;
+      // otherwise screen readers announce the trigger mid-transition.
+      const raf = requestAnimationFrame(() => anchor.current?.focus?.())
+      return () => cancelAnimationFrame(raf)
+    }
+  }, [open, anchor])


The focus-restoration useEffect fires on initial mount, not only when the popover closes. When the component first mounts open is false and anchor.current is already set (React attaches refs in the commit phase, before passive effects run). The condition !open && anchor?.current is therefore true on the very first render, so requestAnimationFrame(() => anchor.current?.focus?.()) is scheduled immediately, stealing keyboard focus from whatever the user was doing. Every NodeDistributionChip rendered in summary mode (4+ nodes) triggers this on page load, causing the last chip's trigger button to unexpectedly receive focus.

Suggested fix

const popoverWasOpenRef = useRef(false) useEffect(() => { const wasOpen = popoverWasOpenRef.current popoverWasOpenRef.current = open if (!open && wasOpen && anchor?.current) { const raf = requestAnimationFrame(() => anchor.current?.focus?.()) return () => cancelAnimationFrame(raf) } }, [open, anchor])

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/Popover.jsx Lines: 64-71 Issue Type: functional-high Severity: high Issue Description: The focus-restoration `useEffect` fires on **initial mount**, not only when the popover closes. When the component first mounts `open` is `false` and `anchor.current` is already set (React attaches refs in the commit phase, before passive effects run). The condition `!open && anchor?.current` is therefore `true` on the very first render, so `requestAnimationFrame(() => anchor.current?.focus?.())` is scheduled immediately, stealing keyboard focus from whatever the user was doing. Every `NodeDistributionChip` rendered in summary mode (4+ nodes) triggers this on page load, causing the last chip's trigger button to unexpectedly receive focus. Current Code: useEffect(() => { if (!open && anchor?.current) { // requestAnimationFrame so the close is painted before focus jumps; // otherwise screen readers announce the trigger mid-transition. const raf = requestAnimationFrame(() => anchor.current?.focus?.()) return () => cancelAnimationFrame(raf) } }, [open, anchor]) --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-24T20:16:58Z

+                  role="tab"
+                  aria-selected={activeFilter === f.key}
+                  className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
+                  onClick={() => onFilterChange(f.key)}


onFilterChange is invoked unconditionally inside the onClick handler whenever hasFilters is true, but the prop has no default value and is not declared as required. If a caller passes a non-empty filters array while omitting onFilterChange (a realistic mistake when wiring up a read-only filter strip), clicking any chip throws TypeError: onFilterChange is not a function. The optional-chaining call operator ?.() is the idiomatic zero-cost guard here.

Suggested change

onClick={() => onFilterChange(f.key)}

onClick={() => onFilterChange?.(f.key)}

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/components/FilterBar.jsx Lines: 59-59 Issue Type: robustness-medium Severity: medium Issue Description: `onFilterChange` is invoked unconditionally inside the `onClick` handler whenever `hasFilters` is true, but the prop has no default value and is not declared as required. If a caller passes a non-empty `filters` array while omitting `onFilterChange` (a realistic mistake when wiring up a read-only filter strip), clicking any chip throws `TypeError: onFilterChange is not a function`. The optional-chaining call operator `?.()` is the idiomatic zero-cost guard here. Current Code: onClick={() => onFilterChange(f.key)} --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-24T20:16:58Z

+    setUpgradingAll(true)
+    try {
+      // Serial upgrade — matches the gallery's Upgrade All behavior.
+      // Each backend upgrade is itself a cluster-wide fan-out, so parallel
+      // calls would multiply load on every worker.
+      for (const name of names) {
+        try {
+          await backendsApi.upgrade(name)
+        } catch (err) {
+          addToast(`Upgrade failed for ${name}: ${err.message}`, 'error')


handleUpgradeAll shows the success info-toast unconditionally after the loop, even when every individual upgrade threw an error. An operator who clicks "Upgrade all" and gets N error toasts followed by "Upgrade started for N backends" receives conflicting feedback and has no reliable way to know whether any upgrade was actually initiated. Additionally err.message is undefined for non-Error rejections (e.g. a plain string throw) so the per-upgrade error toasts can display "Upgrade failed for foo: undefined".

Suggested fix

let succeeded = 0 for (const name of names) { try { await backendsApi.upgrade(name) succeeded++ } catch (err) { addToast(`Upgrade failed for ${name}: ${err.message ?? err}`, 'error') } } if (succeeded > 0) { addToast(`Upgrade started for ${succeeded} backend${succeeded === 1 ? '' : 's'}`, 'info') }

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert javascript developer with deep knowledge of security, performance, and best practices. ### Context File: core/http/react-ui/src/pages/Manage.jsx Lines: 269-278 Issue Type: functional-medium Severity: medium Issue Description: `handleUpgradeAll` shows the success info-toast unconditionally after the loop, even when every individual upgrade threw an error. An operator who clicks "Upgrade all" and gets N error toasts followed by "Upgrade started for N backends" receives conflicting feedback and has no reliable way to know whether any upgrade was actually initiated. Additionally `err.message` is `undefined` for non-`Error` rejections (e.g. a plain string `throw`) so the per-upgrade error toasts can display "Upgrade failed for foo: undefined". Current Code: for (const name of names) { try { await backendsApi.upgrade(name) } catch (err) { addToast(`Upgrade failed for ${name}: ${err.message}`, 'error') } } addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info') --- ### Instructions 1. Fix the issue described above 2. Maintain the exact indentation and code style from the original 3. Follow javascript best practices and language-specific idioms 4. Ensure the fix addresses the root cause, not just the symptoms 5. Add brief inline comments explaining the fix if needed ### Constraints - Do not change functionality beyond fixing the identified issue - Preserve existing variable names and function signatures unless they are part of the problem - Ensure the fix is production-ready ---

codity-dm · 2026-05-24T20:18:08Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 40.6s

Security scan powered by Codity.ai

codity-dm · 2026-05-24T20:19:40Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	7
Scanner	npm audit

View vulnerability details (7 items)

1. brace-expansion various

CVE: GHSA-f886-m6hf-6m8v
Severity: MODERATE
Fixed in: True

brace-expansion: Zero-step sequence causes process hang and memory exhaustion

2. express-rate-limit various

CVE: N/A
Severity: MODERATE
Fixed in: True

Vulnerability in express-rate-limit

3. fast-uri various

CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
Severity: HIGH
Fixed in: True

fast-uri vulnerable to path traversal via percent-encoded dot segments

4. hono various

CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
Severity: MODERATE
Fixed in: True

Hono has CSS Declaration Injection via Style Object Values in JSX SSR

5. ip-address various

CVE: GHSA-v2v4-37r5-5v8g
Severity: MODERATE
Fixed in: True

ip-address has XSS in Address6 HTML-emitting methods

6. postcss various

CVE: GHSA-qx2v-qp2m-jg93
Severity: MODERATE
Fixed in: True

PostCSS has XSS via Unescaped </style> in its CSS Stringify Output

7. qs various

CVE: GHSA-q8mj-m7cp-5q26
Severity: MODERATE
Fixed in: True

qs has a remotely triggerable DoS: qs.stringify crashes with TypeError on null/undefined entries in comma-format arrays when encodeValuesOnly is set

Powered by Codity.ai · Docs

codity-dm · 2026-05-24T20:19:41Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	4
Scanner	pip-audit

View vulnerability details (4 items)

1. pip 24.0

CVE: GHSA-4xh5-x5gv-qwph
Fixed in: 25.3

When extracting a tar archive pip may not check symbolic links point into the extraction directory if the tarfile module doesn't implement PEP 706. Note that upgrading pip to a "fixed" version for thi

2. pip 24.0

CVE: GHSA-6vgw-5pg2-w6jp
Fixed in: 26.0

When pip is installing and extracting a maliciously crafted wheel archive, files may be extracted outside the installation directory. The path traversal is limited to prefixes of the installation dire

3. pip 24.0

CVE: GHSA-58qw-9mgm-455v
Fixed in: 26.1

pip handles concatenated tar and ZIP files as ZIP files regardless of filename or whether a file is both a tar and ZIP file. This behavior could result in confusing installation behavior, such as inst

4. pip 24.0

CVE: GHSA-jp4c-xjxw-mgf9
Fixed in: 26.1

pip prior to version 26.1 would run self-update check functionality after installing wheel files which required importing well-known Python modules names. These module imports were intentionally defer

Powered by Codity.ai · Docs

codity-dm · 2026-05-24T20:20:06Z

License Compliance Scan

Metric	Value
Packages Scanned	778
High Risk (Strong Copyleft)	0
Medium Risk (Weak Copyleft)	6
Low Risk (Permissive)	646
Unknown License	126

Weak copyleft licenses found - verify compatibility

Some packages have unknown licenses - manual review required

Medium Risk Licenses - 6 packages

(MPL-2.0 OR Apache-2.0) (1 packages):

dompurify 3.4.0

MPL-2.0 (5 packages):

github.com/philippgille/chromem-go 0.7.0
github.com/libp2p/go-yamux/v5 5.0.1
github.com/hashicorp/golang-lru 1.0.2
github.com/hashicorp/golang-lru/v2 2.0.7
github.com/shoenig/go-m1cpu 0.1.6

Unknown Licenses - 126 packages

github.com/moby/moby/api 1.54.1
github.com/moby/moby/client 0.4.0
github.com/nats-io/nkeys 0.4.15
github.com/nats-io/nuid 1.0.1
github.com/wk8/go-ordered-map/v2 2.1.8
github.com/ProtonMail/go-crypto 1.1.6
github.com/PuerkitoBio/goquery 1.10.3
github.com/RoaringBitmap/roaring/v2 2.4.5
github.com/andybalholm/cascadia 1.3.3
github.com/antchfx/htmlquery 1.3.4
github.com/antchfx/xmlquery 1.4.4
github.com/antchfx/xpath 1.3.4
github.com/bits-and-blooms/bitset 1.24.0
github.com/blevesearch/bleve/v2 2.5.7
github.com/blevesearch/bleve_index_api 1.2.11
github.com/blevesearch/geo 0.2.4
github.com/blevesearch/go-porterstemmer 1.0.3
github.com/blevesearch/gtreap 0.1.1
github.com/blevesearch/mmap-go 1.0.4
github.com/blevesearch/scorch_segment_api/v2 2.3.13

...and 106 more

Powered by Codity.ai · Docs

codity-dm · 2026-05-24T20:20:25Z

Code Quality Report — test-org-codity/LocalAI · PR #7

Scanned: 2026-05-24 20:20 UTC | Score: 15/100 | Provider: github

Executive Summary

Severity	Count
Critical	0
High	1
Medium	3
Low	211

Top Findings

[CQ-LLM-003] `core/gallery/upgrade.go:25` (Error_Handling · HIGH)

Issue: The CheckBackendUpgrades function does not handle the case where ListSystemBackends returns a nil result.
Suggestion: Add error handling to ensure that the function gracefully handles nil results from ListSystemBackends.

installed, err := ListSystemBackends(systemState)

[CQ-LLM-001] `core/application/distributed.go:243` (Complexity · MEDIUM)

Issue: The function initDistributed has increased cyclomatic complexity due to the addition of multiple parameters and logic.
Suggestion: Consider breaking down the function into smaller, more manageable functions to reduce complexity.

func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB) (*Distribut

[CQ-LLM-004] `core/application/upgrade_checker.go:41` (Documentation · MEDIUM)

Issue: Missing documentation for the backendManagerFn parameter in NewUpgradeChecker.
Suggestion: Add a docstring to explain the purpose of the backendManagerFn parameter.

func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB, backendManagerFn func() galleryop.BackendManager) *UpgradeChecker {

[CQ-LLM-002] `core/application/upgrade_checker.go:66` (Duplication · MEDIUM)

Issue: The logic for checking upgrades is duplicated in the runCheck method, which could lead to maintenance issues.
Suggestion: Refactor the upgrade checking logic into a separate function to adhere to the DRY principle.

if upgrades == nil && err == nil { upgrades, err = gallery.CheckBackendUpgrades(ctx, uc.galleries, uc.systemState) }

[CQ-008] `core/application/distributed.go:256` (Maintainability · LOW)

Issue: Magic number 30 in code
Suggestion: Extract to a named constant

Interval:          30 * time.Second,

[CQ-009] `core/application/upgrade_checker.go:53` (Style · LOW)

Issue: Line exceeds 120 characters (163 chars)
Suggestion: Break long lines into multiple lines for readability

func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB, backendManagerFn func() ...

[CQ-008] `core/application/upgrade_checker.go:81` (Maintainability · LOW)

Issue: Magic number 10 in code
Suggestion: Extract to a named constant

initialDelay := 10 * time.Second

[CQ-LLM-005] `core/gallery/backends.go:396` (Maintainability · LOW)

Issue: The Nodes field in SystemBackend is not well documented, making it harder to maintain.
Suggestion: Add a comment explaining the purpose and usage of the Nodes field.

Nodes []NodeBackendRef `json:"nodes,omitempty"`

[CQ-009] `core/gallery/upgrade.go:59` (Style · LOW)

Issue: Line exceeds 120 characters (175 chars)
Suggestion: Break long lines into multiple lines for readability

func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBac...

[CQ-008] `core/http/react-ui/src/App.css:1569` (Maintainability · LOW)

Issue: Magic number 400 in code
Suggestion: Extract to a named constant

font-weight: 400;

Per-File Breakdown

File	High	Medium	Low	Total
`core/application/distributed.go`	0	1	1	2
`core/application/upgrade_checker.go`	0	2	2	4
`core/gallery/backends.go`	0	0	1	1
`core/gallery/upgrade.go`	1	0	1	2
`core/http/react-ui/src/App.css`	0	0	14	14
`core/http/react-ui/src/components/FilterBar.jsx`	0	0	1	1
`core/http/react-ui/src/components/NodeDistributionChip.jsx`	0	0	11	11
`core/http/react-ui/src/components/Popover.jsx`	0	0	2	2
`core/http/react-ui/src/pages/Manage.jsx`	0	0	122	122
`core/http/react-ui/src/pages/Nodes.jsx`	0	0	36	36
`core/services/advisorylock/keys.go`	0	0	1	1
`core/services/nodes/managers_distributed.go`	0	0	1	1
`core/services/nodes/reconciler.go`	0	0	2	2
`core/services/nodes/reconciler_test.go`	0	0	9	9
`core/services/nodes/registry.go`	0	0	7	7

Recommendations

Resolve High severity issues, especially error handling gaps and performance bottlenecks.

Run automated tests after applying fixes to verify no regressions.

DhirenMhatre · 2026-05-24T20:41:36Z

@codity review

DhirenMhatre · 2026-05-24T20:42:55Z

@codity review

codity-dm · 2026-05-24T20:43:09Z

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 20 code file(s) changed but no test files added

To merge this PR:

Address the failed checks listed above
Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

codity-dm · 2026-05-24T20:43:16Z

PR Summary

What Changed

Fixed distributed upgrade detection to aggregate backend state from all workers instead of checking the frontend's empty filesystem.
Added durable queue for backend lifecycle operations (install/upgrade/delete) with retry logic and dead-lettering after 10 attempts.
Built new React components for cluster visibility: node distribution chips, popovers, and filter bars with URL-persisted state.

Key Changes by Area

Distributed Backend Management

UpgradeChecker now routes through DistributedBackendManager to aggregate SystemBackends from all workers (core/application/upgrade_checker.go:47-52).
Added CheckUpgradesAgainst() entry point accepting pre-aggregated state instead of local filesystem reads (core/gallery/upgrade.go:43-68).
Implemented summarizeNodeDrift() to detect version/digest mismatches across nodes and flag upgrades when nodes disagree (core/gallery/upgrade.go:159-198).

Backend Operation Queue

New PendingBackendOp table with composite key (node_id, backend, op) and exponential backoff (core/services/nodes/registry.go:107-130).
enqueueAndDrainBackendOp records intent before NATS calls, queues operations for offline nodes, and retries healthy nodes with backoff (core/services/nodes/managers_distributed.go:97-167).

State Reconciliation

Split ReplicaReconciler into separate advisory-locked phases for replica scaling vs. state reconciliation (core/services/nodes/reconciler.go:113-130).
Added drainPendingBackendOps() with retry/dead-letter logic (reconciler.go:171-242).
Added probeLoadedModels() to health-check stale models via gRPC and remove unreachable entries (reconciler.go:242-301).

Cluster-Aware API

Extended /api/models with LoadedOn field showing worker node distribution (core/http/routes/ui_api.go).
Added Source: "registry-only" for ghost models discovered by reconciler but missing local config.

React UI

NodeDistributionChip: inline chips for small clusters, summary+popover for larger ones, with drift detection (core/http/react-ui/src/components/NodeDistributionChip.jsx).
FilterBar: shared search/filter/toggle control with URL persistence (core/http/react-ui/src/components/FilterBar.jsx).
Manage.jsx: 10s auto-refresh in distributed mode, backend upgrade banner, filter counts, "Adopted" badge for registry-only models.
Added CSS utilities for table cells, stat cards, and node status indicators (core/http/react-ui/src/App.css:1529-1560).

Files Changed

File	Changes Summary
core/application/distributed.go	Added `Adapter`, `RegistrationToken`, `ProbeStaleAfter` to `ReplicaReconcilerOptions`
core/application/startup.go	Wired `backendManagerFn` for distributed upgrade checks
core/application/upgrade_checker.go	Fixed to use lazy `backendManagerFn` routing through `DistributedBackendManager`
core/cli/worker.go	Extended worker lifecycle events to report `Version`, `URI`, `Digest` metadata
core/gallery/backends.go	Added `NodeDriftInfo` and `NodeDrift` fields to `UpgradeInfo` and `SystemBackend`
core/gallery/upgrade.go	Added `CheckUpgradesAgainst()` and `summarizeNodeDrift()` algorithm
core/gallery/upgrade_test.go	Added tests for drift detection, cluster agreement, empty-version scenarios
core/http/react-ui/src/App.css	Added badge, cell-stack, stat-grid, filter-bar, popover, node-status CSS classes
core/http/react-ui/src/components/FilterBar.jsx	New shared search/filter/toggle component
core/http/react-ui/src/components/NodeDistributionChip.jsx	New component for node distribution visualization with drift detection
core/http/react-ui/src/components/Popover.jsx	New accessible popover with auto-positioning and focus management
core/http/react-ui/src/pages/Manage.jsx	Integrated distribution chips, URL-persisted filters, auto-refresh, upgrade banner
core/http/react-ui/src/pages/Nodes.jsx	Added stat cards, CSS class refactoring, improved worker registration UX
core/http/routes/ui_api.go	Added `LoadedOn` field and `registry-only` source for ghost models
core/services/advisorylock/keys.go	Added `KeyStateReconciler` advisory lock key
core/services/galleryop/service.go	Exposed `BackendManager()` getter for upgrade checker integration
core/services/messaging/subjects.go	Added `Version`, `URI`, `Digest` to `NodeBackendInfo` for upgrade detection
core/services/nodes/managers_distributed.go	Added `enqueueAndDrainBackendOp` with pending-ops queue pattern
core/services/nodes/reconciler.go	Split reconciliation phases, added drain and probe logic with advisory locks
core/services/nodes/reconciler_test.go	Added `fakeProber`, tests for state reconciliation and model probing
core/services/nodes/registry.go	Added `PendingBackendOp` table, queue management methods, startup cleanup

Review Focus Areas

Dead-letter behavior: Operations abandoned after 10 attempts in drainPendingBackendOps - confirm this matches operational expectations.
Advisory lock scope: Verify KeyStateReconciler separation from replica scaling prevents blocking without creating lock ordering issues.
Ghost model probing: grpcModelProber health checks run against model gRPC addresses - ensure timeout handling aligns with ProbeStaleAfter config.

Architecture

Design Decisions: The pending-ops queue trades immediate feedback for durability. Operations on unhealthy nodes queue silently rather than fail fast. This is intentional for distributed deployments where nodes may be temporarily unavailable. The split advisory locks (replica vs. state) accept that state reconciliation may lag behind replica scaling to prevent head-of-line blocking.
Scalability & Extensibility: CheckUpgradesAgainst accepts pre-aggregated SystemBackends, allowing future aggregation strategies (caching, incremental updates) without changing the drift detection algorithm. The ModelProber interface permits non-gRPC health checks if needed later.
Risks:
- Intentional: Dead-lettered operations require manual intervention. Exponential backoff caps at 15 minutes, which may delay recovery for long partitions.
- Unintentional: probeLoadedModels bumps updated_at on successful health checks, which may mask actual staleness if the prober returns false positives.

codity-dm · 2026-05-24T20:50:25Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 39.4s

Security scan powered by Codity.ai

DhirenMhatre · 2026-05-25T11:13:45Z

@codity review

codity-ai · 2026-05-25T11:13:56Z

PR Summary

What Changed

Fixed distributed upgrade detection so frontends query workers instead of checking empty local state. Added cluster drift detection when nodes disagree on backend versions.
Added durable queue for backend lifecycle operations (install/upgrade/delete) with retry and dead-letter handling instead of skipping offline nodes.
Built new React components for cluster visibility: node distribution chips, popovers, and filter bars with URL-persisted state.

Key Changes by Area

Distributed Backend Management

core/services/nodes/managers_distributed.go: Replaced fire-and-forget operations with enqueueAndDrainBackendOp queue pattern. Non-healthy nodes get "queued" status with reconciler retry.
core/services/nodes/reconciler.go: Added probeLoadedModels() for ghost entry cleanup and drainPendingBackendOps() for retry processing with exponential backoff.

Upgrade Detection

core/gallery/upgrade.go: New CheckUpgradesAgainst() compares gallery against aggregated worker state. Detects NodeDrift when cluster nodes disagree.
core/application/upgrade_checker.go: Added backendManagerFn lazy getter to route through active backend manager.

Cluster Data Model

core/gallery/backends.go: Extended SystemBackend with Nodes []NodeBackendRef for per-node version/digest tracking.
core/services/nodes/registry.go: Added PendingBackendOp table with upsert semantics, exponential backoff, and startup pruning.

UI Components

NodeDistributionChip.jsx: Renders inline chips (≤3 nodes) or summary popover showing health, drift, and offline counts.
Popover.jsx: Accessible anchored surface with auto-positioning and focus management.
FilterBar.jsx: Reusable search + filter chip + toggle strip.

Management UI

Manage.jsx: URL-persisted filters (mq, mf, bq, bf), 10s auto-refresh in distributed mode, upgrade badges, NodeDistributionChip integration.
Nodes.jsx: Backend version/digest display with drift detection in expandable rows, "Register a new worker" flow.

Files Changed

File	Changes Summary
core/application/distributed.go	Added `Adapter`, `RegistrationToken`, `ProbeStaleAfter` to `ReplicaReconcilerOptions`
core/application/startup.go	(Inferred: reconciler initialization with new options)
core/application/upgrade_checker.go	Added `backendManagerFn` lazy getter for distributed routing
core/cli/worker.go	Extended backend reporting with `Version`, `URI`, `Digest`
core/gallery/backends.go	Added `Nodes []NodeBackendRef` to `SystemBackend` for drift tracking
core/gallery/upgrade.go	New `CheckUpgradesAgainst()` with cluster aggregation and drift detection
core/gallery/upgrade_test.go	Test suite for drift detection, unanimous clusters, edge cases
core/http/react-ui/src/App.css	Table utilities, stat cards, popovers, status indicators, tab badges
core/http/react-ui/src/components/FilterBar.jsx	New reusable search/filter/toggle control strip
core/http/react-ui/src/components/NodeDistributionChip.jsx	New component for node distribution display with popover
core/http/react-ui/src/components/Popover.jsx	New accessible anchored popover with focus management
core/http/react-ui/src/pages/Manage.jsx	URL filters, auto-refresh, upgrade badges, `NodeDistributionChip` integration
core/http/react-ui/src/pages/Nodes.jsx	Version/digest display, drift detection, worker registration polish
core/http/routes/ui_api.go	`/api/models` joins node registry for `LoadedOn`, registry-only ghost detection
core/services/advisorylock/keys.go	Added `KeyStateReconciler` advisory lock key
core/services/galleryop/service.go	Exposed `BackendManager()` getter for upgrade checker
core/services/messaging/subjects.go	Extended `NodeBackendInfo` with `Version`, `URI`, `Digest`
core/services/nodes/managers_distributed.go	`enqueueAndDrainBackendOp` queue pattern, `DeleteBackendDetailed` variant
core/services/nodes/reconciler.go	`probeLoadedModels()`, `drainPendingBackendOps()`, split advisory locks
core/services/nodes/reconciler_test.go	`fakeProber` helper, tests for model probing and pending ops
core/services/nodes/registry.go	`PendingBackendOp` table, upsert/queue/retry methods, startup pruning

Review Focus Areas

Pending ops dead-letter logic: Verify 10-attempt cap and backoff progression in drainPendingBackendOps.
Drift detection accuracy: Check CheckUpgradesAgainst handles edge cases where all nodes report empty versions.
Reconciler lock splitting: Confirm replica-reconciler vs KeyStateReconciler locks prevent deadlock during state reconciliation.

Architecture

Design Decisions

Queue pattern for backend ops accepts temporary inconsistency (nodes may lag) in exchange for durability. Intentionally skips agent nodes and unapproved nodes to prevent stuck operations.
Split advisory locks allow state reconciliation to run opportunistically (non-blocking try-lock) without blocking replica scaling operations.
Lazy backendManagerFn in upgrade checker defers distributed vs local decision to call time, avoiding circular initialization.

Scalability & Extensibility

PendingBackendOp upsert semantics prevent queue explosion under reissue storms. Composite index on (node_id, backend, op) ensures single entry per target.
NodeDistributionChip scales display: inline for ≤3 nodes, popover summary for larger clusters. Out of scope: server-side pagination for very large clusters (>100 nodes).

Risks

Intentional: Startup pruning of malformed queue rows may drop legitimate ops if node registry state is inconsistent at boot. Acceptable given crash-recovery scenario.
Unintentional: ErrNoResponders marks node unhealthy immediately. May over-trigger if NATS transiently unavailable. Should review thresholding.

Merge Status

NOT MERGEABLE — PR Score 8/100, below threshold (50)

[H1] 1 CRITICAL security vulnerability found (SEC-001)
[H4] PR quality score (8) is below merge floor (50)
[H5] 8 HIGH-severity inline review findings need resolution (threshold: 3)
[H6] Code quality raw score (14) is below merge floor (40)

codity-ai · 2026-05-25T11:14:43Z

Workflow Diagrams

Automatically generated sequence diagrams showing the workflows in this PR

1. ## Analysis

This PR introduces a sophisticated distribut...

Complex complexity • Components: UpgradeChecker, DistributedBackendManager, CheckUpgradesAgainst

sequenceDiagram
    title Distributed Backend Upgrade Detection with Drift Awareness

    participant UC as UpgradeChecker
    participant AL as AdvisoryLock
    participant DB as PostgreSQL
    participant DBM as DistributedBackendManager
    participant Worker as WorkerNode
    participant Gallery as GalleryService
    participant UI as ReactUI

    Note over UC: Periodic check triggered every 6 hours

    UC->>AL: Acquire lock for upgrade check
    alt Lock acquired this is the leader instance
        AL->>DB: SELECT pg_try_advisory_lock(key)
        DB-->>AL: true
        
        UC->>DBM: CheckUpgrades()
        Note over DBM: Aggregates from all workers instead of empty frontend filesystem
        
        DBM->>Worker: ListBackends() via NATS
        Worker-->>DBM: NodeBackendRef with version and digest
        
        DBM->>DBM: Aggregate into SystemBackends with per-node metadata
        
        DBM->>Gallery: CheckUpgradesAgainst(galleries, systemState, installedBackends)
        
        Gallery->>Gallery: summarizeNodeDrift(nodes)
        Note over Gallery: Detects version or digest divergence across cluster
        
        Gallery->>Gallery: Compare gallery version vs majority cluster version
        
        alt Cluster drift detected
            Gallery-->>DBM: UpgradeInfo with NodeDrift populated
        else All nodes agree
            Gallery-->>DBM: UpgradeInfo with empty NodeDrift
        end
        
        DBM-->>UC: map of available upgrades
        
        UC->>UC: Cache results in lastUpgrades
        
        alt Auto-upgrade enabled
            UC->>Gallery: UpgradeBackend()
            Gallery->>DBM: UpgradeBackend(name)
            DBM->>Worker: InstallBackend via NATS
            Worker-->>DBM: success
        end
        
    else Lock not acquired another instance is leader
        AL->>DB: SELECT pg_try_advisory_lock(key)
        DB-->>AL: false
        Note over UC: Skip periodic check this instance is not leader
    end

    UC->>UI: GetAvailableUpgrades() API call
    UI-->>UC: Cached upgrade results including NodeDrift

    Note over UI: Displays upgrade banner with cluster drift indicators

Note: Diagrams show detected patterns only. Complex workflows may require manual review.

codity-ai · 2026-05-25T11:16:44Z

Security Scan Summary

Metric	Value
Vulnerabilities	Critical: 0
Overall Risk	Clean
Files Scanned	21

No critical security issues detected

Scan completed in 58.0s

Security scan powered by Codity.ai

codity-ai · 2026-05-25T11:18:22Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	7
Scanner	npm audit

View vulnerability details (7 items)

1. brace-expansion various

CVE: GHSA-f886-m6hf-6m8v
Severity: MODERATE
Fixed in: True

brace-expansion: Zero-step sequence causes process hang and memory exhaustion

2. express-rate-limit various

CVE: N/A
Severity: MODERATE
Fixed in: True

Vulnerability in express-rate-limit

3. fast-uri various

CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
Severity: HIGH
Fixed in: True

fast-uri vulnerable to path traversal via percent-encoded dot segments

4. hono various

CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
Severity: MODERATE
Fixed in: True

Hono has CSS Declaration Injection via Style Object Values in JSX SSR

5. ip-address various

CVE: GHSA-v2v4-37r5-5v8g
Severity: MODERATE
Fixed in: True

ip-address has XSS in Address6 HTML-emitting methods

6. postcss various

CVE: GHSA-qx2v-qp2m-jg93
Severity: MODERATE
Fixed in: True

PostCSS has XSS via Unescaped </style> in its CSS Stringify Output

7. qs various

CVE: GHSA-q8mj-m7cp-5q26
Severity: MODERATE
Fixed in: True

qs has a remotely triggerable DoS: qs.stringify crashes with TypeError on null/undefined entries in comma-format arrays when encodeValuesOnly is set

Powered by Codity.ai · Docs

codity-ai · 2026-05-25T11:18:23Z

Dependency vulnerability scanning

Metric	Value
Vulnerabilities Found	4
Scanner	pip-audit

View vulnerability details (4 items)

1. pip 24.0

CVE: GHSA-4xh5-x5gv-qwph
Fixed in: 25.3

When extracting a tar archive pip may not check symbolic links point into the extraction directory if the tarfile module doesn't implement PEP 706. Note that upgrading pip to a "fixed" version for thi

2. pip 24.0

CVE: GHSA-6vgw-5pg2-w6jp
Fixed in: 26.0

When pip is installing and extracting a maliciously crafted wheel archive, files may be extracted outside the installation directory. The path traversal is limited to prefixes of the installation dire

3. pip 24.0

CVE: GHSA-58qw-9mgm-455v
Fixed in: 26.1

pip handles concatenated tar and ZIP files as ZIP files regardless of filename or whether a file is both a tar and ZIP file. This behavior could result in confusing installation behavior, such as inst

4. pip 24.0

CVE: GHSA-jp4c-xjxw-mgf9
Fixed in: 26.1

pip prior to version 26.1 would run self-update check functionality after installing wheel files which required importing well-known Python modules names. These module imports were intentionally defer

Powered by Codity.ai · Docs

codity-ai · 2026-05-25T11:19:02Z

License Compliance Scan

Metric	Value
Packages Scanned	778
High Risk (Strong Copyleft)	0
Medium Risk (Weak Copyleft)	6
Low Risk (Permissive)	618
Unknown License	154

Weak copyleft licenses found - verify compatibility

Some packages have unknown licenses - manual review required

Medium Risk Licenses - 6 packages

(MPL-2.0 OR Apache-2.0) (1 packages):

dompurify 3.4.0

MPL-2.0 (5 packages):

github.com/philippgille/chromem-go 0.7.0
github.com/libp2p/go-yamux/v5 5.0.1
github.com/hashicorp/golang-lru/v2 2.0.7
github.com/hashicorp/golang-lru 1.0.2
github.com/shoenig/go-m1cpu 0.1.6

Unknown Licenses - 154 packages

github.com/emirpasic/gods 1.18.1
github.com/cyphar/filepath-securejoin 0.5.1
github.com/dslipak/pdf 0.0.2
github.com/go-git/go-billy/v5 5.6.2
github.com/go-git/gcfg 1.5.1-0.20230307220236-3a3c6141e376
github.com/go-git/go-git/v5 5.16.4
github.com/eritikass/githubmarkdownconvertergo 0.1.10
github.com/gocolly/colly 1.2.0
github.com/golang/protobuf 1.5.4
github.com/google/go-github/v69 69.2.0
github.com/google/go-querystring 1.1.0
github.com/kennygrant/sanitize 1.2.4
github.com/mschoch/smat 0.2.0
github.com/mudler/skillserver 0.0.6
github.com/olekukonko/tablewriter 0.0.5
github.com/pjbgf/sha1cd 0.3.2
github.com/saintfish/chardet 0.0.0-20230101081208-5e3ef4b5456d
github.com/segmentio/asm 1.1.3
github.com/sergi/go-diff 1.4.0
go.mau.fi/util 0.3.0

...and 134 more

Powered by Codity.ai · Docs

codity-ai · 2026-05-25T11:19:23Z

Code Quality Report — test-org-codity/LocalAI · PR #7

Scanned: 2026-05-25 11:19 UTC | Score: 14/100 | Provider: github

Executive Summary

Severity	Count
Critical	0
High	1
Medium	4
Low	211

Top Findings

[CQ-LLM-003] `core/application/upgrade_checker.go:146` (Error_Handling · HIGH)

Issue: Potentially swallowing errors when backendManagerFn is nil, leading to unhandled cases.
Suggestion: Add error handling to ensure that if backendManagerFn is nil, an appropriate error is logged or handled.

if uc.backendManagerFn != nil {

[CQ-LLM-002] `core/application/distributed.go:242` (Documentation · MEDIUM)

Issue: Missing detailed documentation for the new parameters in the ReplicaReconciler.
Suggestion: Add JSDoc comments for the new parameters to clarify their purpose.

// Create ReplicaReconciler for auto-scaling model replicas. Adapter +

[CQ-LLM-001] `core/application/distributed.go:243` (Complexity · MEDIUM)

Issue: The function initDistributed has increased cyclomatic complexity due to the addition of multiple parameters and logic.
Suggestion: Consider breaking down the function into smaller functions to reduce complexity.

func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB) (*Distribut

[CQ-LLM-006] `core/application/startup.go:236` (Testability · MEDIUM)

Issue: The use of a hard-coded function for backend manager may hinder testability.
Suggestion: Consider using dependency injection for the backend manager to improve testability.

bmFn := func() galleryop.BackendManager { return application.GalleryService().BackendManager() }

[CQ-LLM-004] `core/application/upgrade_checker.go:158` (Performance · MEDIUM)

Issue: The runCheck function may lead to performance issues due to the nested calls to CheckUpgrades.
Suggestion: Consider optimizing the logic to avoid unnecessary calls to CheckUpgrades.

if upgrades == nil && err == nil {

[CQ-008] `core/application/distributed.go:256` (Maintainability · LOW)

Issue: Magic number 30 in code
Suggestion: Extract to a named constant

Interval:          30 * time.Second,

[CQ-009] `core/application/upgrade_checker.go:53` (Style · LOW)

Issue: Line exceeds 120 characters (163 chars)
Suggestion: Break long lines into multiple lines for readability

func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB, backendManagerFn func() ...

[CQ-008] `core/application/upgrade_checker.go:81` (Maintainability · LOW)

Issue: Magic number 10 in code
Suggestion: Extract to a named constant

initialDelay := 10 * time.Second

[CQ-LLM-005] `core/gallery/upgrade.go:25` (Maintainability · LOW)

Issue: The comment regarding NodeDrift could be clearer and more concise.
Suggestion: Refine the comment to improve clarity and maintainability.

// NodeDrift lists nodes whose installed version or digest differs from

[CQ-009] `core/gallery/upgrade.go:59` (Style · LOW)

Issue: Line exceeds 120 characters (175 chars)
Suggestion: Break long lines into multiple lines for readability

func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBac...

Per-File Breakdown

File	High	Medium	Low	Total
`core/application/distributed.go`	0	2	1	3
`core/application/startup.go`	0	1	0	1
`core/application/upgrade_checker.go`	1	1	2	4
`core/gallery/upgrade.go`	0	0	2	2
`core/http/react-ui/src/App.css`	0	0	14	14
`core/http/react-ui/src/components/FilterBar.jsx`	0	0	1	1
`core/http/react-ui/src/components/NodeDistributionChip.jsx`	0	0	11	11
`core/http/react-ui/src/components/Popover.jsx`	0	0	2	2
`core/http/react-ui/src/pages/Manage.jsx`	0	0	122	122
`core/http/react-ui/src/pages/Nodes.jsx`	0	0	36	36
`core/services/advisorylock/keys.go`	0	0	1	1
`core/services/nodes/managers_distributed.go`	0	0	1	1
`core/services/nodes/reconciler.go`	0	0	2	2
`core/services/nodes/reconciler_test.go`	0	0	9	9
`core/services/nodes/registry.go`	0	0	7	7

Recommendations

Resolve High severity issues, especially error handling gaps and performance bottlenecks.

Run automated tests after applying fixes to verify no regressions.

mudler added 8 commits April 19, 2026 08:03

codity-dm Bot reviewed May 20, 2026

View reviewed changes

greptile-apps Bot reviewed May 20, 2026

View reviewed changes

codity-dm Bot reviewed May 20, 2026

View reviewed changes

codity-dm Bot reviewed May 24, 2026

View reviewed changes

	<strong>Installed on {total} node{total === 1 ? '' : 's'}</strong>
	<strong>{context === 'models' ? 'Loaded on ' : 'Installed on '}{total} node{total === 1 ? '' : 's'}</strong>

	initialDelay := 10 * time.Second
	initialDelay := 30 * time.Second

		{list.map(n => (
		<tr key={n.node_id ?? n.NodeID ?? getName(n)}>

		func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) {
		galleryBackends, err := AvailableBackends(galleries, systemState)

	onClick={() => onFilterChange(f.key)}
	onClick={() => onFilterChange?.(f.key)}

Conversation

DhirenMhatre commented May 20, 2026

Uh oh!

codity-dm Bot commented May 20, 2026

Policy Check Failed

Uh oh!

codity-dm Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

What Changed

Key Changes by Area

Files Changed

Review Focus Areas

Architecture

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Security Scan Summary

Scan completed in 42.6s

Uh oh!

codity-dm Bot commented May 20, 2026

Dependency vulnerability scanning

Uh oh!

codity-dm Bot commented May 20, 2026

Dependency vulnerability scanning

Uh oh!

greptile-apps Bot commented May 20, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

DhirenMhatre commented May 20, 2026

Uh oh!

codity-dm Bot commented May 20, 2026

Policy Check Failed

Uh oh!

codity-dm Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

What Changed

Key Changes by Area

Files Changed

Review Focus Areas

Architecture

Merge Status

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

codity-dm Bot commented May 20, 2026 •

edited

Loading

codity-dm Bot commented May 20, 2026 •

edited

Loading

codity-dm Bot commented May 20, 2026 •

edited

Loading

codity-dm Bot commented May 20, 2026 •

edited

Loading

[CQ-LLM-002] `core/application/distributed.go:242` (Documentation · MEDIUM)

[CQ-LLM-001] `core/application/distributed.go:243` (Complexity · MEDIUM)

[CQ-LLM-005] `core/application/upgrade_checker.go:40` (Testability · MEDIUM)

[CQ-LLM-003] `core/application/upgrade_checker.go:146` (Error_Handling · MEDIUM)

[CQ-LLM-004] `core/application/upgrade_checker.go:158` (Performance · MEDIUM)

[CQ-008] `core/application/distributed.go:256` (Maintainability · LOW)

[CQ-009] `core/application/upgrade_checker.go:53` (Style · LOW)

[CQ-008] `core/application/upgrade_checker.go:81` (Maintainability · LOW)

[CQ-LLM-006] `core/gallery/upgrade.go:25` (Maintainability · LOW)

[CQ-009] `core/gallery/upgrade.go:59` (Style · LOW)

codity-dm Bot commented May 20, 2026 •

edited

Loading