Distributed ux polish#7
Conversation
Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally — they live on workers — so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see *why* it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed.
The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and a "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view. - Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this the /api/backends/ upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in App.css so other pages can adopt them without duplicating styles.
The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. Action buttons gained labels — "Approve", "Resume", "Drain" — so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system — matches the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker.
Two connected problems handled together:
1) Backend delete/install/upgrade used to silently skip non-healthy nodes,
so a delete during an outage left a zombie on the offline node once it
returned. The fan-out now records intent in a new pending_backend_ops
table before attempting the NATS round-trip. Currently-healthy nodes
get an immediate attempt; everyone else is queued. Unique index on
(node_id, backend, op) means reissuing the same operation refreshes
next_retry_at instead of stacking duplicates.
2) Loaded-model state could drift from reality: a worker OOM'd, got
killed, or restarted a backend process would leave a node_models row
claiming the model was still loaded, feeding ghost entries into the
/api/nodes/models listing and the router's scheduling decisions.
The existing ReplicaReconciler gains two new passes that run under a
fresh KeyStateReconciler advisory lock (non-blocking, so one wedged
frontend doesn't freeze the cluster):
- drainPendingBackendOps: retries queued ops whose next_retry_at has
passed on currently-healthy nodes. Success deletes the row; failure
bumps attempts and pushes next_retry_at out with exponential backoff
(30s → 15m cap). ErrNoResponders also marks the node unhealthy.
- probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are
loaded but hasn't seen touched in the last probeStaleAfter (2m).
Unreachable addresses are removed from the registry. A pluggable
ModelProber lets tests substitute a fake without standing up gRPC.
DistributedBackendManager exposes DeleteBackendDetailed so the HTTP
handler can surface per-node outcomes ("2 succeeded, 1 queued") to the
UI in a follow-up commit; the existing DeleteBackend still returns
error-only for callers that don't care about node breakdown.
Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx
on a new key so N frontends coordinate — the same pattern the health
monitor and replica reconciler already rely on. Single-node mode runs
both passes inline (adapter is nil, state drain is a no-op).
Tests cover the upsert semantics, backoff math, the probe removing an
unreachable model but keeping a reachable one, and filtering by
probeStaleAfter.
When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually — the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model. Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle.
Large clusters were going to break the Manage → Backends Nodes column:
the old inline logic rendered every node as a badge and would shred the
layout at >10 workers, plus the Manage → Models distribution cell had
copy-pasted its own slightly-different version.
NodeDistributionChip handles any cluster size with two render modes:
- small (≤3 nodes): inline chips of node names, colored by health.
- large: a single "on N nodes · M offline · K drift" summary chip;
clicking opens a Popover with a per-node table (name, status,
version, digest for backends; name, status, state for models).
Drift counting mirrors the backend's summarizeNodeDrift so the UI
number matches UpgradeInfo.NodeDrift. Digests are truncated to the
docker-style 12-char form with the full value preserved in the title.
Popover is a new general-purpose primitive: fixed positioning anchored
to the trigger, flips above when there's no room below, closes on
outside-click or Escape, returns focus to the trigger. Uses .card as
its surface so theming is inherited. Also useful for a future
labels-editor popup and the user menu.
Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell
and uses the shared chip with context="backends" / "models"
respectively. Delete code removes ~40 lines of ad-hoc logic.
The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.
pending_backend_ops rows targeting agent-type workers looped forever: the reconciler fan-out hit a NATS subject the worker doesn't subscribe to, returned ErrNoResponders, we marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick, same row, same failure. Three related fixes: 1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing for them guarantees an infinite retry loop. Silent skip is correct — they aren't consumers of these ops. 2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders: mark the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps (filters by status=healthy) stops picking the row until the node actually recovers. Matches the synchronous fan-out path. 3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message; further retries just thrash NATS. Row is deleted and logged at ERROR so it stays visible without staying infinite. Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes, non-existent nodes, or carry an empty backend name. Guarded by the same schema-migration advisory lock so only one instance performs it. The guards above prevent new rows of this shape; this closes the migration gap for existing ones. Tests: the prune migration (valid row stays, agent + empty-name rows drop) on top of existing upsert / backoff coverage.
Policy Check Failed✗ 3/3 policy checks failed: • Need 2 more approval(s) (0/2) — comment LGTM or approve via review To merge this PR:
|
PR SummaryWhat Changed
Key Changes by AreaDistributed Backend Management
Upgrade Detection
React UI
Data Model
Files Changed
Review Focus Areas
ArchitectureDesign Decisions
Scalability & Extensibility
Risks
|
| if (showOnlyUpgradable && backendsFilter !== 'upgradable') { | ||
| // One-shot reconciliation — the old state becomes the new chip. | ||
| setBackendsFilter('upgradable') | ||
| setShowOnlyUpgradable(false) |
There was a problem hiding this comment.
Calling state setters during render here can trigger React re-render loops, so move this reconciliation into a useEffect or derive the filter without mutating state in render.
Suggested fix
useEffect(() => {
if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
setBackendsFilter('upgradable')
setShowOnlyUpgradable(false)
}
}, [showOnlyUpgradable, backendsFilter])Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/pages/Manage.jsx
Lines: 642-645
Issue Type: functional-high
Severity: high
Issue Description:
Calling state setters during render here can trigger React re-render loops, so move this reconciliation into a useEffect or derive the filter without mutating state in render.
Current Code:
if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
// One-shot reconciliation — the old state becomes the new chip.
setBackendsFilter('upgradable')
setShowOnlyUpgradable(false)
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool { | ||
| client := grpcclient.NewClientWithToken(address, false, nil, false, g.token) | ||
| probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second) | ||
| defer cancel() | ||
| ok, _ := client.HealthCheck(probeCtx) | ||
| return ok |
There was a problem hiding this comment.
The liveness probe trusts the persisted model address without validation, so a compromised or stale node_models row can trigger server-side requests to arbitrary internal endpoints; restrict probes to the owning node's registered host or validate the address before dialing.
Suggested fix
func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool {
if !isAllowedProbeAddress(address) {
return false
}
client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
defer cancel()
ok, _ := client.HealthCheck(probeCtx)
return ok
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/reconciler.go
Lines: 28-33
Issue Type: functional-high
Severity: high
Issue Description:
The liveness probe trusts the persisted model address without validation, so a compromised or stale node_models row can trigger server-side requests to arbitrary internal endpoints; restrict probes to the owning node's registered host or validate the address before dialing.
Current Code:
func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool {
client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
defer cancel()
ok, _ := client.HealthCheck(probeCtx)
return ok
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| <button | ||
| key={f.key} | ||
| role="tab" | ||
| aria-selected={activeFilter === f.key} | ||
| className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} | ||
| onClick={() => onFilterChange(f.key)} | ||
| > |
There was a problem hiding this comment.
The tab buttons are missing a type="button" attribute, so embedding this component inside a form will submit the form on filter clicks; set an explicit button type.
Suggested fix
<button
key={f.key}
type="button"
role="tab"
aria-selected={activeFilter === f.key}
className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
onClick={() => onFilterChange(f.key)}
>Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/FilterBar.jsx
Lines: 54-60
Issue Type: robustness-medium
Severity: medium
Issue Description:
The tab buttons are missing a `type="button"` attribute, so embedding this component inside a form will submit the form on filter clicks; set an explicit button type.
Current Code:
<button
key={f.key}
role="tab"
aria-selected={activeFilter === f.key}
className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
onClick={() => onFilterChange(f.key)}
>
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| ariaLabel={context === 'models' ? 'Model distribution' : 'Backend distribution'} | ||
| > | ||
| <div className="popover__header"> | ||
| <strong>Installed on {total} node{total === 1 ? '' : 's'}</strong> |
There was a problem hiding this comment.
The popover header always says "Installed on" even for models where the component is documenting loaded state, so use a context-specific label.
| <strong>Installed on {total} node{total === 1 ? '' : 's'}</strong> | |
| <strong>{context === 'models' ? 'Loaded on ' : 'Installed on '}{total} node{total === 1 ? '' : 's'}</strong> |
Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/NodeDistributionChip.jsx
Lines: 97-97
Issue Type: functional-medium
Severity: medium
Issue Description:
The popover header always says "Installed on" even for models where the component is documenting loaded state, so use a context-specific label.
Current Code:
<strong>Installed on {total} node{total === 1 ? '' : 's'}</strong>
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| default: | ||
| xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID) | ||
| continue |
There was a problem hiding this comment.
Unknown pending op types are skipped forever without updating attempts or deleting the row, so a malformed queued operation can persist indefinitely; dead-letter or mark failure for unsupported op values.
Suggested fix
default:
xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID)
_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, fmt.Sprintf("unsupported op: %s", op.Op))
if op.Attempts+1 >= maxPendingBackendOpAttempts {
_ = rc.registry.DeletePendingBackendOp(ctx, op.ID)
}
continuePrompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/reconciler.go
Lines: 197-199
Issue Type: security-medium
Severity: medium
Issue Description:
Unknown pending op types are skipped forever without updating attempts or deleting the row, so a malformed queued operation can persist indefinitely; dead-letter or mark failure for unsupported op values.
Current Code:
default:
xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID)
continue
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
Security Scan Summary
No critical security issues detected Scan completed in 42.6sSecurity scan powered by Codity.ai |
Dependency vulnerability scanning
View vulnerability details (6 items)1. brace-expansion various CVE: GHSA-f886-m6hf-6m8v
2. express-rate-limit various CVE: N/A
3. fast-uri various CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
4. hono various CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
5. ip-address various CVE: GHSA-v2v4-37r5-5v8g
6. postcss various CVE: GHSA-qx2v-qp2m-jg93
Powered by Codity.ai · Docs |
Dependency vulnerability scanning
View vulnerability details (4 items)1. pip 24.0 CVE: GHSA-4xh5-x5gv-qwph
2. pip 24.0 CVE: GHSA-6vgw-5pg2-w6jp
3. pip 24.0 CVE: GHSA-58qw-9mgm-455v
4. pip 24.0 CVE: GHSA-jp4c-xjxw-mgf9
Powered by Codity.ai · Docs |
Greptile SummaryThis PR polishes the distributed mode UX across the backend: it fixes the long-standing bug where upgrade detection always read the empty frontend filesystem instead of aggregating from workers, introduces a durable
Confidence Score: 3/5The Go backend changes are solid and well-tested; the frontend Upgrade all action unconditionally shows a success notice even when every upgrade call fails, which will mislead operators on a broken cluster. The core distributed infrastructure work is carefully implemented with good test coverage. The frontend misinforms the operator when all backends fail to upgrade — a user who clicks Upgrade all and sees only error toasts followed immediately by a success banner has no reliable signal about the true cluster state. core/http/react-ui/src/pages/Manage.jsx — handleUpgradeAll success-toast logic needs fixing before this ships to operators. Important Files Changed
Sequence DiagramsequenceDiagram
participant UI as React UI
participant API as ui_api.go
participant DBM as DistributedBackendManager
participant Reg as NodeRegistry (DB)
participant Rec as ReplicaReconciler
participant W as Worker Node (NATS)
UI->>API: POST /api/backends/upgrade/:name
API->>DBM: UpgradeBackend(ctx, name)
DBM->>Reg: UpsertPendingBackendOp(nodeID, name, upgrade)
alt node is healthy
DBM->>W: InstallBackend (NATS RPC)
W-->>DBM: reply.Success
DBM->>Reg: deletePendingRow(nodeID, name, upgrade)
else node is offline/draining
DBM-->>API: "status=queued"
Note over Rec,W: Reconciler picks up later
end
loop Every 30s
Rec->>Reg: ListDuePendingBackendOps()
Reg-->>Rec: due ops for healthy nodes
Rec->>W: InstallBackend / DeleteBackend (NATS)
alt success
Rec->>Reg: DeletePendingBackendOp(id)
else failure
Rec->>Reg: RecordPendingBackendOpFailure(id, err)
Note over Rec: exponential backoff, dead-letter at 10 attempts
end
Rec->>Reg: query stale loaded model rows
Rec->>W: gRPC HealthCheck (probe)
alt unreachable
Rec->>Reg: RemoveNodeModel(nodeID, modelName)
end
end
UI->>API: GET /api/system/models
API->>Reg: ListAllLoadedModels + List nodes
API-->>UI: modelCapability with LoadedOn and Source
Reviews (1): Last reviewed commit: "fix(distributed): stop queue loops on ag..." | Re-trigger Greptile |
| for (const name of names) { | ||
| try { | ||
| await backendsApi.upgrade(name) | ||
| } catch (err) { | ||
| addToast(`Upgrade failed for ${name}: ${err.message}`, 'error') | ||
| } | ||
| } | ||
| addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info') |
There was a problem hiding this comment.
Success toast always fires regardless of upgrade outcome. Every call to
backendsApi.upgrade is individually caught, so the outer try block never throws. The "Upgrade started for N backends" toast fires even when every individual upgrade failed — an operator who sees N error toasts followed by a success notice will be misled into thinking work is in progress when none of it is.
| for (const name of names) { | |
| try { | |
| await backendsApi.upgrade(name) | |
| } catch (err) { | |
| addToast(`Upgrade failed for ${name}: ${err.message}`, 'error') | |
| } | |
| } | |
| addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info') | |
| let succeeded = 0 | |
| for (const name of names) { | |
| try { | |
| await backendsApi.upgrade(name) | |
| succeeded++ | |
| } catch (err) { | |
| addToast(`Upgrade failed for ${name}: ${err.message}`, 'error') | |
| } | |
| } | |
| if (succeeded > 0) { | |
| addToast(`Upgrade started for ${succeeded} backend${succeeded === 1 ? '' : 's'}`, 'info') | |
| } |
| // Legacy "showOnlyUpgradable" toggle is now the 'upgradable' chip — | ||
| // keep backward-compat by mapping it onto the new filter. | ||
| if (showOnlyUpgradable && backendsFilter !== 'upgradable') { | ||
| // One-shot reconciliation — the old state becomes the new chip. | ||
| setBackendsFilter('upgradable') | ||
| setShowOnlyUpgradable(false) | ||
| } |
There was a problem hiding this comment.
Dead state and unreachable migration block.
showOnlyUpgradable is initialised to false and setShowOnlyUpgradable(true) is never called anywhere in the component, so this compatibility shim can never run. The state variable and this block can be removed entirely.
| // Legacy "showOnlyUpgradable" toggle is now the 'upgradable' chip — | |
| // keep backward-compat by mapping it onto the new filter. | |
| if (showOnlyUpgradable && backendsFilter !== 'upgradable') { | |
| // One-shot reconciliation — the old state becomes the new chip. | |
| setBackendsFilter('upgradable') | |
| setShowOnlyUpgradable(false) | |
| } | |
| // (showOnlyUpgradable legacy migration removed — state never set) |
| if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil { | ||
| xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err) | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "error", | ||
| Error: fmt.Sprintf("enqueue failed: %v", err), | ||
| }) | ||
| continue | ||
| } | ||
|
|
||
| if node.Status != StatusHealthy { | ||
| // Intent is recorded; reconciler will retry when the node recovers. | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "queued", | ||
| Error: fmt.Sprintf("node %s, will retry when healthy", node.Status), | ||
| }) | ||
| continue | ||
| } | ||
| if _, delErr := d.adapter.DeleteBackend(node.ID, name); delErr != nil { | ||
| if errors.Is(delErr, nats.ErrNoResponders) { | ||
| // Node's NATS subscription is gone — likely restarted with a new ID. | ||
| // Mark it unhealthy so future fan-outs skip it. | ||
| xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID) | ||
| d.registry.MarkUnhealthy(context.Background(), node.ID) | ||
| continue | ||
|
|
||
| applyErr := apply(node) | ||
| if applyErr == nil { | ||
| // Find the row we just upserted and delete it; cheap but requires | ||
| // a lookup since UpsertPendingBackendOp doesn't return the ID. | ||
| if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil { | ||
| xlog.Debug("Failed to clear pending backend op after success", "error", err) | ||
| } | ||
| xlog.Warn("Failed to propagate backend deletion to worker", "node", node.Name, "backend", name, "error", delErr) | ||
| errs = append(errs, fmt.Errorf("node %s: %w", node.Name, delErr)) | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "success", | ||
| }) | ||
| continue |
There was a problem hiding this comment.
Reconciler can race the initial fan-out on healthy nodes.
UpsertPendingBackendOp sets NextRetryAt = time.Now(), so the row is immediately eligible for ListDuePendingBackendOps. If a reconciler tick fires in the ≤30 s window between the upsert and the successful apply/deletePendingRow call, both the direct fan-out path and drainPendingBackendOps will call the same NATS subject for the same node concurrently. For delete the second call may see a "not found" error that triggers unnecessary retries. Setting NextRetryAt ahead (e.g., time.Now().Add(interval + 5*time.Second)) would close the window.
| // Initial delay: don't slow down startup. Short enough that operators | ||
| // don't stare at an empty upgrade banner for long; long enough that | ||
| // workers have registered and reported their installed backends. | ||
| initialDelay := 10 * time.Second |
There was a problem hiding this comment.
10 s initial delay is likely too short for a distributed cluster. Worker registration, NATS subscription setup, and the initial
ListBackends fan-out typically take longer than 10 s on any real cluster. Running CheckUpgrades before workers are visible means the upgrade banner silently shows nothing until the next scheduled check (6 h later). The old 30 s value was closer to the right order of magnitude.
| initialDelay := 10 * time.Second | |
| initialDelay := 30 * time.Second |
|
@codity review |
Policy Check Failed✗ 3/3 policy checks failed: • Need 2 more approval(s) (0/2) — comment LGTM or approve via review To merge this PR:
|
PR SummaryWhat Changed
Key Changes by AreaBackend Lifecycle
Upgrade & Drift Detection
UI Components
API & Registry
Files Changed
Review Focus Areas
ArchitectureDesign Decisions
Scalability & Extensibility
Risks
Merge StatusNOT MERGEABLE — PR Score 23/100, below threshold (50)
|
| if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil { | ||
| xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) | ||
| continue | ||
| } | ||
| xlog.Warn("Reconciler: model unreachable, removed from registry", |
There was a problem hiding this comment.
The code removes the registry row before unloading the worker, so unload failures leave an unmanaged loaded backend; unload first or restore the row on failure.
Suggested fix
if err := rc.unloader.UnloadModelOnNode(m.NodeID, m.ModelName); err != nil {
xlog.Warn("Reconciler: failed to unload unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
continue
}
if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
continue
}
xlog.Warn("Reconciler: model unreachable, removed from registry",
"node", m.NodeID, "model", m.ModelName, "address", m.Address)Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/reconciler.go
Lines: 279-283
Issue Type: robustness-high
Severity: high
Issue Description:
The code removes the registry row before unloading the worker, so unload failures leave an unmanaged loaded backend; unload first or restore the row on failure.
Current Code:
if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
continue
}
xlog.Warn("Reconciler: model unreachable, removed from registry",
"node", m.NodeID, "model", m.ModelName, "address", m.Address)
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| ctx := context.Background() | ||
| _, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error { | ||
| _, err := d.adapter.DeleteBackend(node.ID, name) | ||
| return err | ||
| }) | ||
| return err |
There was a problem hiding this comment.
DeleteBackend uses context.Background instead of a request-scoped context, so cancellation and deadlines from the caller are ignored; pass a caller-provided context through this path.
Suggested fix
func (d *DistributedBackendManager) DeleteBackend(ctx context.Context, name string) error {
if err := d.local.DeleteBackend(name); err != nil {
if !errors.Is(err, gallery.ErrBackendNotFound) {
return err
}
xlog.Debug("Backend not found locally, will attempt deletion on workers", "backend", name)
}
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error {
_, err := d.adapter.DeleteBackend(node.ID, name)
return err
})
return err
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/managers_distributed.go
Lines: 200-205
Issue Type: security-medium
Severity: medium
Issue Description:
DeleteBackend uses context.Background instead of a request-scoped context, so cancellation and deadlines from the caller are ignored; pass a caller-provided context through this path.
Current Code:
ctx := context.Background()
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error {
_, err := d.adapter.DeleteBackend(node.ID, name)
return err
})
return err
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
Security Scan Summary
No critical security issues detected Scan completed in 46.8sSecurity scan powered by Codity.ai |
Dependency vulnerability scanning
View vulnerability details (6 items)1. brace-expansion various CVE: GHSA-f886-m6hf-6m8v
2. express-rate-limit various CVE: N/A
3. fast-uri various CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
4. hono various CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
5. ip-address various CVE: GHSA-v2v4-37r5-5v8g
6. postcss various CVE: GHSA-qx2v-qp2m-jg93
Powered by Codity.ai · Docs |
Dependency vulnerability scanning
View vulnerability details (4 items)1. pip 24.0 CVE: GHSA-4xh5-x5gv-qwph
2. pip 24.0 CVE: GHSA-6vgw-5pg2-w6jp
3. pip 24.0 CVE: GHSA-58qw-9mgm-455v
4. pip 24.0 CVE: GHSA-jp4c-xjxw-mgf9
Powered by Codity.ai · Docs |
License Compliance Scan
Weak copyleft licenses found - verify compatibility Some packages have unknown licenses - manual review required Medium Risk Licenses - 6 packages(MPL-2.0 OR Apache-2.0) (1 packages):
MPL-2.0 (5 packages):
Unknown Licenses - 126 packages
...and 106 more Powered by Codity.ai · Docs |
Code Quality Report — test-org-codity/LocalAI · PR #7Scanned: 2026-05-20 12:14 UTC | Score: 15/100 | Provider: github Executive Summary
Top Findings[CQ-LLM-002]
|
| File | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
core/application/distributed.go |
0 | 0 | 2 | 1 | 3 |
core/application/upgrade_checker.go |
0 | 0 | 3 | 2 | 5 |
core/gallery/upgrade.go |
0 | 0 | 0 | 2 | 2 |
core/http/react-ui/src/App.css |
0 | 0 | 0 | 14 | 14 |
core/http/react-ui/src/components/FilterBar.jsx |
0 | 0 | 0 | 1 | 1 |
core/http/react-ui/src/components/NodeDistributionChip.jsx |
0 | 0 | 0 | 11 | 11 |
core/http/react-ui/src/components/Popover.jsx |
0 | 0 | 0 | 2 | 2 |
core/http/react-ui/src/pages/Manage.jsx |
0 | 0 | 0 | 122 | 122 |
core/http/react-ui/src/pages/Nodes.jsx |
0 | 0 | 0 | 36 | 36 |
core/services/advisorylock/keys.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/managers_distributed.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/reconciler.go |
0 | 0 | 0 | 2 | 2 |
core/services/nodes/reconciler_test.go |
0 | 0 | 0 | 9 | 9 |
core/services/nodes/registry.go |
0 | 0 | 0 | 7 | 7 |
Recommendations
- Run automated tests after applying fixes to verify no regressions.
|
@codity review |
Policy Check Failed✗ 3/3 policy checks failed: • Need 2 more approval(s) (0/2) — comment LGTM or approve via review To merge this PR:
|
PR SummaryWhat Changed
Key Changes by AreaDistributed Backend Management
Cluster Upgrade Detection
State Reconciliation
UI/UX
Files Changed
Review Focus Areas
ArchitectureDesign Decisions
Scalability & Extensibility
Risks
Merge StatusNOT MERGEABLE — PR Score 8/100, below threshold (50)
|
| if (showOnlyUpgradable && backendsFilter !== 'upgradable') { | ||
| // One-shot reconciliation — the old state becomes the new chip. | ||
| setBackendsFilter('upgradable') | ||
| setShowOnlyUpgradable(false) |
There was a problem hiding this comment.
Calling state setters during render causes render-loop risk and should be moved into a useEffect that watches showOnlyUpgradable and backendsFilter.
Suggested fix
useEffect(() => {
if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
setBackendsFilter('upgradable')
setShowOnlyUpgradable(false)
}
}, [showOnlyUpgradable, backendsFilter])Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/pages/Manage.jsx
Lines: 642-645
Issue Type: robustness-high
Severity: high
Issue Description:
Calling state setters during render causes render-loop risk and should be moved into a useEffect that watches showOnlyUpgradable and backendsFilter.
Current Code:
if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
// One-shot reconciliation — the old state becomes the new chip.
setBackendsFilter('upgradable')
setShowOnlyUpgradable(false)
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error { | ||
| var row PendingBackendOp | ||
| if err := tx.First(&row, id).Error; err != nil { | ||
| return err | ||
| } | ||
| row.Attempts++ | ||
| row.LastError = errMsg | ||
| row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts)) | ||
| return tx.Save(&row).Error |
There was a problem hiding this comment.
The row update in RecordPendingBackendOpFailure is vulnerable to lost updates under concurrent failures; use an atomic UPDATE that increments attempts in SQL.
Suggested fix
return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
var row PendingBackendOp
if err := tx.Clauses(clause.Locking{Strength: "UPDATE"}).First(&row, id).Error; err != nil {
return err
}
attempts := row.Attempts + 1
return tx.Model(&PendingBackendOp{}).
Where("id = ?", id).
Updates(map[string]any{
"attempts": gorm.Expr("attempts + 1"),
"last_error": errMsg,
"next_retry_at": time.Now().Add(backoffForAttempt(attempts)),
}).Error
})Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/registry.go
Lines: 1061-1069
Issue Type: robustness-high
Severity: high
Issue Description:
The row update in RecordPendingBackendOpFailure is vulnerable to lost updates under concurrent failures; use an atomic UPDATE that increments attempts in SQL.
Current Code:
return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
var row PendingBackendOp
if err := tx.First(&row, id).Error; err != nil {
return err
}
row.Attempts++
row.LastError = errMsg
row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts))
return tx.Save(&row).Error
})
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| NodeStatus string `json:"node_status"` // healthy | unhealthy | offline | draining | pending | ||
| Version string `json:"version,omitempty"` | ||
| Digest string `json:"digest,omitempty"` | ||
| URI string `json:"uri,omitempty"` |
There was a problem hiding this comment.
Exposing raw backend URI values in the JSON response can leak internal filesystem paths or registry locations, so omit this field from the API model or redact it before serialization.
| URI string `json:"uri,omitempty"` | |
| URI string `json:"-"` |
Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/gallery/backends.go
Lines: 412-412
Issue Type: security-medium
Severity: medium
Issue Description:
Exposing raw backend URI values in the JSON response can leak internal filesystem paths or registry locations, so omit this field from the API model or redact it before serialization.
Current Code:
URI string `json:"uri,omitempty"`
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| // NodeDrift lists nodes whose installed version or digest differs from | ||
| // the cluster majority. Non-empty means the cluster has diverged and an | ||
| // upgrade will realign it. Empty in single-node mode. | ||
| NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"` | ||
| } | ||
|
|
||
| // CheckBackendUpgrades compares installed backends against gallery entries | ||
| // and returns a map of backend names to UpgradeInfo for those that have | ||
| // newer versions or different OCI digests available. | ||
| // NodeDriftInfo describes one node that disagrees with the cluster majority | ||
| // on which version/digest of a backend is installed. | ||
| type NodeDriftInfo struct { | ||
| NodeID string `json:"node_id"` | ||
| NodeName string `json:"node_name"` | ||
| Version string `json:"version,omitempty"` | ||
| Digest string `json:"digest,omitempty"` | ||
| } |
There was a problem hiding this comment.
NodeDrift exposes internal node IDs, names, versions, and digests to any caller of this API, so redact these fields or gate them behind an authorization check before serializing.
Suggested fix
// NodeDrift lists nodes whose installed version or digest differs from
// the cluster majority. Populate this only for privileged callers.
NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"`
}
// NodeDriftInfo describes one node that disagrees with the cluster majority
// on which version/digest of a backend is installed.
type NodeDriftInfo struct {
NodeName string `json:"node_name,omitempty"`
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/gallery/upgrade.go
Lines: 26-39
Issue Type: security-medium
Severity: medium
Issue Description:
NodeDrift exposes internal node IDs, names, versions, and digests to any caller of this API, so redact these fields or gate them behind an authorization check before serializing.
Current Code:
// NodeDrift lists nodes whose installed version or digest differs from
// the cluster majority. Non-empty means the cluster has diverged and an
// upgrade will realign it. Empty in single-node mode.
NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"`
}
// NodeDriftInfo describes one node that disagrees with the cluster majority
// on which version/digest of a backend is installed.
type NodeDriftInfo struct {
NodeID string `json:"node_id"`
NodeName string `json:"node_name"`
Version string `json:"version,omitempty"`
Digest string `json:"digest,omitempty"`
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| <button | ||
| key={f.key} | ||
| role="tab" | ||
| aria-selected={activeFilter === f.key} | ||
| className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} | ||
| onClick={() => onFilterChange(f.key)} | ||
| > |
There was a problem hiding this comment.
The tab buttons are missing type="button", so using this component inside a form will submit the form when a filter is clicked; set an explicit button type.
Suggested fix
<button
type="button"
key={f.key}
role="tab"
aria-selected={activeFilter === f.key}
className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
onClick={() => onFilterChange(f.key)}
>Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/FilterBar.jsx
Lines: 54-60
Issue Type: functional-medium
Severity: medium
Issue Description:
The tab buttons are missing `type="button"`, so using this component inside a form will submit the form when a filter is clicked; set an explicit button type.
Current Code:
<button
key={f.key}
role="tab"
aria-selected={activeFilter === f.key}
className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
onClick={() => onFilterChange(f.key)}
>
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| {list.map(n => ( | ||
| <tr key={n.node_id ?? n.NodeID ?? getName(n)}> |
There was a problem hiding this comment.
Rows use a fallback key of getName(n), which can collide for duplicate or empty node names and cause React to reuse the wrong row state, so use a guaranteed-unique fallback such as the map index when no ID is present.
Also reported at: core/http/react-ui/src/components/NodeDistributionChip.jsx L46–L55
Suggested fix
{list.map((n, idx) => (
<tr key={n.node_id ?? n.NodeID ?? `${getName(n)}-${idx}`}>Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/NodeDistributionChip.jsx
Lines: 114-115
Issue Type: functional-medium
Severity: medium
Issue Description:
Rows use a fallback key of getName(n), which can collide for duplicate or empty node names and cause React to reuse the wrong row state, so use a guaranteed-unique fallback such as the map index when no ID is present.
_Also reported at: `core/http/react-ui/src/components/NodeDistributionChip.jsx` L46–L55_
Current Code:
{list.map(n => (
<tr key={n.node_id ?? n.NodeID ?? getName(n)}>
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
|
|
||
| for _, node := range allNodes { | ||
| if node.Status != StatusHealthy { | ||
| if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining { |
There was a problem hiding this comment.
ListBackends now queries every non-pending, non-offline, non-draining node, including non-backend workers that do not subscribe to backend subjects; skip non-backend node types here the same way enqueueAndDrainBackendOp does.
Suggested fix
if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining {
continue
}
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
continue
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/managers_distributed.go
Lines: 237-237
Issue Type: functional-medium
Severity: medium
Issue Description:
ListBackends now queries every non-pending, non-offline, non-draining node, including non-backend workers that do not subscribe to backend subjects; skip non-backend node types here the same way enqueueAndDrainBackendOp does.
Current Code:
if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining {
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
Security Scan Summary
No critical security issues detected Scan completed in 43.5sSecurity scan powered by Codity.ai |
Policy Check Failed✗ 3/3 policy checks failed: • Need 2 more approval(s) (0/2) — comment LGTM or approve via review To merge this PR:
|
PR SummaryWhat Changed
Key Changes by AreaDistributed Backend Management
Upgrade Detection
UI/UX
State Reconciliation
Files Changed
Review Focus Areas
ArchitectureDesign Decisions
Scalability & Extensibility
Risks
Merge StatusNOT MERGEABLE — PR Score 8/100, below threshold (50)
|
| func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) { | ||
| galleryBackends, err := AvailableBackends(galleries, systemState) |
There was a problem hiding this comment.
The new ctx parameter is ignored, so upgrade checks can block indefinitely on gallery resolution or remote digest fetches; thread the context into the called functions or remove it until cancellation is supported.
Suggested fix
func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) {
_ = ctx
galleryBackends, err := AvailableBackends(galleries, systemState)Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/gallery/upgrade.go
Lines: 59-60
Issue Type: functional-high
Severity: high
Issue Description:
The new ctx parameter is ignored, so upgrade checks can block indefinitely on gallery resolution or remote digest fetches; thread the context into the called functions or remove it until cancellation is supported.
Current Code:
func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) {
galleryBackends, err := AvailableBackends(galleries, systemState)
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) { | ||
| allNodes, err := d.registry.List(ctx) | ||
| if err != nil { | ||
| return BackendOpResult{}, err | ||
| } | ||
| var errs []error | ||
|
|
||
| result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))} | ||
| for _, node := range allNodes { | ||
| // Pending nodes haven't been approved yet — no intent to apply. | ||
| if node.Status == StatusPending { | ||
| continue | ||
| } | ||
| // Backend lifecycle ops only make sense on backend-type workers. | ||
| // Agent workers don't subscribe to backend.install/delete/list, so | ||
| // enqueueing for them guarantees a forever-retrying row that the | ||
| // reconciler can never drain. Silently skip — they aren't consumers. | ||
| if node.NodeType != "" && node.NodeType != NodeTypeBackend { | ||
| continue | ||
| } | ||
| if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil { | ||
| xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err) | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "error", | ||
| Error: fmt.Sprintf("enqueue failed: %v", err), | ||
| }) | ||
| continue | ||
| } | ||
|
|
||
| if node.Status != StatusHealthy { | ||
| // Intent is recorded; reconciler will retry when the node recovers. | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "queued", | ||
| Error: fmt.Sprintf("node %s, will retry when healthy", node.Status), | ||
| }) | ||
| continue | ||
| } | ||
| if _, delErr := d.adapter.DeleteBackend(node.ID, name); delErr != nil { | ||
| if errors.Is(delErr, nats.ErrNoResponders) { | ||
| // Node's NATS subscription is gone — likely restarted with a new ID. | ||
| // Mark it unhealthy so future fan-outs skip it. | ||
| xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID) | ||
| d.registry.MarkUnhealthy(context.Background(), node.ID) | ||
| continue | ||
|
|
||
| applyErr := apply(node) | ||
| if applyErr == nil { | ||
| // Find the row we just upserted and delete it; cheap but requires | ||
| // a lookup since UpsertPendingBackendOp doesn't return the ID. | ||
| if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil { | ||
| xlog.Debug("Failed to clear pending backend op after success", "error", err) | ||
| } | ||
| xlog.Warn("Failed to propagate backend deletion to worker", "node", node.Name, "backend", name, "error", delErr) | ||
| errs = append(errs, fmt.Errorf("node %s: %w", node.Name, delErr)) | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "success", | ||
| }) | ||
| continue | ||
| } | ||
|
|
||
| // Record failure for backoff. If it's an ErrNoResponders, the node's | ||
| // gone AWOL — mark unhealthy so the router stops picking it too. | ||
| errMsg := applyErr.Error() | ||
| if errors.Is(applyErr, nats.ErrNoResponders) { | ||
| xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID) | ||
| d.registry.MarkUnhealthy(ctx, node.ID) | ||
| } | ||
| if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil { | ||
| _ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg) | ||
| } | ||
| result.Nodes = append(result.Nodes, NodeOpStatus{ | ||
| NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg, | ||
| }) | ||
| } | ||
| return result, nil | ||
| } |
There was a problem hiding this comment.
DeleteBackend now always returns nil when some or all nodes fail because enqueueAndDrainBackendOp records per-node errors but never surfaces an aggregated error; return a joined error when any node status is "error".
Suggested fix
func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) {
allNodes, err := d.registry.List(ctx)
if err != nil {
return BackendOpResult{}, err
}
result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))}
var errs []error
for _, node := range allNodes {
if node.Status == StatusPending {
continue
}
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
continue
}
if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "error",
Error: fmt.Sprintf("enqueue failed: %v", err),
})
errs = append(errs, fmt.Errorf("node %s: enqueue failed: %w", node.Name, err))
continue
}
if node.Status != StatusHealthy {
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "queued",
Error: fmt.Sprintf("node %s, will retry when healthy", node.Status),
})
continue
}
applyErr := apply(node)
if applyErr == nil {
if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil {
xlog.Debug("Failed to clear pending backend op after success", "error", err)
}
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "success",
})
continue
}
errMsg := applyErr.Error()
if errors.Is(applyErr, nats.ErrNoResponders) {
xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
d.registry.MarkUnhealthy(ctx, node.ID)
}
if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil {
_ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg)
}
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg,
})
errs = append(errs, fmt.Errorf("node %s: %w", node.Name, applyErr))
}
return result, errors.Join(errs...)
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/managers_distributed.go
Lines: 97-162
Issue Type: functional-high
Severity: high
Issue Description:
DeleteBackend now always returns nil when some or all nodes fail because enqueueAndDrainBackendOp records per-node errors but never surfaces an aggregated error; return a joined error when any node status is "error".
Current Code:
func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) {
allNodes, err := d.registry.List(ctx)
if err != nil {
return BackendOpResult{}, err
}
result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))}
for _, node := range allNodes {
// Pending nodes haven't been approved yet — no intent to apply.
if node.Status == StatusPending {
continue
}
// Backend lifecycle ops only make sense on backend-type workers.
// Agent workers don't subscribe to backend.install/delete/list, so
// enqueueing for them guarantees a forever-retrying row that the
// reconciler can never drain. Silently skip — they aren't consumers.
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
continue
}
if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "error",
Error: fmt.Sprintf("enqueue failed: %v", err),
})
continue
}
if node.Status != StatusHealthy {
// Intent is recorded; reconciler will retry when the node recovers.
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "queued",
Error: fmt.Sprintf("node %s, will retry when healthy", node.Status),
})
continue
}
applyErr := apply(node)
if applyErr == nil {
// Find the row we just upserted and delete it; cheap but requires
// a lookup since UpsertPendingBackendOp doesn't return the ID.
if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil {
xlog.Debug("Failed to clear pending backend op after success", "error", err)
}
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "success",
})
continue
}
// Record failure for backoff. If it's an ErrNoResponders, the node's
// gone AWOL — mark unhealthy so the router stops picking it too.
errMsg := applyErr.Error()
if errors.Is(applyErr, nats.ErrNoResponders) {
xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
d.registry.MarkUnhealthy(ctx, node.ID)
}
if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil {
_ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg)
}
result.Nodes = append(result.Nodes, NodeOpStatus{
NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg,
})
}
return result, nil
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| _, _ = advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error { | ||
| rc.reconcileState(ctx) | ||
| return nil | ||
| }) |
There was a problem hiding this comment.
Ignoring the state-reconciler advisory lock errors hides database or lock failures; log or handle the returned errors so reconciliation does not silently stop running.
Suggested fix
if acquired, err := advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error {
rc.reconcileState(ctx)
return nil
}); err != nil {
xlog.Warn("Reconciler: state reconciliation lock failed", "error", err)
} else if !acquired {
xlog.Debug("Reconciler: state reconciliation skipped because lock is held")
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/reconciler.go
Lines: 147-150
Issue Type: robustness-high
Severity: high
Issue Description:
Ignoring the state-reconciler advisory lock errors hides database or lock failures; log or handle the returned errors so reconciliation does not silently stop running.
Current Code:
_, _ = advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error {
rc.reconcileState(ctx)
return nil
})
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool { | ||
| client := grpcclient.NewClientWithToken(address, false, nil, false, g.token) | ||
| probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second) | ||
| defer cancel() | ||
| ok, _ := client.HealthCheck(probeCtx) | ||
| return ok |
There was a problem hiding this comment.
The prober treats every health-check error as a dead backend and deletes the model row; return and inspect the probe error so transient auth or network failures do not orphan healthy models.
Suggested fix
func (g grpcModelProber) IsAlive(ctx context.Context, address string) (bool, error) {
client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
defer cancel()
return client.HealthCheck(probeCtx)
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/reconciler.go
Lines: 28-33
Issue Type: robustness-high
Severity: high
Issue Description:
The prober treats every health-check error as a dead backend and deletes the model row; return and inspect the probe error so transient auth or network failures do not orphan healthy models.
Current Code:
func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool {
client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
defer cancel()
ok, _ := client.HealthCheck(probeCtx)
return ok
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| if rc.prober.IsAlive(ctx, m.Address) { | ||
| // Bump updated_at so we don't probe this row again immediately. | ||
| _ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}). | ||
| Where("id = ?", m.ID).Update("updated_at", time.Now()).Error | ||
| continue | ||
| } | ||
| if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil { | ||
| xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err) | ||
| continue | ||
| } | ||
| xlog.Warn("Reconciler: model unreachable, removed from registry", |
There was a problem hiding this comment.
Deleting the registry row on any failed probe can remove healthy models during temporary transport or auth failures; only remove after a definitive unhealthy result and log probe errors separately.
Suggested fix
alive, err := rc.prober.IsAlive(ctx, m.Address)
if err != nil {
xlog.Warn("Reconciler: probe failed", "node", m.NodeID, "model", m.ModelName, "address", m.Address, "error", err)
continue
}
if alive {
// Bump updated_at so we don't probe this row again immediately.
_ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}).
Where("id = ?", m.ID).Update("updated_at", time.Now()).Error
continue
}
if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
continue
}
xlog.Warn("Reconciler: model unreachable, removed from registry",
"node", m.NodeID, "model", m.ModelName, "address", m.Address)Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/reconciler.go
Lines: 273-283
Issue Type: robustness-high
Severity: high
Issue Description:
Deleting the registry row on any failed probe can remove healthy models during temporary transport or auth failures; only remove after a definitive unhealthy result and log probe errors separately.
Current Code:
if rc.prober.IsAlive(ctx, m.Address) {
// Bump updated_at so we don't probe this row again immediately.
_ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}).
Where("id = ?", m.ID).Update("updated_at", time.Now()).Error
continue
}
if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
continue
}
xlog.Warn("Reconciler: model unreachable, removed from registry",
"node", m.NodeID, "model", m.ModelName, "address", m.Address)
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| NodeStatus string `json:"node_status"` // healthy | unhealthy | offline | draining | pending | ||
| Version string `json:"version,omitempty"` | ||
| Digest string `json:"digest,omitempty"` | ||
| URI string `json:"uri,omitempty"` |
There was a problem hiding this comment.
URI is added to per-node API output and metadata without any redaction, so registry credentials or internal endpoints embedded in backend URIs should be stripped before serialization.
Suggested fix
// Store only a sanitized URI here to avoid exposing credentials or internal-only endpoints.
URI string `json:"uri,omitempty"`Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/gallery/backends.go
Lines: 412-412
Issue Type: security-medium
Severity: medium
Issue Description:
URI is added to per-node API output and metadata without any redaction, so registry credentials or internal endpoints embedded in backend URIs should be stripped before serialization.
Current Code:
URI string `json:"uri,omitempty"`
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| <tr key={n.node_id ?? n.NodeID ?? getName(n)}> | ||
| <td className="cell-mono">{getName(n)}</td> | ||
| <td> | ||
| <span className={`badge ${getStatus(n) === 'healthy' ? 'badge-success' : 'badge-warning'}`}> |
There was a problem hiding this comment.
Rows with status 'draining' are rendered as warning instead of info, so map draining to its own badge variant here as in the inline chip view.
Suggested fix
<span className={`badge ${getStatus(n) === 'healthy'
? 'badge-success'
: getStatus(n) === 'draining'
? 'badge-info'
: 'badge-warning'}`}>Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/NodeDistributionChip.jsx
Lines: 118-118
Issue Type: functional-medium
Severity: medium
Issue Description:
Rows with status 'draining' are rendered as warning instead of info, so map draining to its own badge variant here as in the inline chip view.
Current Code:
<span className={`badge ${getStatus(n) === 'healthy' ? 'badge-success' : 'badge-warning'}`}>
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error { | ||
| return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error { | ||
| var row PendingBackendOp | ||
| if err := tx.First(&row, id).Error; err != nil { | ||
| return err | ||
| } | ||
| row.Attempts++ | ||
| row.LastError = errMsg | ||
| row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts)) | ||
| return tx.Save(&row).Error |
There was a problem hiding this comment.
LastError stores arbitrary backend error text without any length limit, so a malicious or noisy peer can persist unbounded log-sized payloads; truncate or cap errMsg before saving.
Suggested fix
func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error {
const maxErrLen = 4096
if len(errMsg) > maxErrLen {
errMsg = errMsg[:maxErrLen]
}
return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
var row PendingBackendOp
if err := tx.First(&row, id).Error; err != nil {
return err
}
row.Attempts++
row.LastError = errMsg
row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts))
return tx.Save(&row).Error
})
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert go developer with deep knowledge of security, performance, and best practices.
### Context
File: core/services/nodes/registry.go
Lines: 1060-1069
Issue Type: security-medium
Severity: medium
Issue Description:
LastError stores arbitrary backend error text without any length limit, so a malicious or noisy peer can persist unbounded log-sized payloads; truncate or cap errMsg before saving.
Current Code:
func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error {
return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
var row PendingBackendOp
if err := tx.First(&row, id).Error; err != nil {
return err
}
row.Attempts++
row.LastError = errMsg
row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts))
return tx.Save(&row).Error
})
}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
Security Scan Summary
No critical security issues detected Scan completed in 47.3sSecurity scan powered by Codity.ai |
Dependency vulnerability scanning
View vulnerability details (6 items)1. brace-expansion various CVE: GHSA-f886-m6hf-6m8v
2. express-rate-limit various CVE: N/A
3. fast-uri various CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
4. hono various CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
5. ip-address various CVE: GHSA-v2v4-37r5-5v8g
6. postcss various CVE: GHSA-qx2v-qp2m-jg93
Powered by Codity.ai · Docs |
Dependency vulnerability scanning
View vulnerability details (4 items)1. pip 24.0 CVE: GHSA-4xh5-x5gv-qwph
2. pip 24.0 CVE: GHSA-6vgw-5pg2-w6jp
3. pip 24.0 CVE: GHSA-58qw-9mgm-455v
4. pip 24.0 CVE: GHSA-jp4c-xjxw-mgf9
Powered by Codity.ai · Docs |
License Compliance Scan
Weak copyleft licenses found - verify compatibility Some packages have unknown licenses - manual review required Medium Risk Licenses - 6 packages(MPL-2.0 OR Apache-2.0) (1 packages):
MPL-2.0 (5 packages):
Unknown Licenses - 95 packages
...and 75 more Powered by Codity.ai · Docs |
Code Quality Report — test-org-codity/LocalAI · PR #7Scanned: 2026-05-20 17:49 UTC | Score: 14/100 | Provider: github Executive Summary
Top Findings[CQ-LLM-003]
|
| File | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
core/application/distributed.go |
0 | 0 | 2 | 1 | 3 |
core/application/upgrade_checker.go |
0 | 1 | 1 | 2 | 4 |
core/gallery/backends.go |
0 | 0 | 0 | 1 | 1 |
core/gallery/upgrade.go |
0 | 0 | 1 | 1 | 2 |
core/http/react-ui/src/App.css |
0 | 0 | 0 | 14 | 14 |
core/http/react-ui/src/components/FilterBar.jsx |
0 | 0 | 0 | 1 | 1 |
core/http/react-ui/src/components/NodeDistributionChip.jsx |
0 | 0 | 0 | 11 | 11 |
core/http/react-ui/src/components/Popover.jsx |
0 | 0 | 0 | 2 | 2 |
core/http/react-ui/src/pages/Manage.jsx |
0 | 0 | 0 | 122 | 122 |
core/http/react-ui/src/pages/Nodes.jsx |
0 | 0 | 0 | 36 | 36 |
core/services/advisorylock/keys.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/managers_distributed.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/reconciler.go |
0 | 0 | 0 | 2 | 2 |
core/services/nodes/reconciler_test.go |
0 | 0 | 0 | 9 | 9 |
core/services/nodes/registry.go |
0 | 0 | 0 | 7 | 7 |
Recommendations
- Resolve High severity issues, especially error handling gaps and performance bottlenecks.
- Run automated tests after applying fixes to verify no regressions.
|
@codity review |
Policy Check Failed✗ 3/3 policy checks failed: • Need 2 more approval(s) (0/2) — comment LGTM or approve via review To merge this PR:
|
PR SummaryWhat Changed
Key Changes by AreaBackend Lifecycle
Distributed Coordination
Frontend
Files Changed
Review Focus Areas
ArchitectureDesign Decisions
Scalability & Extensibility
Risks
Merge StatusNOT MERGEABLE — PR Score 9/100, below threshold (50)
|
| useEffect(() => { | ||
| if (!open && anchor?.current) { | ||
| // requestAnimationFrame so the close is painted before focus jumps; | ||
| // otherwise screen readers announce the trigger mid-transition. | ||
| const raf = requestAnimationFrame(() => anchor.current?.focus?.()) | ||
| return () => cancelAnimationFrame(raf) | ||
| } | ||
| }, [open, anchor]) |
There was a problem hiding this comment.
The focus-restoration useEffect fires on initial mount, not only when the popover closes. When the component first mounts open is false and anchor.current is already set (React attaches refs in the commit phase, before passive effects run). The condition !open && anchor?.current is therefore true on the very first render, so requestAnimationFrame(() => anchor.current?.focus?.()) is scheduled immediately, stealing keyboard focus from whatever the user was doing. Every NodeDistributionChip rendered in summary mode (4+ nodes) triggers this on page load, causing the last chip's trigger button to unexpectedly receive focus.
Suggested fix
const popoverWasOpenRef = useRef(false)
useEffect(() => {
const wasOpen = popoverWasOpenRef.current
popoverWasOpenRef.current = open
if (!open && wasOpen && anchor?.current) {
const raf = requestAnimationFrame(() => anchor.current?.focus?.())
return () => cancelAnimationFrame(raf)
}
}, [open, anchor])Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/Popover.jsx
Lines: 64-71
Issue Type: functional-high
Severity: high
Issue Description:
The focus-restoration `useEffect` fires on **initial mount**, not only when the popover closes. When the component first mounts `open` is `false` and `anchor.current` is already set (React attaches refs in the commit phase, before passive effects run). The condition `!open && anchor?.current` is therefore `true` on the very first render, so `requestAnimationFrame(() => anchor.current?.focus?.())` is scheduled immediately, stealing keyboard focus from whatever the user was doing. Every `NodeDistributionChip` rendered in summary mode (4+ nodes) triggers this on page load, causing the last chip's trigger button to unexpectedly receive focus.
Current Code:
useEffect(() => {
if (!open && anchor?.current) {
// requestAnimationFrame so the close is painted before focus jumps;
// otherwise screen readers announce the trigger mid-transition.
const raf = requestAnimationFrame(() => anchor.current?.focus?.())
return () => cancelAnimationFrame(raf)
}
}, [open, anchor])
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| role="tab" | ||
| aria-selected={activeFilter === f.key} | ||
| className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`} | ||
| onClick={() => onFilterChange(f.key)} |
There was a problem hiding this comment.
onFilterChange is invoked unconditionally inside the onClick handler whenever hasFilters is true, but the prop has no default value and is not declared as required. If a caller passes a non-empty filters array while omitting onFilterChange (a realistic mistake when wiring up a read-only filter strip), clicking any chip throws TypeError: onFilterChange is not a function. The optional-chaining call operator ?.() is the idiomatic zero-cost guard here.
| onClick={() => onFilterChange(f.key)} | |
| onClick={() => onFilterChange?.(f.key)} |
Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/components/FilterBar.jsx
Lines: 59-59
Issue Type: robustness-medium
Severity: medium
Issue Description:
`onFilterChange` is invoked unconditionally inside the `onClick` handler whenever `hasFilters` is true, but the prop has no default value and is not declared as required. If a caller passes a non-empty `filters` array while omitting `onFilterChange` (a realistic mistake when wiring up a read-only filter strip), clicking any chip throws `TypeError: onFilterChange is not a function`. The optional-chaining call operator `?.()` is the idiomatic zero-cost guard here.
Current Code:
onClick={() => onFilterChange(f.key)}
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
| setUpgradingAll(true) | ||
| try { | ||
| // Serial upgrade — matches the gallery's Upgrade All behavior. | ||
| // Each backend upgrade is itself a cluster-wide fan-out, so parallel | ||
| // calls would multiply load on every worker. | ||
| for (const name of names) { | ||
| try { | ||
| await backendsApi.upgrade(name) | ||
| } catch (err) { | ||
| addToast(`Upgrade failed for ${name}: ${err.message}`, 'error') |
There was a problem hiding this comment.
handleUpgradeAll shows the success info-toast unconditionally after the loop, even when every individual upgrade threw an error. An operator who clicks "Upgrade all" and gets N error toasts followed by "Upgrade started for N backends" receives conflicting feedback and has no reliable way to know whether any upgrade was actually initiated. Additionally err.message is undefined for non-Error rejections (e.g. a plain string throw) so the per-upgrade error toasts can display "Upgrade failed for foo: undefined".
Suggested fix
let succeeded = 0
for (const name of names) {
try {
await backendsApi.upgrade(name)
succeeded++
} catch (err) {
addToast(`Upgrade failed for ${name}: ${err.message ?? err}`, 'error')
}
}
if (succeeded > 0) {
addToast(`Upgrade started for ${succeeded} backend${succeeded === 1 ? '' : 's'}`, 'info')
}Prompt for AI assistance
Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:
You are an expert javascript developer with deep knowledge of security, performance, and best practices.
### Context
File: core/http/react-ui/src/pages/Manage.jsx
Lines: 269-278
Issue Type: functional-medium
Severity: medium
Issue Description:
`handleUpgradeAll` shows the success info-toast unconditionally after the loop, even when every individual upgrade threw an error. An operator who clicks "Upgrade all" and gets N error toasts followed by "Upgrade started for N backends" receives conflicting feedback and has no reliable way to know whether any upgrade was actually initiated. Additionally `err.message` is `undefined` for non-`Error` rejections (e.g. a plain string `throw`) so the per-upgrade error toasts can display "Upgrade failed for foo: undefined".
Current Code:
for (const name of names) {
try {
await backendsApi.upgrade(name)
} catch (err) {
addToast(`Upgrade failed for ${name}: ${err.message}`, 'error')
}
}
addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info')
---
### Instructions
1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow javascript best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed
### Constraints
- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready
---
Security Scan Summary
No critical security issues detected Scan completed in 40.6sSecurity scan powered by Codity.ai |
Dependency vulnerability scanning
View vulnerability details (7 items)1. brace-expansion various CVE: GHSA-f886-m6hf-6m8v
2. express-rate-limit various CVE: N/A
3. fast-uri various CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
4. hono various CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
5. ip-address various CVE: GHSA-v2v4-37r5-5v8g
6. postcss various CVE: GHSA-qx2v-qp2m-jg93
7. qs various CVE: GHSA-q8mj-m7cp-5q26
Powered by Codity.ai · Docs |
Dependency vulnerability scanning
View vulnerability details (4 items)1. pip 24.0 CVE: GHSA-4xh5-x5gv-qwph
2. pip 24.0 CVE: GHSA-6vgw-5pg2-w6jp
3. pip 24.0 CVE: GHSA-58qw-9mgm-455v
4. pip 24.0 CVE: GHSA-jp4c-xjxw-mgf9
Powered by Codity.ai · Docs |
License Compliance Scan
Weak copyleft licenses found - verify compatibility Some packages have unknown licenses - manual review required Medium Risk Licenses - 6 packages(MPL-2.0 OR Apache-2.0) (1 packages):
MPL-2.0 (5 packages):
Unknown Licenses - 126 packages
...and 106 more Powered by Codity.ai · Docs |
Code Quality Report — test-org-codity/LocalAI · PR #7Scanned: 2026-05-24 20:20 UTC | Score: 15/100 | Provider: github Executive Summary
Top Findings[CQ-LLM-003]
|
| File | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
core/application/distributed.go |
0 | 0 | 1 | 1 | 2 |
core/application/upgrade_checker.go |
0 | 0 | 2 | 2 | 4 |
core/gallery/backends.go |
0 | 0 | 0 | 1 | 1 |
core/gallery/upgrade.go |
0 | 1 | 0 | 1 | 2 |
core/http/react-ui/src/App.css |
0 | 0 | 0 | 14 | 14 |
core/http/react-ui/src/components/FilterBar.jsx |
0 | 0 | 0 | 1 | 1 |
core/http/react-ui/src/components/NodeDistributionChip.jsx |
0 | 0 | 0 | 11 | 11 |
core/http/react-ui/src/components/Popover.jsx |
0 | 0 | 0 | 2 | 2 |
core/http/react-ui/src/pages/Manage.jsx |
0 | 0 | 0 | 122 | 122 |
core/http/react-ui/src/pages/Nodes.jsx |
0 | 0 | 0 | 36 | 36 |
core/services/advisorylock/keys.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/managers_distributed.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/reconciler.go |
0 | 0 | 0 | 2 | 2 |
core/services/nodes/reconciler_test.go |
0 | 0 | 0 | 9 | 9 |
core/services/nodes/registry.go |
0 | 0 | 0 | 7 | 7 |
Recommendations
- Resolve High severity issues, especially error handling gaps and performance bottlenecks.
- Run automated tests after applying fixes to verify no regressions.
|
@codity review |
1 similar comment
|
@codity review |
Policy Check Failed✗ 3/3 policy checks failed: • Need 2 more approval(s) (0/2) — comment LGTM or approve via review To merge this PR:
|
PR SummaryWhat Changed
Key Changes by AreaDistributed Backend Management
Backend Operation Queue
State Reconciliation
Cluster-Aware API
React UI
Files Changed
Review Focus Areas
Architecture
|
Security Scan Summary
No critical security issues detected Scan completed in 39.4sSecurity scan powered by Codity.ai |
|
@codity review |
PR SummaryWhat Changed
Key Changes by AreaDistributed Backend Management
Upgrade Detection
Cluster Data Model
UI Components
Management UI
Files Changed
Review Focus Areas
ArchitectureDesign Decisions
Scalability & Extensibility
Risks
Merge StatusNOT MERGEABLE — PR Score 8/100, below threshold (50)
|
Workflow DiagramsAutomatically generated sequence diagrams showing the workflows in this PR 1. ## AnalysisThis PR introduces a sophisticated distribut... Complex complexity • Components: UpgradeChecker, DistributedBackendManager, CheckUpgradesAgainst sequenceDiagram
title Distributed Backend Upgrade Detection with Drift Awareness
participant UC as UpgradeChecker
participant AL as AdvisoryLock
participant DB as PostgreSQL
participant DBM as DistributedBackendManager
participant Worker as WorkerNode
participant Gallery as GalleryService
participant UI as ReactUI
Note over UC: Periodic check triggered every 6 hours
UC->>AL: Acquire lock for upgrade check
alt Lock acquired this is the leader instance
AL->>DB: SELECT pg_try_advisory_lock(key)
DB-->>AL: true
UC->>DBM: CheckUpgrades()
Note over DBM: Aggregates from all workers instead of empty frontend filesystem
DBM->>Worker: ListBackends() via NATS
Worker-->>DBM: NodeBackendRef with version and digest
DBM->>DBM: Aggregate into SystemBackends with per-node metadata
DBM->>Gallery: CheckUpgradesAgainst(galleries, systemState, installedBackends)
Gallery->>Gallery: summarizeNodeDrift(nodes)
Note over Gallery: Detects version or digest divergence across cluster
Gallery->>Gallery: Compare gallery version vs majority cluster version
alt Cluster drift detected
Gallery-->>DBM: UpgradeInfo with NodeDrift populated
else All nodes agree
Gallery-->>DBM: UpgradeInfo with empty NodeDrift
end
DBM-->>UC: map of available upgrades
UC->>UC: Cache results in lastUpgrades
alt Auto-upgrade enabled
UC->>Gallery: UpgradeBackend()
Gallery->>DBM: UpgradeBackend(name)
DBM->>Worker: InstallBackend via NATS
Worker-->>DBM: success
end
else Lock not acquired another instance is leader
AL->>DB: SELECT pg_try_advisory_lock(key)
DB-->>AL: false
Note over UC: Skip periodic check this instance is not leader
end
UC->>UI: GetAvailableUpgrades() API call
UI-->>UC: Cached upgrade results including NodeDrift
Note over UI: Displays upgrade banner with cluster drift indicators
Note: Diagrams show detected patterns only. Complex workflows may require manual review. |
Security Scan Summary
No critical security issues detected Scan completed in 58.0sSecurity scan powered by Codity.ai |
Dependency vulnerability scanning
View vulnerability details (7 items)1. brace-expansion various CVE: GHSA-f886-m6hf-6m8v
2. express-rate-limit various CVE: N/A
3. fast-uri various CVE: GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc
4. hono various CVE: GHSA-qp7p-654g-cw7p, GHSA-hm8q-7f3q-5f36, GHSA-p77w-8qqv-26rm, GHSA-9vqf-7f2p-gf9v, GHSA-69xw-7hcm-h432
5. ip-address various CVE: GHSA-v2v4-37r5-5v8g
6. postcss various CVE: GHSA-qx2v-qp2m-jg93
7. qs various CVE: GHSA-q8mj-m7cp-5q26
Powered by Codity.ai · Docs |
Dependency vulnerability scanning
View vulnerability details (4 items)1. pip 24.0 CVE: GHSA-4xh5-x5gv-qwph
2. pip 24.0 CVE: GHSA-6vgw-5pg2-w6jp
3. pip 24.0 CVE: GHSA-58qw-9mgm-455v
4. pip 24.0 CVE: GHSA-jp4c-xjxw-mgf9
Powered by Codity.ai · Docs |
License Compliance Scan
Weak copyleft licenses found - verify compatibility Some packages have unknown licenses - manual review required Medium Risk Licenses - 6 packages(MPL-2.0 OR Apache-2.0) (1 packages):
MPL-2.0 (5 packages):
Unknown Licenses - 154 packages
...and 134 more Powered by Codity.ai · Docs |
Code Quality Report — test-org-codity/LocalAI · PR #7Scanned: 2026-05-25 11:19 UTC | Score: 14/100 | Provider: github Executive Summary
Top Findings[CQ-LLM-003]
|
| File | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
core/application/distributed.go |
0 | 0 | 2 | 1 | 3 |
core/application/startup.go |
0 | 0 | 1 | 0 | 1 |
core/application/upgrade_checker.go |
0 | 1 | 1 | 2 | 4 |
core/gallery/upgrade.go |
0 | 0 | 0 | 2 | 2 |
core/http/react-ui/src/App.css |
0 | 0 | 0 | 14 | 14 |
core/http/react-ui/src/components/FilterBar.jsx |
0 | 0 | 0 | 1 | 1 |
core/http/react-ui/src/components/NodeDistributionChip.jsx |
0 | 0 | 0 | 11 | 11 |
core/http/react-ui/src/components/Popover.jsx |
0 | 0 | 0 | 2 | 2 |
core/http/react-ui/src/pages/Manage.jsx |
0 | 0 | 0 | 122 | 122 |
core/http/react-ui/src/pages/Nodes.jsx |
0 | 0 | 0 | 36 | 36 |
core/services/advisorylock/keys.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/managers_distributed.go |
0 | 0 | 0 | 1 | 1 |
core/services/nodes/reconciler.go |
0 | 0 | 0 | 2 | 2 |
core/services/nodes/reconciler_test.go |
0 | 0 | 0 | 9 | 9 |
core/services/nodes/registry.go |
0 | 0 | 0 | 7 | 7 |
Recommendations
- Resolve High severity issues, especially error handling gaps and performance bottlenecks.
- Run automated tests after applying fixes to verify no regressions.
Description
This PR fixes #
Notes for Reviewers
Signed commits