Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions core/application/application.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,9 @@ type Application struct {

// Distributed mode services (nil when not in distributed mode)
distributed *DistributedServices

// Upgrade checker (background service for detecting backend upgrades)
upgradeChecker *UpgradeChecker
}

func newApplication(appConfig *config.ApplicationConfig) *Application {
Expand Down Expand Up @@ -79,6 +82,19 @@ func (a *Application) AgentJobService() *agentpool.AgentJobService {
return a.agentJobService
}

func (a *Application) UpgradeChecker() *UpgradeChecker {
return a.upgradeChecker
}

// distributedDB returns the PostgreSQL database for distributed coordination,
// or nil in standalone mode.
func (a *Application) distributedDB() *gorm.DB {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional High

In distributed mode the new upgrade checker is passed authDB instead of the distributed coordination DB, so starting two frontend replicas against separate auth databases will both acquire their own advisory locks and auto-upgrade the same backend concurrently.

Return the actual distributed coordination/postgres DB from a.distributed here (or rename/remove this helper if advisory locks are not supported in the current mode) so all replicas contend on the same database lock.

Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/application.go
Lines: 91-91
Issue Type: functional-high
Severity: high

Issue Description:
In distributed mode the new upgrade checker is passed `authDB` instead of the distributed coordination DB, so starting two frontend replicas against separate auth databases will both acquire their own advisory locks and auto-upgrade the same backend concurrently.

Current Code:
func (a *Application) distributedDB() *gorm.DB {
	if a.distributed != nil {
		return a.authDB
	}
	return nil
}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue Jira

if a.distributed != nil {
return a.authDB
}
return nil
}
Comment on lines +88 to +96

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 distributedDB may return nil in distributed mode when auth is not configured

distributedDB returns a.authDB when a.distributed != nil. But authDB is only populated when auth is separately enabled (auth.InitDB); a deployment running distributed mode without auth will have distributed != nil but authDB == nil. In that case NewUpgradeChecker receives a nil DB and silently falls back to standalone mode, meaning every frontend replica runs its own independent periodic-check loop without the advisory-lock coordination that was intended for distributed deployments.


func (a *Application) AgentPoolService() *agentpool.AgentPoolService {
return a.agentPoolService.Load()
}
Expand Down
3 changes: 3 additions & 0 deletions core/application/config_file_watcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -335,6 +335,9 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
if settings.AutoloadBackendGalleries != nil && !envAutoloadBackendGalleries {
appConfig.AutoloadBackendGalleries = *settings.AutoloadBackendGalleries
}
if settings.AutoUpgradeBackends != nil {
appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
Comment on lines +338 to +339

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Medium

This setting bypasses the existing env var precedence checks, so a writable runtime_settings.json can override startup policy; gate it behind the corresponding env flag like the surrounding settings.

Suggested fix
			if settings.AutoUpgradeBackends != nil && !envAutoUpgradeBackends {
				appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
			}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/config_file_watcher.go
Lines: 338-339
Issue Type: security-medium
Severity: medium

Issue Description:
This setting bypasses the existing env var precedence checks, so a writable runtime_settings.json can override startup policy; gate it behind the corresponding env flag like the surrounding settings.

Current Code:
			if settings.AutoUpgradeBackends != nil {
				appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
			}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue Jira

}
Comment on lines +338 to +340

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness High

The AutoUpgradeBackends setting from runtime_settings.json is applied without checking the envAutoUpgradeBackends guard that all other similarly-gated settings use in the same function. This causes environment variable precedence to be violated: if LOCALAI_AUTO_UPGRADE_BACKENDS=false is explicitly set as an environment variable, a runtime_settings.json file containing {"auto_upgrade_backends": true} will silently override this policy.

Suggested fix
			if settings.AutoUpgradeBackends != nil && !envAutoUpgradeBackends {
							appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
						}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert bash developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/config_file_watcher.go
Lines: 338-340
Issue Type: robustness-high
Severity: high

Issue Description:
The `AutoUpgradeBackends` setting from `runtime_settings.json` is applied without checking the `envAutoUpgradeBackends` guard that all other similarly-gated settings use in the same function. This causes environment variable precedence to be violated: if `LOCALAI_AUTO_UPGRADE_BACKENDS=false` is explicitly set as an environment variable, a `runtime_settings.json` file containing `{"auto_upgrade_backends": true}` will silently override this policy.

Current Code:
			if settings.AutoUpgradeBackends != nil {
				appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
			}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow bash best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue

if settings.ApiKeys != nil {
// API keys from env vars (startup) should be kept, runtime settings keys replace all runtime keys
// If runtime_settings.json specifies ApiKeys (even if empty), it replaces all runtime keys
Expand Down
9 changes: 9 additions & 0 deletions core/application/startup.go
Original file line number Diff line number Diff line change
Expand Up @@ -231,6 +231,15 @@ func New(opts ...config.AppOption) (*Application, error) {
xlog.Error("error registering external backends", "error", err)
}

// Start background upgrade checker for backends.
// In distributed mode, uses PostgreSQL advisory lock so only one frontend
// instance runs periodic checks (avoids duplicate upgrades across replicas).
if len(options.BackendGalleries) > 0 {
uc := NewUpgradeChecker(options, application.ModelLoader(), application.distributedDB())
application.upgradeChecker = uc
go uc.Run(options.Context)
}

if options.ConfigFile != "" {
if err := application.ModelConfigLoader().LoadMultipleModelConfigsSingleFile(options.ConfigFile, configLoaderOpts...); err != nil {
xlog.Error("error loading config file", "error", err)
Expand Down
198 changes: 198 additions & 0 deletions core/application/upgrade_checker.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
package application

import (
"context"
"sync"
"time"

"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/services/advisorylock"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/system"
"github.com/mudler/xlog"
"gorm.io/gorm"
)

// UpgradeChecker periodically checks for backend upgrades and optionally
// auto-upgrades them. It caches the last check results for API queries.
//
// In standalone mode it runs a simple ticker loop.
// In distributed mode it uses a PostgreSQL advisory lock so that only one
// frontend instance performs periodic checks and auto-upgrades at a time.
type UpgradeChecker struct {
appConfig *config.ApplicationConfig
modelLoader *model.ModelLoader
galleries []config.Gallery
systemState *system.SystemState
db *gorm.DB // non-nil in distributed mode

checkInterval time.Duration
stop chan struct{}
done chan struct{}
triggerCh chan struct{}

mu sync.RWMutex
lastUpgrades map[string]gallery.UpgradeInfo
lastCheckTime time.Time
}

// NewUpgradeChecker creates a new UpgradeChecker service.
// Pass db=nil for standalone mode, or a *gorm.DB for distributed mode
// (uses advisory locks so only one instance runs periodic checks).
func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB) *UpgradeChecker {
return &UpgradeChecker{
appConfig: appConfig,
modelLoader: ml,
galleries: appConfig.BackendGalleries,
systemState: appConfig.SystemState,
db: db,
checkInterval: 6 * time.Hour,
stop: make(chan struct{}),
done: make(chan struct{}),
triggerCh: make(chan struct{}, 1),
lastUpgrades: make(map[string]gallery.UpgradeInfo),
}
}

// Run starts the upgrade checker loop. It waits 30 seconds after startup,
// performs an initial check, then re-checks every 6 hours.
//
// In distributed mode, periodic checks are guarded by a PostgreSQL advisory
// lock so only one frontend instance runs them. On-demand triggers (TriggerCheck)
// and the initial check always run locally for fast API response cache warming.
func (uc *UpgradeChecker) Run(ctx context.Context) {
defer close(uc.done)

// Initial delay: don't slow down startup
select {
case <-ctx.Done():
return
case <-uc.stop:
return
case <-time.After(30 * time.Second):
}

// First check always runs locally (to warm the cache on this instance)
uc.runCheck(ctx)
Comment on lines +76 to +77

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness Critical

The initial uc.runCheck(ctx) call at line 77 executes unconditionally on every frontend instance before the distributed advisory-lock guard is installed. When AutoUpgradeBackends = true and multiple frontend replicas start simultaneously, each replica calls gallery.UpgradeBackend concurrently for the same backends, racing on filesystem paths like .upgrade-tmp and .backup directories, causing corrupted binaries and undefined ordering of atomic rename operations.

Suggested fix
	// First check always runs locally (to warm the cache on this instance)
		if uc.db == nil {
			// Only run initial check in standalone mode
			// In distributed mode, wait for advisory lock to avoid concurrent upgrade races
			uc.runCheck(ctx)
		}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert bash developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/upgrade_checker.go
Lines: 76-77
Issue Type: robustness-critical
Severity: critical

Issue Description:
The initial `uc.runCheck(ctx)` call at line 77 executes unconditionally on every frontend instance before the distributed advisory-lock guard is installed. When `AutoUpgradeBackends = true` and multiple frontend replicas start simultaneously, each replica calls `gallery.UpgradeBackend` concurrently for the same backends, racing on filesystem paths like `.upgrade-tmp` and `.backup` directories, causing corrupted binaries and undefined ordering of atomic rename operations.

Current Code:
	// First check always runs locally (to warm the cache on this instance)
	uc.runCheck(ctx)

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow bash best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue


if uc.db != nil {
// Distributed mode: use advisory lock for periodic checks.
// RunLeaderLoop ticks every checkInterval; only the lock holder executes.
go advisorylock.RunLeaderLoop(ctx, uc.db, advisorylock.KeyBackendUpgradeCheck, uc.checkInterval, func() {
uc.runCheck(ctx)
})
Comment on lines +82 to +84

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness Medium

In distributed mode, Run() spawns a RunLeaderLoop goroutine whose only cancellation signal is ctx. When Shutdown() is called (closes uc.stop), the outer for loop returns and close(uc.done) is executed: but the spawned goroutine keeps running until the parent context is cancelled. Any caller (including tests) that invokes Shutdown() before cancelling ctx incorrectly believes all activity has stopped, while the goroutine continues to hold the advisory lock and run runCheck. Wrapping ctx in a derived cancellable context and cancelling it via defer in Run() would propagate the stop signal to the goroutine.

Suggested fix
		runCtx, cancelRun := context.WithCancel(ctx)
		defer cancelRun()
		go advisorylock.RunLeaderLoop(runCtx, uc.db, advisorylock.KeyBackendUpgradeCheck, uc.checkInterval, func() {
			uc.runCheck(runCtx)
		})
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert bash developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/upgrade_checker.go
Lines: 82-84
Issue Type: robustness-medium
Severity: medium

Issue Description:
In distributed mode, `Run()` spawns a `RunLeaderLoop` goroutine whose only cancellation signal is `ctx`. When `Shutdown()` is called (closes `uc.stop`), the outer `for` loop returns and `close(uc.done)` is executed: but the spawned goroutine keeps running until the parent context is cancelled. Any caller (including tests) that invokes `Shutdown()` before cancelling `ctx` incorrectly believes all activity has stopped, while the goroutine continues to hold the advisory lock and run `runCheck`. Wrapping `ctx` in a derived cancellable context and cancelling it via `defer` in `Run()` would propagate the stop signal to the goroutine.

Current Code:
		go advisorylock.RunLeaderLoop(ctx, uc.db, advisorylock.KeyBackendUpgradeCheck, uc.checkInterval, func() {
			uc.runCheck(ctx)
		})

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow bash best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue


// Still listen for on-demand triggers (from API / settings change)
// and stop signal — these run on every instance.
for {
select {
case <-ctx.Done():
return
case <-uc.stop:
return
case <-uc.triggerCh:
uc.runCheck(ctx)
Comment on lines +82 to +95

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional Critical

runCheck can execute concurrently from the local trigger path and the leader loop, causing overlapping upgrades of the same backend; serialize the whole check-and-upgrade routine with a dedicated mutex.

Suggested fix
type UpgradeChecker struct {
	appConfig   *config.ApplicationConfig
	modelLoader *model.ModelLoader
	galleries   []config.Gallery
	systemState *system.SystemState
	db          *gorm.DB

	checkInterval time.Duration
	stop          chan struct{}
	done          chan struct{}
	triggerCh     chan struct{}
	runMu         sync.Mutex

	mu            sync.RWMutex
	lastUpgrades  map[string]gallery.UpgradeInfo
	lastCheckTime time.Time
}

func (uc *UpgradeChecker) runCheck(ctx context.Context) {
	uc.runMu.Lock()
	defer uc.runMu.Unlock()

	upgrades, err := gallery.CheckBackendUpgrades(ctx, uc.galleries, uc.systemState)
	// existing logic...
}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/upgrade_checker.go
Lines: 82-95
Issue Type: functional-critical
Severity: critical

Issue Description:
runCheck can execute concurrently from the local trigger path and the leader loop, causing overlapping upgrades of the same backend; serialize the whole check-and-upgrade routine with a dedicated mutex.

Current Code:
go advisorylock.RunLeaderLoop(ctx, uc.db, advisorylock.KeyBackendUpgradeCheck, uc.checkInterval, func() {
	uc.runCheck(ctx)
})

...
case <-uc.triggerCh:
	uc.runCheck(ctx)

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue Jira

}
}
} else {
// Standalone mode: simple ticker loop
ticker := time.NewTicker(uc.checkInterval)
defer ticker.Stop()

for {
select {
case <-ctx.Done():
return
case <-uc.stop:
return
case <-ticker.C:
uc.runCheck(ctx)
case <-uc.triggerCh:
uc.runCheck(ctx)
}
}
}
}

// Shutdown stops the upgrade checker loop.
func (uc *UpgradeChecker) Shutdown() {
close(uc.stop)
<-uc.done
Comment on lines +119 to +121

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional High

Shutdown closes uc.stop unconditionally, so a second call will panic; guard it with sync.Once or a non-blocking close path.

Suggested fix
func (uc *UpgradeChecker) Shutdown() {
	uc.shutdownOnce.Do(func() {
		close(uc.stop)
	})
	<-uc.done
}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/upgrade_checker.go
Lines: 119-121
Issue Type: functional-high
Severity: high

Issue Description:
Shutdown closes uc.stop unconditionally, so a second call will panic; guard it with sync.Once or a non-blocking close path.

Current Code:
func (uc *UpgradeChecker) Shutdown() {
	close(uc.stop)
	<-uc.done
}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue

Comment on lines +119 to +121

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional Critical

Shutdown closes uc.stop unconditionally, so a second call panics; guard the close with sync.Once or a non-blocking state check.

Suggested fix
type UpgradeChecker struct {
	appConfig   *config.ApplicationConfig
	modelLoader *model.ModelLoader
	galleries   []config.Gallery
	systemState *system.SystemState
	db          *gorm.DB

	checkInterval time.Duration
	stop          chan struct{}
	done          chan struct{}
	triggerCh     chan struct{}
	shutdownOnce  sync.Once

	mu            sync.RWMutex
	lastUpgrades  map[string]gallery.UpgradeInfo
	lastCheckTime time.Time
}

func (uc *UpgradeChecker) Shutdown() {
	uc.shutdownOnce.Do(func() {
		close(uc.stop)
	})
	<-uc.done
}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/upgrade_checker.go
Lines: 119-121
Issue Type: functional-critical
Severity: critical

Issue Description:
Shutdown closes uc.stop unconditionally, so a second call panics; guard the close with sync.Once or a non-blocking state check.

Current Code:
func (uc *UpgradeChecker) Shutdown() {
	close(uc.stop)
	<-uc.done
}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue Jira

}
Comment on lines +119 to +122

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness Medium

Shutdown() calls close(uc.stop) without any guard against a second call. Closing an already-closed channel panics in Go. If Shutdown() is ever called twice: e.g., via a deferred call and an explicit call in an error path, or in tests: the process crashes. sync.Once is the idiomatic fix; requires adding a stopOnce sync.Once field to UpgradeChecker.

Suggested fix
// Shutdown stops the upgrade checker loop. Safe to call more than once.
func (uc *UpgradeChecker) Shutdown() {
	uc.stopOnce.Do(func() { close(uc.stop) })
	<-uc.done
}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert bash developer with deep knowledge of security, performance, and best practices.

### Context

File: core/application/upgrade_checker.go
Lines: 119-122
Issue Type: robustness-medium
Severity: medium

Issue Description:
`Shutdown()` calls `close(uc.stop)` without any guard against a second call. Closing an already-closed channel panics in Go. If `Shutdown()` is ever called twice: e.g., via a deferred call and an explicit call in an error path, or in tests: the process crashes. `sync.Once` is the idiomatic fix; requires adding a `stopOnce sync.Once` field to `UpgradeChecker`.

Current Code:
// Shutdown stops the upgrade checker loop.
func (uc *UpgradeChecker) Shutdown() {
	close(uc.stop)
	<-uc.done
}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow bash best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue


// TriggerCheck forces an immediate upgrade check on this instance.
func (uc *UpgradeChecker) TriggerCheck() {
select {
case uc.triggerCh <- struct{}{}:
default:
// Already triggered, skip
}
}

// GetAvailableUpgrades returns the cached upgrade check results.
func (uc *UpgradeChecker) GetAvailableUpgrades() map[string]gallery.UpgradeInfo {
uc.mu.RLock()
defer uc.mu.RUnlock()

// Return a copy to avoid races
result := make(map[string]gallery.UpgradeInfo, len(uc.lastUpgrades))
for k, v := range uc.lastUpgrades {
result[k] = v
}
return result
}

func (uc *UpgradeChecker) runCheck(ctx context.Context) {
upgrades, err := gallery.CheckBackendUpgrades(ctx, uc.galleries, uc.systemState)

uc.mu.Lock()
uc.lastCheckTime = time.Now()
if err != nil {
xlog.Debug("Backend upgrade check failed", "error", err)
uc.mu.Unlock()
return
}
uc.lastUpgrades = upgrades
uc.mu.Unlock()

if len(upgrades) == 0 {
xlog.Debug("All backends up to date")
return
}

// Log available upgrades
for name, info := range upgrades {
if info.AvailableVersion != "" {
xlog.Info("Backend upgrade available",
"backend", name,
"installed", info.InstalledVersion,
Comment on lines +150 to +169

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Auto-upgrade bypasses advisory lock on initial check in distributed mode

runCheck is called unconditionally before the advisory-lock loop starts (line 151). Because runCheck also executes UpgradeBackend for each detected upgrade when AutoUpgradeBackends=true, every frontend replica will simultaneously attempt to upgrade the same backends on startup — the advisory lock that is meant to prevent exactly this race does not protect the initial check. Concurrent UpgradeBackend calls on the same backend can corrupt the directory (two instances racing through the backendPath → backupPath → tmpPath → backendPath rename sequence on the same paths).

The same issue applies to TriggerCheck-initiated calls (line 168): on-demand checks run on every instance and also execute auto-upgrades without holding the advisory lock.

"available", info.AvailableVersion)
} else {
xlog.Info("Backend upgrade available (new build)",
"backend", name)
}
}

// Auto-upgrade if enabled
if uc.appConfig.AutoUpgradeBackends {
for name, info := range upgrades {
xlog.Info("Auto-upgrading backend", "backend", name,
"from", info.InstalledVersion, "to", info.AvailableVersion)
if err := gallery.UpgradeBackend(ctx, uc.systemState, uc.modelLoader,
uc.galleries, name, nil); err != nil {
xlog.Error("Failed to auto-upgrade backend",
"backend", name, "error", err)
} else {
xlog.Info("Backend upgraded successfully", "backend", name,
"version", info.AvailableVersion)
}
}
// Re-check to update cache after upgrades
if freshUpgrades, err := gallery.CheckBackendUpgrades(ctx, uc.galleries, uc.systemState); err == nil {
uc.mu.Lock()
uc.lastUpgrades = freshUpgrades
uc.mu.Unlock()
}
Comment on lines +193 to +196

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Shutdown panics on double-close

Shutdown calls close(uc.stop) with no guard. Calling it a second time, or calling it after the context is already cancelled (which causes Run to return and close uc.done), will panic with "close of closed channel". A sync.Once around the close would make this safe.

}
}
100 changes: 98 additions & 2 deletions core/cli/backends.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,17 @@ type BackendsUninstall struct {
BackendsCMDFlags `embed:""`
}

type BackendsUpgrade struct {
BackendArgs []string `arg:"" optional:"" name:"backends" help:"Backend names to upgrade (empty = upgrade all)"`

BackendsCMDFlags `embed:""`
}

type BackendsCMD struct {
List BackendsList `cmd:"" help:"List the backends available in your galleries" default:"withargs"`
Install BackendsInstall `cmd:"" help:"Install a backend from the gallery"`
Uninstall BackendsUninstall `cmd:"" help:"Uninstall a backend"`
Upgrade BackendsUpgrade `cmd:"" help:"Upgrade backends to latest versions"`
}

func (bl *BackendsList) Run(ctx *cliContext.Context) error {
Expand All @@ -64,11 +71,27 @@ func (bl *BackendsList) Run(ctx *cliContext.Context) error {
if err != nil {
return err
}

// Check for upgrades
upgrades, _ := gallery.CheckBackendUpgrades(context.Background(), galleries, systemState)
Comment on lines +75 to +76

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness High

The error from CheckBackendUpgrades is discarded here, so return or surface it instead of silently hiding upgrade-check failures.

Suggested fix
	// Check for upgrades
	upgrades, err := gallery.CheckBackendUpgrades(context.Background(), galleries, systemState)
	if err != nil {
		return err
	}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/cli/backends.go
Lines: 75-76
Issue Type: robustness-high
Severity: high

Issue Description:
The error from CheckBackendUpgrades is discarded here, so return or surface it instead of silently hiding upgrade-check failures.

Current Code:
	// Check for upgrades
	upgrades, _ := gallery.CheckBackendUpgrades(context.Background(), galleries, systemState)

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue

Comment on lines +75 to +76

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness High

The error from CheckBackendUpgrades is discarded, so listing can silently hide upgrade-check failures; return or surface the error instead of ignoring it.

Suggested fix
	// Check for upgrades
	upgrades, err := gallery.CheckBackendUpgrades(context.Background(), galleries, systemState)
	if err != nil {
		return fmt.Errorf("failed to check backend upgrades: %w", err)
	}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/cli/backends.go
Lines: 75-76
Issue Type: robustness-high
Severity: high

Issue Description:
The error from CheckBackendUpgrades is discarded, so listing can silently hide upgrade-check failures; return or surface the error instead of ignoring it.

Current Code:
	// Check for upgrades
	upgrades, _ := gallery.CheckBackendUpgrades(context.Background(), galleries, systemState)

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue Jira


for _, backend := range backends {
versionStr := ""
if backend.Version != "" {
versionStr = " v" + backend.Version
}
if backend.Installed {
fmt.Printf(" * %s@%s (installed)\n", backend.Gallery.Name, backend.Name)
if info, ok := upgrades[backend.Name]; ok {
upgradeStr := info.AvailableVersion
if upgradeStr == "" {
upgradeStr = "new build"
}
fmt.Printf(" * %s@%s%s (installed, upgrade available: %s)\n", backend.Gallery.Name, backend.Name, versionStr, upgradeStr)
} else {
fmt.Printf(" * %s@%s%s (installed)\n", backend.Gallery.Name, backend.Name, versionStr)
}
} else {
fmt.Printf(" - %s@%s\n", backend.Gallery.Name, backend.Name)
fmt.Printf(" - %s@%s%s\n", backend.Gallery.Name, backend.Name, versionStr)
}
}
return nil
Expand Down Expand Up @@ -111,6 +134,79 @@ func (bi *BackendsInstall) Run(ctx *cliContext.Context) error {
return nil
}

func (bu *BackendsUpgrade) Run(ctx *cliContext.Context) error {
var galleries []config.Gallery
if err := json.Unmarshal([]byte(bu.BackendGalleries), &galleries); err != nil {
xlog.Error("unable to load galleries", "error", err)
}
Comment on lines +139 to +141

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional High

JSON unmarshal error is logged and silently swallowed in BackendsUpgrade.Run. Execution continues with an empty galleries slice, causing CheckBackendUpgrades to return zero results. The command then prints "All backends are up to date." even when the gallery configuration is malformed: the user gets a misleading success message instead of an actionable error. The identical pattern in the pre-existing BackendsList.Run and BackendsInstall.Run should also be fixed, but this is a new, user-facing subcommand that makes the silent failure especially harmful.

Suggested fix
	if err := json.Unmarshal([]byte(bu.BackendGalleries), &galleries); err != nil {
		return fmt.Errorf("unable to load galleries: %w", err)
	}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert bash developer with deep knowledge of security, performance, and best practices.

### Context

File: core/cli/backends.go
Lines: 139-141
Issue Type: functional-high
Severity: high

Issue Description:
JSON unmarshal error is logged and silently swallowed in `BackendsUpgrade.Run`. Execution continues with an empty `galleries` slice, causing `CheckBackendUpgrades` to return zero results. The command then prints "All backends are up to date." even when the gallery configuration is malformed: the user gets a misleading success message instead of an actionable error. The identical pattern in the pre-existing `BackendsList.Run` and `BackendsInstall.Run` should also be fixed, but this is a new, user-facing subcommand that makes the silent failure especially harmful.

Current Code:
	if err := json.Unmarshal([]byte(bu.BackendGalleries), &galleries); err != nil {
		xlog.Error("unable to load galleries", "error", err)
	}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow bash best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue


systemState, err := system.GetSystemState(
system.WithBackendSystemPath(bu.BackendsSystemPath),
system.WithBackendPath(bu.BackendsPath),
)
if err != nil {
return err
}

upgrades, err := gallery.CheckBackendUpgrades(context.Background(), galleries, systemState)
if err != nil {
return fmt.Errorf("failed to check for upgrades: %w", err)
}

if len(upgrades) == 0 {
fmt.Println("All backends are up to date.")
return nil
}

// Filter to specified backends if args given
toUpgrade := upgrades
if len(bu.BackendArgs) > 0 {
toUpgrade = make(map[string]gallery.UpgradeInfo)
for _, name := range bu.BackendArgs {
if info, ok := upgrades[name]; ok {
toUpgrade[name] = info
} else {
fmt.Printf("Backend %s: no upgrade available\n", name)
}
}
}

if len(toUpgrade) == 0 {
fmt.Println("No upgrades to apply.")
return nil
}

modelLoader := model.NewModelLoader(systemState)
for name, info := range toUpgrade {
versionStr := ""
if info.AvailableVersion != "" {
versionStr = " to v" + info.AvailableVersion
}
fmt.Printf("Upgrading %s%s...\n", name, versionStr)

progressBar := progressbar.NewOptions(
1000,
progressbar.OptionSetDescription(fmt.Sprintf("downloading %s", name)),
progressbar.OptionShowBytes(false),
progressbar.OptionClearOnFinish(),
)
progressCallback := func(fileName string, current string, total string, percentage float64) {
v := int(percentage * 10)
if err := progressBar.Set(v); err != nil {
xlog.Error("error updating progress bar", "error", err)
}
}

if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback); err != nil {
fmt.Printf("Failed to upgrade %s: %v\n", name, err)
} else {
fmt.Printf("Backend %s upgraded successfully\n", name)
}
}
Comment on lines +180 to +205

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional Medium

The upgrade command returns success even when one or more backend upgrades fail, so accumulate failures and return a non-nil error at the end.

Suggested fix
	var upgradeErrs []error
	for name, info := range toUpgrade {
		versionStr := ""
		if info.AvailableVersion != "" {
			versionStr = " to v" + info.AvailableVersion
		}
		fmt.Printf("Upgrading %s%s...\n", name, versionStr)

		progressBar := progressbar.NewOptions(
			1000,
			progressbar.OptionSetDescription(fmt.Sprintf("downloading %s", name)),
			progressbar.OptionShowBytes(false),
			progressbar.OptionClearOnFinish(),
		)
		progressCallback := func(fileName string, current string, total string, percentage float64) {
			v := int(percentage * 10)
			if err := progressBar.Set(v); err != nil {
				xlog.Error("error updating progress bar", "error", err)
			}
		}

		if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback); err != nil {
			fmt.Printf("Failed to upgrade %s: %v\n", name, err)
			upgradeErrs = append(upgradeErrs, fmt.Errorf("%s: %w", name, err))
		} else {
			fmt.Printf("Backend %s upgraded successfully\n", name)
		}
	}
	if len(upgradeErrs) > 0 {
		return errors.Join(upgradeErrs...)
	}
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert go developer with deep knowledge of security, performance, and best practices.

### Context

File: core/cli/backends.go
Lines: 180-205
Issue Type: functional-medium
Severity: medium

Issue Description:
The upgrade command returns success even when one or more backend upgrades fail, so accumulate failures and return a non-nil error at the end.

Current Code:
	for name, info := range toUpgrade {
		versionStr := ""
		if info.AvailableVersion != "" {
			versionStr = " to v" + info.AvailableVersion
		}
		fmt.Printf("Upgrading %s%s...\n", name, versionStr)

		progressBar := progressbar.NewOptions(
			1000,
			progressbar.OptionSetDescription(fmt.Sprintf("downloading %s", name)),
			progressbar.OptionShowBytes(false),
			progressbar.OptionClearOnFinish(),
		)
		progressCallback := func(fileName string, current string, total string, percentage float64) {
			v := int(percentage * 10)
			if err := progressBar.Set(v); err != nil {
				xlog.Error("error updating progress bar", "error", err)
			}
		}

		if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback); err != nil {
			fmt.Printf("Failed to upgrade %s: %v\n", name, err)
		} else {
			fmt.Printf("Backend %s upgraded successfully\n", name)
		}
	}

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow go best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue


return nil
}

func (bu *BackendsUninstall) Run(ctx *cliContext.Context) error {
for _, backendName := range bu.BackendArgs {
xlog.Info("uninstalling backend", "backend", backendName)
Expand Down
Loading