From c94d4cc024327d80e20d3a0a821b8e384db4e328 Mon Sep 17 00:00:00 2001
From: Vasyl Vdovychenko
Date: Thu, 7 May 2026 20:26:11 -0400
Subject: [PATCH] =?UTF-8?q?fix(ollama):=20keep=5Falive=3D-1=20+=20bump=20a?=
 =?UTF-8?q?pi=20timeout=2010s=E2=86=9230s?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two reliability fixes for the gemma4:e4b setup on prod.

1. OLLAMA_KEEP_ALIVE=-1 in compose env
   The default keep_alive is 5 minutes, so the model unloads after the
   container sits idle. Two things then go wrong on the next request:
   - a cold reload takes ~50-60s on CPU;
   - Ollama's pre-load memory check is conservative (it reports 9.8 GiB
     needed but only 8.2 GiB available, even though the host has 27 GiB
     free).
   Both make the API's fire-and-forget distractor call fail
   intermittently. -1 keeps the model resident for as long as the
   container is up, so a cold load only happens on container restart
   (deploy).

2. Api/appsettings.json Ollama:TimeoutSeconds 10s → 30s
   Worker is already at 30s. API was at 10s, which works for a warm
   gemma4 (measured 2.8s for distractor inference) but leaves little
   margin and breaks on cold load. Aligning to 30s removes the
   asymmetry. 30s is still shorter than the ~50-60s cold load, so this
   relies on fix 1 and the post-deploy step below to keep the model warm.

Verified on prod (after a manual `ollama run --keepalive=24h`):
- ollama ps: gemma4:e4b loaded, UNTIL=24h
- distractor inference: 2.8s for 5 single-word answers
- free -h: 13 GiB used (was 3 GiB), so the model is in RAM

Post-deploy step: run `docker compose exec ollama ollama run gemma4:e4b ""`
once to trigger the first load. After that the model stays resident via
keep_alive=-1.
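
The post-deploy warm-up could also be scripted against Ollama's HTTP API instead of `ollama run`. The following is a minimal sketch, not part of this patch. It assumes port 11434 is reachable from wherever the script runs (the compose hunk does not show a port mapping), the function names are hypothetical, and the 120s load timeout is an arbitrary allowance for the ~50-60s cold load.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # Ollama:BaseUrl from appsettings.json
MODEL = "gemma4:e4b"                 # Ollama:Model from appsettings.json


def preload_payload(model: str, keep_alive: int = -1) -> bytes:
    """Body for POST /api/generate; with no prompt, Ollama only loads the model."""
    return json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")


def is_resident(ps_response: dict, model: str) -> bool:
    """True if a GET /api/ps response lists the model as loaded."""
    return any(m.get("name") == model for m in ps_response.get("models", []))


def warm_up(base_url: str = BASE_URL, model: str = MODEL,
            load_timeout: float = 120.0) -> bool:
    """Load the model once after deploy, then confirm it is resident."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=preload_payload(model),
        headers={"Content-Type": "application/json"},
    )
    # Assumed timeout: generous enough for a cold load on CPU.
    with urllib.request.urlopen(req, timeout=load_timeout) as resp:
        resp.read()
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return is_resident(json.load(resp), model)
```

Calling `warm_up()` once after `docker compose up -d` would play the same role as the manual `ollama run` step. A `False` return means the model is not listed as loaded and the first distractor call may hit a cold load.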

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 backend/src/Api/appsettings.json | 2 +-
 docker-compose.yml               | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/backend/src/Api/appsettings.json b/backend/src/Api/appsettings.json
index aec4ad10..289ae7f6 100644
--- a/backend/src/Api/appsettings.json
+++ b/backend/src/Api/appsettings.json
@@ -45,7 +45,7 @@
   "Ollama": {
     "BaseUrl": "http://localhost:11434",
     "Model": "gemma4:e4b",
-    "TimeoutSeconds": 10
+    "TimeoutSeconds": 30
   },
   "LLM": {
     "DefaultProvider": "openai",
diff --git a/docker-compose.yml b/docker-compose.yml
index 51d54bb1..b6482032 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -232,6 +232,8 @@ services:
     container_name: textstack_ollama
     volumes:
       - ./data/ollama:/root/.ollama
+    environment:
+      OLLAMA_KEEP_ALIVE: "-1"
     restart: always
     healthcheck:
       test: ["CMD-SHELL", "ollama list >/dev/null 2>&1 || exit 1"]