chore(llm): swap Ollama model qwen3:8b → gemma4:e4b + bump container memory #232
Merged
Updates the default local LLM across appsettings, source defaults, docker-compose env, and documentation. Same ILlmService interface, no API changes.

Why:
- Released May 2026 by Google; gemma4:e4b is the multimodal "effective 4B" variant (text + vision + audio capable), competitive with qwen3:8b on the distractor/hint/metadata generation tasks we use it for.
- Eligible for Dev.to's Gemma 4 Challenge submission.

Container changes:
- Pin ollama image to ollama/ollama:0.23.1. The floating `latest` tag was still 0.22.x in registry, which does not recognise the `gemma4` family (`ollama pull` returns "please download the latest version"). 0.23.1 is the first stable that ships gemma4 support.
- Bump ollama memory: 4G/2G → 12G/8G. gemma4:e4b needs ~9.8 GiB RAM to load weights + KV cache. Server has 31 GB total RAM so headroom is fine.

Rollback: set Ollama__Model=qwen3:8b in env or revert this commit.

Files changed:
- backend/src/Api/Program.cs (default fallback)
- backend/src/Application/LLM/OllamaLlmService.cs (default fallback)
- backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs (default)
- backend/src/Api/appsettings.json (config)
- backend/src/Worker/appsettings.json (config)
- docker-compose.yml (env + image pin + memory bump)
- README.md, CLAUDE.md, docs/04-dev/llm-provider-swap.md, docs/ux-roadmap/17-ai-auto-tags.md, PLAN-elevenreader-parity.md (current docs)
- docs/02-system/database.md, docs/01-architecture/README.md, docs/05-features/vocabulary-srs.md (cleanup of stale gemma3:4b refs)
- TODO.md (drop "llama3, mistral, qwen, phi" speculation entry)
- CHANGELOG.md (Unreleased entry)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Swap the local LLM model used for distractor / hint / metadata / tag generation from `qwen3:8b` to `gemma4:e4b` (Google Gemma 4 effective-4B, multimodal: text + vision + audio). Same `ILlmService` interface, no API changes.

What's in the PR
- Source defaults: `Program.cs`, `OllamaLlmService.cs`, `VocabularyOptions.cs`.
- Config: `Api/appsettings.json`, `Worker/appsettings.json`, `docker-compose.yml`.
- Stale `gemma3:4b` cleanup in `docs/02-system/database.md`, `docs/01-architecture/README.md`, `docs/05-features/vocabulary-srs.md`.
- `CHANGELOG.md`: `### Changed` entry under `[Unreleased]`.

Container changes (this is the load-bearing part)
- Pin the ollama image to `ollama/ollama:0.23.1`. The floating `latest` tag was still serving 0.22.x in the registry, which does not recognise the `gemma4` family: `ollama pull gemma4:e4b` returns "please download the latest version". 0.23.1 (released 2026-05-05) is the first stable that ships gemma4 support.
- Bump ollama memory: `4G/2G → 12G/8G`. `gemma4:e4b` needs ~9.8 GiB RAM to load weights + KV cache (the multimodal architecture is bigger than the prompt-author's claim suggested). Server has 31 GB total RAM so the headroom is fine.

Verification done locally
- `dotnet build textstack.sln`: green, 0 warn / 0 err.
- `dotnet test tests/TextStack.UnitTests`: 216/216.
- `dotnet test tests/TextStack.Extraction.Tests`: 169/169.
- `dotnet test tests/TextStack.Search.Tests`: 20/20.
- `dotnet test tests/TextStack.IntegrationTests` (vs running API): 136/136.
- `pnpm -C apps/web test`: 419/419 (Vitest, 50 files).
- `pnpm -C apps/web test:e2e --project=chromium tests/smoke.spec.ts`: 5/5.
- `apps/admin` + `apps/mobile` `tsc --noEmit`: clean.

Local Ollama JSON smoke could not run on this Mac (Docker VM has only 7.7 GiB; `gemma4:e4b` needs 9.8 GiB to load). Smoke will run on prod after merge; see post-deploy section below.

Post-merge / post-deploy actions (I will do these from SSH after CI deploy)
- `docker compose pull ollama`: fetch the pinned `0.23.1`.
- `docker compose up -d --force-recreate ollama`: pick up the new image + memory limits.
- `docker compose exec ollama ollama pull gemma4:e4b`: first-time pull (~10 min, 9.6 GB).
- `ollama run gemma4:e4b` with the distractor prompt: verify it returns a clean JSON array (not markdown-fenced).
- `docker compose restart api worker`: reconnect to fresh Ollama.

Note: prod's Ollama volume is currently empty (0 models installed). `qwen3:8b` was set as the model name but never pulled, so distractor generation has been silently falling back to random-word generation since deploy. This PR plus the post-deploy steps are also the moment we make LLM-backed features actually work in production.

Rollback
If `gemma4:e4b` produces broken output in prod, set `Ollama__Model=qwen3:8b` in the environment.

Or revert this commit and redeploy.
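The rollback trigger ("broken output") and the post-deploy smoke step both come down to one question: is the model's reply a bare JSON array, or is it wrapped in a markdown fence or extra prose? A minimal shell sketch of that check follows, under stated assumptions: `is_clean_json_array` is a hypothetical helper name (nothing in the repo is known to be called this), it only inspects the shape of the string and does not parse JSON, and `DISTRACTOR_PROMPT` in the commented wiring is a placeholder rather than the project's real prompt.

```shell
# Hypothetical helper (not from the repo): succeeds when the reply looks like
# a bare JSON array. Heuristic only: it checks for a markdown fence and for
# surrounding square brackets; it does NOT validate the JSON inside.
is_clean_json_array() {
  _reply=$1
  # Build the three-backtick fence marker from octal escapes (140 = backtick).
  _fence=$(printf '\140\140\140')

  # Reject anything that contains a markdown code fence.
  case "$_reply" in
    *"$_fence"*) return 1 ;;
  esac

  # Trim leading and trailing whitespace, including newlines.
  _lead=${_reply%%[![:space:]]*}
  _reply=${_reply#"$_lead"}
  _trail=${_reply##*[![:space:]]}
  _reply=${_reply%"$_trail"}

  # Accept only text that starts with "[" and ends with "]".
  case "$_reply" in
    \[*\]) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible wiring on the server (left commented out; DISTRACTOR_PROMPT is a
# placeholder, and the rollback value is the one named in this description):
#   reply=$(docker compose exec -T ollama ollama run gemma4:e4b "$DISTRACTOR_PROMPT")
#   if is_clean_json_array "$reply"; then
#     echo "smoke ok"
#   else
#     echo "broken output: consider setting Ollama__Model=qwen3:8b"
#   fi
```

A reply that passes this check can still be invalid JSON, so it is best treated as a quick first filter before a real parser. If the environment variable is changed for a rollback, note that `docker compose restart` reuses the existing container configuration, so `docker compose up -d api worker` is likely needed for the new value to take effect.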
🤖 Generated with Claude Code