chore(llm): swap Ollama model qwen3:8b → gemma4:e4b + bump container memory by mrviduus · Pull Request #232 · mrviduus/textstack

mrviduus · 2026-05-07T22:46:39Z

Summary

Swap the local LLM model used for distractor / hint / metadata / tag generation from qwen3:8b to gemma4:e4b (Google Gemma 4 effective-4B, multimodal — text + vision + audio). Same ILlmService interface, no API changes.

What's in the PR

3 source defaults: Program.cs, OllamaLlmService.cs, VocabularyOptions.cs.
3 config files: Api/appsettings.json, Worker/appsettings.json, docker-compose.yml.
8 doc updates: README, CLAUDE.md, llm-provider-swap, 17-ai-auto-tags, PLAN-elevenreader-parity, plus stale gemma3:4b cleanup in docs/02-system/database.md, docs/01-architecture/README.md, docs/05-features/vocabulary-srs.md.
TODO.md: drop "llama3, mistral, qwen, phi" speculation entry — choice is made.
CHANGELOG.md: ### Changed entry under [Unreleased].

Container changes (this is the load-bearing part)

Pin ollama image to ollama/ollama:0.23.1. The floating latest tag was still serving 0.22.x in the registry, which does not recognise the gemma4 family: ollama pull gemma4:e4b returns "please download the latest version". 0.23.1 (released 2026-05-05) is the first stable release that ships gemma4 support.
Memory limits raised: 4G/2G → 12G/8G. gemma4:e4b needs ~9.8 GiB RAM to load weights + KV cache (the multimodal architecture is larger than the prompt author's estimate suggested). The server has 31 GB of RAM in total, so there is ample headroom.

Verification done locally

dotnet build textstack.sln — green, 0 warn / 0 err.
dotnet test tests/TextStack.UnitTests — 216/216.
dotnet test tests/TextStack.Extraction.Tests — 169/169.
dotnet test tests/TextStack.Search.Tests — 20/20.
dotnet test tests/TextStack.IntegrationTests (vs running API) — 136/136.
pnpm -C apps/web test — 419/419 (Vitest, 50 files).
pnpm -C apps/web test:e2e --project=chromium tests/smoke.spec.ts — 5/5.
apps/admin + apps/mobile tsc --noEmit — clean.
Total: 965 tests passed.

The local Ollama JSON smoke test could not run on this Mac (the Docker VM has only 7.7 GiB; gemma4:e4b needs ~9.8 GiB to load). The smoke test will run on prod after merge; see the post-deploy section below.
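The memory figures quoted in this PR can be checked with a little arithmetic. The sketch below is illustrative only: `headroom_gib` is a hypothetical helper, and it assumes the compose values 4G and 12G can be read as GiB alongside the ~9.8 GiB load requirement.

```python
REQUIRED_GIB = 9.8  # load requirement for gemma4:e4b as quoted in this PR


def headroom_gib(limit_gib: float, required_gib: float = REQUIRED_GIB) -> float:
    """Return limit minus requirement; a negative value means the model cannot load."""
    return round(limit_gib - required_gib, 1)


# Limits taken from the PR text; treating "G" as GiB is an assumption.
environments = [
    ("old container limit", 4.0),
    ("new container limit", 12.0),
    ("local Docker VM on the Mac", 7.7),
]

for label, limit in environments:
    spare = headroom_gib(limit)
    verdict = "fits" if spare >= 0 else "does not fit"
    print(f"{label}: {limit} GiB, headroom {spare} GiB, model {verdict}")
```

Only the new 12G limit leaves positive headroom, which matches why the local smoke test was skipped and the container limit was raised.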

Post-merge / post-deploy actions (I will do these from SSH after CI deploy)

docker compose pull ollama — fetch the pinned 0.23.1.
docker compose up -d --force-recreate ollama — pick up the new image + memory limits.
docker compose exec ollama ollama pull gemma4:e4b — first-time pull (~10 min, 9.6 GB).
Smoke test: ollama run gemma4:e4b with the distractor prompt — verify it returns a clean JSON array (not markdown-fenced).
docker compose restart api worker — reconnect to fresh Ollama.
Save a vocabulary word via UI → confirm 5 distractors + hint + explanation appear in the DB.
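The smoke-test step above asks for a clean JSON array rather than a markdown-fenced reply. One way to make that check mechanical is sketched below. `is_clean_json_array` is a hypothetical helper, not code from this repository, and the expected length of 5 is borrowed from the distractor count in the last step.

```python
import json
from typing import Optional


def is_clean_json_array(raw: str, expected_len: Optional[int] = None) -> bool:
    """Return True only if `raw` is a bare JSON array with no fence or surrounding prose."""
    text = raw.strip()
    # A markdown-fenced or chatty reply will not start and end with brackets.
    if not (text.startswith("[") and text.endswith("]")):
        return False
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, list):
        return False
    # Optionally enforce the number of items the caller expects.
    return expected_len is None or len(parsed) == expected_len


if __name__ == "__main__":
    sample = '["alpha", "beta", "gamma", "delta", "epsilon"]'
    print(is_clean_json_array(sample, expected_len=5))
```

Pasting the raw model output into this function (or piping it in from a file) gives a yes/no answer that is easier to record in the deploy notes than an eyeball check.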

Note: prod's Ollama volume is currently empty (0 models installed). qwen3:8b was set as the model name but never pulled — distractor generation has been silently falling back to random-word generation since deploy. This PR plus the post-deploy steps are also the moment we make LLM-backed features actually work in production.
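A silent fallback like this could be caught by checking that the configured model is actually present before relying on it. The sketch below assumes Ollama's `GET /api/tags` endpoint, which lists locally installed models in a `models` array with a `name` field, and the default port 11434. The function names are hypothetical and nothing here is taken from the TextStack codebase.

```python
import json
from urllib.request import urlopen


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the installed-model list from an Ollama server (base URL is an assumption)."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_is_installed(tags_payload: dict, model: str) -> bool:
    """Return True if `model` (for example "gemma4:e4b") appears in the /api/tags payload."""
    models = tags_payload.get("models") or []  # tolerate a missing or null list
    return model in {entry.get("name") for entry in models}


if __name__ == "__main__":
    # An empty volume, as described for prod, yields an empty model list.
    empty_volume = {"models": []}
    print(model_is_installed(empty_volume, "qwen3:8b"))
```

Against a live server, `model_is_installed(fetch_tags(), "gemma4:e4b")` would be the call to run after the first-time pull.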

Rollback

If gemma4:e4b produces broken output in prod:

ssh asus
cd ~/projects/onlinelib/textstack
echo 'Ollama__Model=qwen3:8b' >> .env
docker compose exec ollama ollama pull qwen3:8b   # ~5 min, 5 GB
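docker compose up -d api worker   # recreate the containers so the new Ollama__Model value is applied; restart alone does not pick up env changes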

Or revert this commit and redeploy.
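The rollback appends to .env, so running it twice would leave duplicate Ollama__Model lines. A minimal sketch of an idempotent alternative follows, assuming a plain KEY=value file. `set_env_var` is a hypothetical helper and not part of this repository.

```python
def set_env_var(env_text: str, key: str, value: str) -> str:
    """Return `env_text` with exactly one `key=value` line, replacing any existing ones."""
    prefix = f"{key}="
    new_line = f"{key}={value}"
    out = []
    replaced = False
    for line in env_text.splitlines():
        if line.startswith(prefix):
            if not replaced:
                out.append(new_line)  # overwrite the first occurrence in place
                replaced = True
            # later duplicates of the same key are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)  # key was absent, so append it
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "ASPNETCORE_ENVIRONMENT=Production\nOllama__Model=gemma4:e4b\n"
    print(set_env_var(before, "Ollama__Model", "qwen3:8b"), end="")
```

Applying the function to its own output returns the same text, so the rollback can be repeated safely.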

🤖 Generated with Claude Code

Updates the default local LLM across appsettings, source defaults, docker-compose env, and documentation. Same ILlmService interface, no API changes.

Why:
- Released May 2026 by Google; gemma4:e4b is the multimodal "effective 4B" variant (text + vision + audio capable), competitive with qwen3:8b on the distractor/hint/metadata generation tasks we use it for.
- Eligible for Dev.to's Gemma 4 Challenge submission.

Container changes:
- Pin ollama image to ollama/ollama:0.23.1. The floating `latest` tag was still 0.22.x in registry, which does not recognise the `gemma4` family (`ollama pull` returns "please download the latest version"). 0.23.1 is the first stable that ships gemma4 support.
- Bump ollama memory: 4G/2G → 12G/8G. gemma4:e4b needs ~9.8 GiB RAM to load weights + KV cache. Server has 31 GB total RAM so headroom is fine.

Rollback: set Ollama__Model=qwen3:8b in env or revert this commit.

Files changed:
- backend/src/Api/Program.cs (default fallback)
- backend/src/Application/LLM/OllamaLlmService.cs (default fallback)
- backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs (default)
- backend/src/Api/appsettings.json (config)
- backend/src/Worker/appsettings.json (config)
- docker-compose.yml (env + image pin + memory bump)
- README.md, CLAUDE.md, docs/04-dev/llm-provider-swap.md, docs/ux-roadmap/17-ai-auto-tags.md, PLAN-elevenreader-parity.md (current docs)
- docs/02-system/database.md, docs/01-architecture/README.md, docs/05-features/vocabulary-srs.md (cleanup of stale gemma3:4b refs)
- TODO.md (drop "llama3, mistral, qwen, phi" speculation entry)
- CHANGELOG.md (Unreleased entry)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mrviduus merged commit 39a72a6 into main May 7, 2026
5 checks passed

mrviduus merged 1 commit into main from chore/swap-ollama-model-to-gemma4
