chore(llm): swap Ollama model qwen3:8b → gemma4:e4b + bump container memory#232

Merged: mrviduus merged 1 commit into main from chore/swap-ollama-model-to-gemma4 on May 7, 2026

@mrviduus (Owner) commented May 7, 2026

Summary

Swap the local LLM model used for distractor / hint / metadata / tag generation from qwen3:8b to gemma4:e4b (Google Gemma 4 effective-4B, multimodal — text + vision + audio). Same ILlmService interface, no API changes.

What's in the PR

  • 3 source defaults: Program.cs, OllamaLlmService.cs, VocabularyOptions.cs.
  • 3 config files: Api/appsettings.json, Worker/appsettings.json, docker-compose.yml.
  • 9 doc updates: README, CLAUDE.md, llm-provider-swap, 17-ai-auto-tags, PLAN-elevenreader-parity, plus stale gemma3:4b cleanup in docs/02-system/database.md, docs/01-architecture/README.md, docs/05-features/vocabulary-srs.md.
  • TODO.md: drop "llama3, mistral, qwen, phi" speculation entry — choice is made.
  • CHANGELOG.md: ### Changed entry under [Unreleased].

Container changes (this is the load-bearing part)

  1. Pin ollama image to ollama/ollama:0.23.1. The floating latest tag was still serving 0.22.x in the registry, which does not recognise the gemma4 family: ollama pull gemma4:e4b returns "please download the latest version". 0.23.1 (released 2026-05-05) is the first stable release that ships gemma4 support.
  2. Memory limits raised: 4G/2G → 12G/8G. gemma4:e4b needs ~9.8 GiB RAM to load weights + KV cache (the multimodal architecture has a larger footprint than the prompt author's claim suggested). Server has 31 GB total RAM so the headroom is fine.
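
The two constraints above (a minimum Ollama version and enough container memory) can be expressed as a small preflight check. This is a minimal sketch, not code from the PR: the function names are hypothetical, and 0.22.9 stands in for the unspecified 0.22.x release purely for illustration.

```python
# Hypothetical preflight helpers; names are illustrative, not from the repo.

def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.23.1' into (0, 23, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_gemma4(ollama_version: str, minimum: str = "0.23.1") -> bool:
    """True if the Ollama version is at least the first release with gemma4 support."""
    return parse_version(ollama_version) >= parse_version(minimum)


def fits(required_gib: float, limit_gib: float) -> bool:
    """True if the model's load footprint fits under the container memory limit."""
    return required_gib <= limit_gib


# Figures taken from the PR text: ~9.8 GiB needed, old limit 4G, new limit 12G.
print(supports_gemma4("0.22.9"), supports_gemma4("0.23.1"))
print(fits(9.8, 4), fits(9.8, 12))
```

Comparing versions as integer tuples rather than strings avoids the pitfall where "0.9.0" sorts after "0.23.1" lexicographically.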

Verification done locally

  • dotnet build textstack.sln — green, 0 warn / 0 err.
  • dotnet test tests/TextStack.UnitTests — 216/216.
  • dotnet test tests/TextStack.Extraction.Tests — 169/169.
  • dotnet test tests/TextStack.Search.Tests — 20/20.
  • dotnet test tests/TextStack.IntegrationTests (vs running API) — 136/136.
  • pnpm -C apps/web test — 419/419 (Vitest, 50 files).
  • pnpm -C apps/web test:e2e --project=chromium tests/smoke.spec.ts — 5/5.
  • apps/admin + apps/mobile tsc --noEmit — clean.
  • Total: 965 tests passed.

The local Ollama JSON smoke test could not run on this Mac (the Docker VM has only 7.7 GiB; gemma4:e4b needs 9.8 GiB to load). The smoke test will run on prod after merge; see the post-deploy section below.

Post-merge / post-deploy actions (I will do these from SSH after CI deploy)

  1. docker compose pull ollama — fetch the pinned 0.23.1.
  2. docker compose up -d --force-recreate ollama — pick up the new image + memory limits.
  3. docker compose exec ollama ollama pull gemma4:e4b — first-time pull (~10 min, 9.6 GB).
  4. Smoke test: ollama run gemma4:e4b with the distractor prompt — verify it returns a clean JSON array (not markdown-fenced).
  5. docker compose restart api worker — reconnect to fresh Ollama.
  6. Save a vocabulary word via UI → confirm 5 distractors + hint + explanation appear in the DB.
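
The check in step 4 (a clean JSON array, not markdown-fenced) could be automated along these lines. This is a sketch under assumptions: `parse_clean_json_array` is a hypothetical helper, not part of TextStack, and the optional length check uses the 5 distractors mentioned in step 6.

```python
import json
from typing import Optional

# Three backticks, built at runtime so this snippet stays safe to paste into markdown.
FENCE = "`" * 3


def parse_clean_json_array(raw: str, expected_len: Optional[int] = None) -> list:
    """Parse model output that must be a bare JSON array.

    Raises ValueError if the output is markdown-fenced, is not valid JSON,
    is not an array, or (optionally) has the wrong number of items.
    """
    text = raw.strip()
    if text.startswith(FENCE):
        raise ValueError("output is markdown-fenced, expected a bare JSON array")
    value = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(value, list):
        raise ValueError(f"expected a JSON array, got {type(value).__name__}")
    if expected_len is not None and len(value) != expected_len:
        raise ValueError(f"expected {expected_len} items, got {len(value)}")
    return value
```

Feeding it the text printed by the smoke-test run either returns the parsed list or raises a `ValueError` that says which of the failure modes occurred.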

Note: prod's Ollama volume is currently empty (0 models installed). qwen3:8b was set as the model name but never pulled — distractor generation has been silently falling back to random-word generation since deploy. This PR plus the post-deploy steps are also the moment we make LLM-backed features actually work in production.
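
A configured-but-never-pulled model like this can be caught by comparing the configured name against what Ollama reports as installed. The sketch below assumes the response of Ollama's `GET /api/tags` endpoint has already been decoded into a dict with a `models` list whose entries carry a `name` field; the helper names are hypothetical.

```python
# Hypothetical helpers operating on an already-decoded /api/tags response.

def installed_models(tags_payload: dict) -> set[str]:
    """Collect the model names listed in an /api/tags response."""
    # `or []` covers both a missing key and a null value for an empty store.
    return {entry["name"] for entry in (tags_payload.get("models") or [])}


def is_installed(tags_payload: dict, model: str) -> bool:
    """True if the configured model name is present in the Ollama store."""
    return model in installed_models(tags_payload)


# An empty volume, as described for prod in the note above.
empty_store = {"models": []}
print(is_installed(empty_store, "qwen3:8b"))
```

Running a check like this at API or worker startup would turn a silent fallback into a visible warning; where to call it and how to fetch the payload depend on the deployment.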

Rollback

If gemma4:e4b produces broken output in prod:

ssh asus
cd ~/projects/onlinelib/textstack
echo 'Ollama__Model=qwen3:8b' >> .env
docker compose exec ollama ollama pull qwen3:8b   # ~5 min, 5 GB
docker compose up -d api worker   # recreate so the new env value is read; restart alone does not reload it

Or revert this commit and redeploy.

🤖 Generated with Claude Code

Updates the default local LLM across appsettings, source defaults, docker-
compose env, and documentation. Same ILlmService interface, no API changes.

Why:
- Released May 2026 by Google; gemma4:e4b is the multimodal "effective 4B"
  variant (text + vision + audio capable), competitive with qwen3:8b on
  the distractor/hint/metadata generation tasks we use it for.
- Eligible for Dev.to's Gemma 4 Challenge submission.

Container changes:
- Pin ollama image to ollama/ollama:0.23.1. The floating `latest` tag was
  still 0.22.x in registry, which does not recognise the `gemma4` family
  (`ollama pull` returns "please download the latest version"). 0.23.1 is
  the first stable that ships gemma4 support.
- Bump ollama memory: 4G/2G → 12G/8G. gemma4:e4b needs ~9.8 GiB RAM to
  load weights + KV cache. Server has 31 GB total RAM so headroom is fine.

Rollback: set Ollama__Model=qwen3:8b in env or revert this commit.

Files changed:
- backend/src/Api/Program.cs (default fallback)
- backend/src/Application/LLM/OllamaLlmService.cs (default fallback)
- backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs (default)
- backend/src/Api/appsettings.json (config)
- backend/src/Worker/appsettings.json (config)
- docker-compose.yml (env + image pin + memory bump)
- README.md, CLAUDE.md, docs/04-dev/llm-provider-swap.md,
  docs/ux-roadmap/17-ai-auto-tags.md, PLAN-elevenreader-parity.md (current docs)
- docs/02-system/database.md, docs/01-architecture/README.md,
  docs/05-features/vocabulary-srs.md (cleanup of stale gemma3:4b refs)
- TODO.md (drop "llama3, mistral, qwen, phi" speculation entry)
- CHANGELOG.md (Unreleased entry)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mrviduus merged commit 39a72a6 into main on May 7, 2026
5 checks passed