mrviduus · May 7, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,17 @@
 
 ## [Unreleased]
 
+### Changed
+- **Local LLM model**: switched from `qwen3:8b` to `gemma4:e4b` (Google's
+  Gemma 4 effective-4B, multimodal — text + vision + audio capable). Same
+  `ILlmService` interface, no API changes.
+- **Ollama container**: image pinned to `ollama/ollama:0.23.1` (the floating
+  `latest` tag was still serving 0.22.x, which doesn't recognise the
+  `gemma4` family). Memory limit/reservation raised from 4G/2G to 12G/8G:
+  `gemma4:e4b` needs ~9.8 GiB RAM to load weights + KV cache. The server has
+  31 GB total, so there is plenty of headroom.
+- To roll back: set the `Ollama__Model=qwen3:8b` env var or revert this commit.
+
 ## [v0.1.0] — 2026-05-06
 
 ### Headline

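The memory figures in this entry use mixed notation (`12G`/`8G` in compose, ~9.8 GiB for the model, 31 GB for the host). Below is a minimal TypeScript sketch of the headroom check. It assumes Docker reads compose memory strings with binary multipliers (`G` = 1024³ bytes). `parseDockerMemory` and `checkHeadroom` are hypothetical helpers, not part of this repo.

```typescript
// Binary multipliers (assumption: Docker treats k/m/g/t in memory strings
// such as "12G" as powers of 1024).
const MULTIPLIERS: Record<string, number> = {
  "": 1,
  k: 1024,
  m: 1024 ** 2,
  g: 1024 ** 3,
  t: 1024 ** 4,
};

/** Parse a compose-style memory string ("12G", "512m", "8GiB") into bytes. */
function parseDockerMemory(value: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*([kmgt]?)(?:i?b)?$/i.exec(value.trim());
  if (!match) {
    throw new Error(`Unrecognised memory value: ${value}`);
  }
  return Math.round(parseFloat(match[1]) * MULTIPLIERS[match[2].toLowerCase()]);
}

const GIB = 1024 ** 3;

interface Headroom {
  limitCoversModel: boolean;       // can the container hold the loaded model at all?
  reservationCoversModel: boolean; // does the reserved memory alone cover it?
  spareUnderLimitGiB: number;      // limit minus the model's estimated footprint
}

/** Compare compose limit/reservation against a model's estimated RAM use in GiB. */
function checkHeadroom(limit: string, reservation: string, modelGiB: number): Headroom {
  const modelBytes = modelGiB * GIB;
  const limitBytes = parseDockerMemory(limit);
  const reservationBytes = parseDockerMemory(reservation);
  return {
    limitCoversModel: limitBytes >= modelBytes,
    reservationCoversModel: reservationBytes >= modelBytes,
    spareUnderLimitGiB: (limitBytes - modelBytes) / GIB,
  };
}

// Values from this commit: limit 12G, reservation 8G, ~9.8 GiB for gemma4:e4b.
const result = checkHeadroom("12G", "8G", 9.8);
console.log(result);
```

With this commit's values, the 12G limit leaves about 2.2 GiB above the model's estimated footprint. The 8G reservation sits below that footprint, so the reservation alone does not cover a fully loaded `gemma4:e4b`.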
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -199,7 +199,7 @@ Upload EPUB/PDF/FB2 → BookFile (stored) → IngestionJob (queued)
 - **Review entity**: `VocabularyReview` — tracks each answer (isCorrect, responseTimeMs, reviewMode)
 - **5 SRS stages**: New(0) → Recognition(1) → Recall(2) → Context(3) → Mastered(4). Logic in `Application/Vocabulary/SrsEngine.cs`
 - **2 review modes**: `multiple_choice` (all stages, Blitz + Context Cloze), `classic` (flashcard with self-assessment). Typing mode removed.
-- **MC distractors + hint + explanation**: Ollama LLM (`qwen3:8b`) generates 5 distractors + hint + 2-3 sentence explanation (in native language) per word at save time. Stored in `Distractors` (JSON), `Hint` (varchar 500), `Explanation` (varchar 1000). Fallback: random words from user's vocab pool + hardcoded list. Generator: `Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`
+- **MC distractors + hint + explanation**: Ollama LLM (`gemma4:e4b`) generates 5 distractors + hint + 2-3 sentence explanation (in native language) per word at save time. Stored in `Distractors` (JSON), `Hint` (varchar 500), `Explanation` (varchar 1000). Fallback: random words from user's vocab pool + hardcoded list. Generator: `Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`
 - **Ollama**: Docker service (`ollama/ollama`), config: `Ollama:BaseUrl`, `Ollama:Model`, `Ollama:TimeoutSeconds` (default 30s). Fire-and-forget generation via `IServiceScopeFactory` after word save
 - **MC fallback cascade**: definition → translation → blank sentence (if LLM distractors exist) → downgrade to context/typed_recall
 - **Frontend**: `VocabularyPage.tsx` (word list, filters, search, stats), `VocabularyReviewPage.tsx` (review session), components in `components/vocabulary/`

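The "MC fallback cascade" entry in CLAUDE.md can be read as an ordered search for the first usable prompt. Below is a minimal TypeScript sketch under that reading. The `VocabWord` shape, `buildMcPrompt`, and the `"downgrade"` result are hypothetical. The doc names context/typed_recall as the downgrade targets, and this sketch leaves that choice to the caller.

```typescript
// Hypothetical shape of a saved vocabulary word; field names are illustrative.
interface VocabWord {
  word: string;
  definition?: string;
  translation?: string;
  sentence?: string;      // context sentence captured in the reader
  distractors?: string[]; // LLM-generated wrong answers, if generation succeeded
}

type McPrompt =
  | { kind: "definition"; text: string }
  | { kind: "translation"; text: string }
  | { kind: "blank_sentence"; text: string }
  | { kind: "downgrade" }; // no usable MC prompt: caller switches to a non-MC mode

/** Walk the cascade: definition → translation → blank sentence → downgrade. */
function buildMcPrompt(w: VocabWord): McPrompt {
  const definition = w.definition?.trim();
  if (definition) return { kind: "definition", text: definition };

  const translation = w.translation?.trim();
  if (translation) return { kind: "translation", text: translation };

  // A cloze only makes sense when LLM distractors exist to fill the options.
  const hasLlmDistractors = (w.distractors?.length ?? 0) > 0;
  if (w.sentence && hasLlmDistractors) {
    // Blank out the target word (case-insensitive, whole word).
    // Note: \b is ASCII-oriented in JS regexes, so non-Latin words need other matching.
    const escaped = w.word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const blanked = w.sentence.replace(new RegExp(`\\b${escaped}\\b`, "gi"), "_____");
    if (blanked !== w.sentence) return { kind: "blank_sentence", text: blanked };
  }
  return { kind: "downgrade" };
}
```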
diff --git a/PLAN-elevenreader-parity.md b/PLAN-elevenreader-parity.md
@@ -25,7 +25,7 @@ This unblocks everything downstream — copy, pricing, feature priorities all fl
 ## What exists today
 
 **Reader**: tap-to-translate (18+ languages), dictionary popup, highlights, bookmarks, vocab marks
-**SRS**: 5-stage spaced repetition, multiple choice + context cloze, LLM-generated distractors (Ollama qwen3)
+**SRS**: 5-stage spaced repetition, multiple choice + context cloze, LLM-generated distractors (Ollama gemma4:e4b)
 **Catalog**: 146 books, 47 authors, 20 genres, SSG-prerendered SEO pages
 **User uploads**: EPUB + PDF + FB2 → extraction → chapters + metadata enrichment (Ollama)
 **Stats**: sessions, streaks, 20 achievements, heatmap, goals

diff --git a/README.md b/README.md
@@ -94,7 +94,7 @@ out; only technical vocabulary gets surfaced.
 - Auto-added while reading — sentence context, definition, translation
 - 5 stages (New → Recognition → Recall → Context → Mastered)
 - Capped weekly queue + LLM-generated distractors and hints (Ollama
-  `qwen3:8b`)
+  `gemma4:e4b`)
 - Review modes: multiple choice, classic flashcard
 
 **Library**
@@ -121,7 +121,7 @@ out; only technical vocabulary gets surfaced.
 | Search | PostgreSQL FTS |
 | Web | React 19, Vite, pnpm |
 | Mobile | React Native (Expo 55) |
-| LLM | OpenAI `gpt-5-mini` (explanations + translation) + Ollama `qwen3:8b` (distractors, local) |
+| LLM | OpenAI `gpt-5-mini` (explanations + translation) + Ollama `gemma4:e4b` (distractors, local) |
 | TTS | Edge TTS (WebSocket, no API key) |
 | SSG | Puppeteer prerender, nginx serves static first |
 | Telemetry | OpenTelemetry → Aspire Dashboard |

diff --git a/TODO.md b/TODO.md
@@ -29,11 +29,10 @@
 - Review queue view
 
 ### Open Questions (TBD)
-1. **Ollama model?** - llama3, mistral, qwen, phi? Need to test quality vs speed tradeoff
-2. **Prompt templates storage?** - Options: DB (editable at runtime), config file (version controlled), hardcoded (simplest). Decision depends on how often prompts need tuning
-3. **Rate limiting?** - How many ms/sec between LLM calls? Depends on Ollama performance on server hardware
-4. **Error handling?** - Retry count? Exponential backoff? Mark as failed and skip?
-5. **Rollback storage?** - New `Edition.original_description` field or separate `edition_history` table?
+1. **Prompt templates storage?** - Options: DB (editable at runtime), config file (version controlled), hardcoded (simplest). Decision depends on how often prompts need tuning
+2. **Rate limiting?** - How many ms/sec between LLM calls? Depends on Ollama performance on server hardware
+3. **Error handling?** - Retry count? Exponential backoff? Mark as failed and skip?
+4. **Rollback storage?** - New `Edition.original_description` field or separate `edition_history` table?
 
 ---
 

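The **Error handling?** question lists three options: a retry count, exponential backoff, and marking the item as failed and skipping it. Below is one way to combine them, as a minimal TypeScript sketch. It assumes a fixed attempt budget and capped, jittered delays. `retryDelays`, `withRetry` and the numbers used are hypothetical placeholders, not a decision.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // cap so delays stop growing
}

/** Delays before each retry: base * 2^i, capped at maxDelayMs. */
function retryDelays({ maxAttempts, baseDelayMs, maxDelayMs }: RetryOptions): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxAttempts - 1; i++) {
    delays.push(Math.min(baseDelayMs * 2 ** i, maxDelayMs));
  }
  return delays;
}

type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown; attempts: number };

/**
 * Run `task` until it succeeds or the attempt budget is spent.
 * Each delay is scaled by a random factor in [0.5, 1) so concurrent jobs
 * do not retry in lockstep. On final failure the caller can mark the item
 * as failed and skip it.
 */
async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<Outcome<T>> {
  const delays = retryDelays(options);
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return { ok: true, value: await task() };
    } catch (error) {
      lastError = error;
      if (attempt < delays.length) {
        await sleep(delays[attempt] * (0.5 + Math.random() * 0.5));
      }
    }
  }
  return { ok: false, error: lastError, attempts: options.maxAttempts };
}
```

Returning an `Outcome` instead of throwing makes the "mark as failed and skip" branch explicit at the call site. Injecting `sleep` makes the retry loop testable without real waits.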
diff --git a/backend/src/Api/Program.cs b/backend/src/Api/Program.cs
@@ -101,7 +101,7 @@
 builder.Services.AddTextStackVocabulary(options =>
 {
     options.OllamaBaseUrl = builder.Configuration["Ollama:BaseUrl"] ?? "http://localhost:11434";
-    options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "qwen3:8b";
+    options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "gemma4:e4b";
     options.OllamaTimeoutSeconds = builder.Configuration.GetValue("Ollama:TimeoutSeconds", 30);
 });
 

diff --git a/backend/src/Api/appsettings.json b/backend/src/Api/appsettings.json
@@ -44,7 +44,7 @@
   },
   "Ollama": {
     "BaseUrl": "http://localhost:11434",
-    "Model": "qwen3:8b",
+    "Model": "gemma4:e4b",
     "TimeoutSeconds": 10
   },
   "LLM": {

diff --git a/backend/src/Application/LLM/OllamaLlmService.cs b/backend/src/Application/LLM/OllamaLlmService.cs
@@ -25,7 +25,7 @@ public OllamaLlmService(
             ?? "http://localhost:11434";
         _model = config["Ollama:Model"]
             ?? Environment.GetEnvironmentVariable("OLLAMA_MODEL")
-            ?? "qwen3:8b";
+            ?? "gemma4:e4b";
         _timeoutSeconds = config.GetValue("Ollama:TimeoutSeconds", 30);
     }
 

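With this commit, the default model name appears in `Program.cs`, `VocabularyOptions.cs`, `OllamaLlmService.cs`, both `appsettings.json` files and `docker-compose.yml`. Below is a minimal TypeScript sketch of the lookup order in `OllamaLlmService`. It assumes the default ASP.NET Core host, where environment variables are layered over `appsettings.json` and `__` in a variable name stands for the `:` section separator (so `Ollama__Model` overrides `Ollama:Model`). `resolveOllamaModel` and `toEnvKey` are hypothetical.

```typescript
type Source = Record<string, string | undefined>;

/** Map an ASP.NET Core config key ("Ollama:Model") to its env var form ("Ollama__Model"). */
function toEnvKey(configKey: string): string {
  return configKey.replace(/:/g, "__");
}

/**
 * Mirror of the chain in OllamaLlmService:
 *   config["Ollama:Model"] ?? OLLAMA_MODEL ?? "gemma4:e4b"
 * where `config` is appsettings.json overlaid by environment variables
 * (the later source wins). Like C#'s ??, an empty string is kept rather
 * than falling through.
 */
function resolveOllamaModel(appsettings: Source, env: Source): string {
  const key = "Ollama:Model";
  const configValue = env[toEnvKey(key)] ?? appsettings[key];
  return configValue ?? env["OLLAMA_MODEL"] ?? "gemma4:e4b";
}

const settings: Source = { "Ollama:Model": "gemma4:e4b" };
console.log(resolveOllamaModel(settings, { Ollama__Model: "qwen3:8b" }));
```

Both `appsettings.json` files define `Ollama:Model`, so `resolveOllamaModel(settings, { OLLAMA_MODEL: "qwen3:8b" })` still yields `gemma4:e4b`. Only the `Ollama__Model` form named in the CHANGELOG rollback note takes effect.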
diff --git a/backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs b/backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs
@@ -3,6 +3,6 @@ namespace TextStack.Vocabulary;
 public class VocabularyOptions
 {
     public string OllamaBaseUrl { get; set; } = "http://localhost:11434";
-    public string OllamaModel { get; set; } = "qwen3:8b";
+    public string OllamaModel { get; set; } = "gemma4:e4b";
     public int OllamaTimeoutSeconds { get; set; } = 30;
 }
diff --git a/backend/src/Worker/appsettings.json b/backend/src/Worker/appsettings.json
@@ -11,7 +11,7 @@
   },
   "Ollama": {
     "BaseUrl": "http://localhost:11434",
-    "Model": "qwen3:8b",
+    "Model": "gemma4:e4b",
     "TimeoutSeconds": 30
   },
   "LLM": {

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -66,7 +66,7 @@ services:
       Explain__CachePath: /data/explain-cache
       Translate__CachePath: /data/translate-cache
       Ollama__BaseUrl: http://ollama:11434
-      Ollama__Model: qwen3:8b
+      Ollama__Model: gemma4:e4b
       OpenAI__ApiKey: ${OPENAI_API_KEY:-}
       OpenAI__Model: ${OPENAI_MODEL:-gpt-4.1-nano}
       Resend__ApiKey: ${RESEND_API_KEY:-}
@@ -228,7 +228,7 @@ services:
   # ===========================================
 
   ollama:
-    image: ollama/ollama
+    image: ollama/ollama:0.23.1
     container_name: textstack_ollama
     volumes:
       - ./data/ollama:/root/.ollama
@@ -242,7 +242,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 4G
+          memory: 12G
         reservations:
-          memory: 2G
+          memory: 8G
 
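According to the CHANGELOG, the image is pinned because 0.22.x does not recognise the `gemma4` family. Below is a minimal TypeScript sketch of a startup guard. It assumes the running server's version string is available; Ollama's REST API reports it from `GET /api/version`. The sketch treats the pinned 0.23.1 as the minimum known-good version, not necessarily the first release with `gemma4` support. `parseVersion` and `meetsMinimumVersion` are hypothetical helpers.

```typescript
/** Parse "0.23.1" (also "v0.23.1"; pre-release suffixes like "-rc2" are ignored). */
function parseVersion(version: string): number[] {
  const core = version.trim().replace(/^v/, "").split(/[-+]/)[0];
  const parts = core.split(".").map((p) => Number.parseInt(p, 10));
  if (parts.some((n) => Number.isNaN(n))) {
    throw new Error(`Unparseable version: ${version}`);
  }
  return parts;
}

/** True when `actual` >= `minimum`, comparing each numeric part in order. */
function meetsMinimumVersion(actual: string, minimum: string): boolean {
  const a = parseVersion(actual);
  const m = parseVersion(minimum);
  const length = Math.max(a.length, m.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0; // missing parts count as 0 ("0.23" == "0.23.0")
    const y = m[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}

const MINIMUM_OLLAMA = "0.23.1"; // the tag pinned in docker-compose.yml
console.log(meetsMinimumVersion("0.22.9", MINIMUM_OLLAMA));
```

The numeric comparison matters. A plain string compare would rank `0.9.0` above `0.23.1`.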
diff --git a/docs/01-architecture/README.md b/docs/01-architecture/README.md
@@ -28,7 +28,7 @@ Modular monolith: single API + Worker, layered architecture, PostgreSQL.
                     ▼               │
             ┌───────────────┐       │     ┌───────────────┐
             │   Storage     │◄──────┘     │    Ollama     │
-            │ (bind mount)  │             │  gemma3:4b    │
+            │ (bind mount)  │             │  gemma4:e4b   │
             └───────────────┘             └───────────────┘
                                           ┌───────────────┐
                                           │LibreTranslate │

diff --git a/docs/02-system/database.md b/docs/02-system/database.md
@@ -200,7 +200,7 @@ All services: API :8080 | Web :5173 | Admin :81 | DB :5432
 │                                                                             │
 │   SRS: New(0) → Recognition(1) → Recall(2) → Context(3) → Mastered(4)     │
 │   Modes: multiple_choice | typed_recall | context                           │
-│   Distractors: Ollama gemma3:4b generates 5 plausible wrong answers        │
+│   Distractors: Ollama gemma4:e4b generates 5 plausible wrong answers       │
 └─────────────────────────────────────────────────────────────────────────────┘
 
 ┌─────────────────────────────────────────────────────────────────────────────┐
@@ -767,6 +767,6 @@ AssetKind          { Cover=0, InlineImage=1 }
 21. **Reading goals** - Daily minutes or books/year with streak tracking (min minutes threshold)
 22. **Achievements** - 20 codes across milestone/streak/time/special; AchievementChecker runs after sessions
 23. **Vocabulary SRS** - 5-stage spaced repetition (New→Recognition→Recall→Context→Mastered)
-25. **LLM distractors** - Ollama gemma3:4b generates MC wrong answers at word save time; stored as JSON
+25. **LLM distractors** - Ollama gemma4:e4b generates MC wrong answers at word save time; stored as JSON
 26. **Fire-and-forget distractors** - IServiceScopeFactory creates scoped DbContext for background generation
 27. **Vocabulary word uniqueness** - unique(user_id, site_id, word, language) prevents duplicates
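Item 26 describes generation that continues after the save has returned. In the C# code, the background work needs its own `IServiceScopeFactory` scope, because the request's scoped `DbContext` is disposed when the request ends. Below is a minimal TypeScript sketch of the same pattern. An in-memory store and an injected `generate` function stand in for the scoped `DbContext` and the Ollama call; `WordStore`, `withTimeout` and `Generate` are hypothetical. The save completes synchronously, generation runs detached with a timeout, and failures are logged instead of reaching the user.

```typescript
interface WordRow {
  id: number;
  word: string;
  distractors: string[] | null; // filled in later by background generation
}

type Generate = (word: string) => Promise<string[]>;

/** Reject if `promise` does not settle within `ms`; the timer is always cleared. */
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

class WordStore {
  private readonly rows = new Map<number, WordRow>();
  private nextId = 1;
  private readonly generate: Generate;
  private readonly timeoutMs: number;

  constructor(generate: Generate, timeoutMs = 30_000) {
    this.generate = generate;
    this.timeoutMs = timeoutMs;
  }

  /** Save immediately and return the id; distractors arrive later (or never). */
  save(word: string): number {
    const row: WordRow = { id: this.nextId++, word, distractors: null };
    this.rows.set(row.id, row);
    // Fire-and-forget: deliberately not awaited, so the caller is not blocked.
    void this.fillDistractors(row.id);
    return row.id;
  }

  get(id: number): WordRow | undefined {
    return this.rows.get(id);
  }

  private async fillDistractors(id: number): Promise<void> {
    // Look the row up again by id, as a fresh scope would re-load the entity.
    const row = this.rows.get(id);
    if (!row) return;
    try {
      row.distractors = await withTimeout(this.generate(row.word), this.timeoutMs);
    } catch (error) {
      // Leave distractors null; review falls back to random vocab words.
      console.warn(`distractor generation failed for "${row.word}":`, error);
    }
  }
}
```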
diff --git a/docs/04-dev/llm-provider-swap.md b/docs/04-dev/llm-provider-swap.md
@@ -1,6 +1,6 @@
 # LLM Provider Swap Guide
 
-Default: self-hosted **Ollama** (`qwen3:8b`) in Docker. This doc shows how to swap it for a managed LLM API (OpenAI, Anthropic, Groq, etc.) if self-hosting is undesirable.
+Default: self-hosted **Ollama** (`gemma4:e4b`) in Docker. This doc shows how to swap it for a managed LLM API (OpenAI, Anthropic, Groq, etc.) if self-hosting is undesirable.
 
 ## What the LLM does
 
@@ -78,7 +78,7 @@ var response = await client.PostAsJsonAsync($"{_options.BaseUrl}/v1/messages", r
 Remove:
 ```
 Ollama__BaseUrl=http://ollama:11434
-Ollama__Model=qwen3:8b
+Ollama__Model=gemma4:e4b
 ```
 
 Add (example for OpenAI):

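Much of the swap comes down to a different request shape. Below is a minimal TypeScript sketch that builds the body for Ollama's `POST /api/generate` and for an OpenAI-style `POST /v1/chat/completions`. It assumes a single prompt string and JSON output. `buildRequest`, `Provider` and `HttpRequest` are hypothetical. Anthropic's `/v1/messages` uses yet another shape and is not covered here.

```typescript
type Provider =
  | { kind: "ollama"; baseUrl: string; model: string }
  | { kind: "openai"; baseUrl: string; model: string; apiKey: string };

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: unknown;
}

/** Build a non-streaming, JSON-mode completion request for either provider. */
function buildRequest(provider: Provider, prompt: string): HttpRequest {
  const base = provider.baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  if (provider.kind === "ollama") {
    return {
      url: `${base}/api/generate`,
      headers: { "Content-Type": "application/json" },
      // stream: false returns one object whose `response` field holds the text.
      body: { model: provider.model, prompt, stream: false, format: "json" },
    };
  }
  return {
    url: `${base}/v1/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${provider.apiKey}`,
    },
    // JSON mode expects the prompt itself to mention JSON.
    body: {
      model: provider.model,
      messages: [{ role: "user", content: prompt }],
      response_format: { type: "json_object" },
    },
  };
}

const req = buildRequest(
  { kind: "ollama", baseUrl: "http://ollama:11434", model: "gemma4:e4b" },
  "Return 5 distractors for 'lucid' as JSON.",
);
console.log(req.url);
```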
diff --git a/docs/05-features/vocabulary-srs.md b/docs/05-features/vocabulary-srs.md
@@ -40,7 +40,7 @@ When building MC prompt: definition → translation → blank sentence (if LLM d
 
 MC quiz quality depends on plausible wrong answers. Random words = too easy.
 
-**Solution**: Local Ollama LLM (`gemma3:4b`) generates 5 semantically similar distractors per word.
+**Solution**: Local Ollama LLM (`gemma4:e4b`) generates 5 semantically similar distractors per word.
 
 ### Flow
 1. User saves word in reader → API saves to DB immediately (fast response)
@@ -64,11 +64,11 @@ ollama:
         memory: 4G
 ```
 
-Config: `Ollama:BaseUrl`, `Ollama:Model` (gemma3:4b), `Ollama:TimeoutSeconds` (10)
+Config: `Ollama:BaseUrl`, `Ollama:Model` (gemma4:e4b), `Ollama:TimeoutSeconds` (10)
 
 ### Model Pull
 ```bash
-docker compose exec ollama ollama pull gemma3:4b
+docker compose exec ollama ollama pull gemma4:e4b
 ```
 
 ## API Endpoints

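Raw model output should be checked before it reaches the `Distractors` (JSON), `Hint` (varchar 500) and `Explanation` (varchar 1000) columns that CLAUDE.md lists. Below is a minimal TypeScript sketch. It assumes the prompt asks for JSON shaped like `{ "distractors": [...], "hint": "...", "explanation": "..." }`; these field names and `parseDistractorPayload` are hypothetical, and the real prompt lives in `DistractorGenerator.cs`. Returning `null` hands control to the random-word fallback.

```typescript
// Limits taken from the column sizes described in CLAUDE.md.
const HINT_MAX = 500;
const EXPLANATION_MAX = 1000;
const DISTRACTOR_COUNT = 5;

interface DistractorPayload {
  distractors: string[];
  hint: string;
  explanation: string;
}

/**
 * Parse raw LLM output into a payload that fits the DB columns.
 * Returns null when the output is unusable, signalling the caller to
 * fall back to random words from the user's vocabulary pool.
 */
function parseDistractorPayload(raw: string, word: string): DistractorPayload | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // model returned prose or truncated JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const list = obj["distractors"];
  if (!Array.isArray(list)) return null;

  // Keep unique, non-empty strings that are not the target word itself.
  const seen = new Set<string>();
  const distractors: string[] = [];
  for (const item of list) {
    if (typeof item !== "string") continue;
    const candidate = item.trim();
    const key = candidate.toLowerCase();
    if (!candidate || key === word.toLowerCase() || seen.has(key)) continue;
    seen.add(key);
    distractors.push(candidate);
  }
  if (distractors.length < DISTRACTOR_COUNT) return null;

  const hint = obj["hint"];
  const explanation = obj["explanation"];
  return {
    distractors: distractors.slice(0, DISTRACTOR_COUNT),
    hint: (typeof hint === "string" ? hint.trim() : "").slice(0, HINT_MAX),
    explanation: (typeof explanation === "string" ? explanation.trim() : "").slice(0, EXPLANATION_MAX),
  };
}
```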
diff --git a/docs/ux-roadmap/17-ai-auto-tags.md b/docs/ux-roadmap/17-ai-auto-tags.md
@@ -46,7 +46,7 @@ After ingestion completes, Ollama proposes 3–5 tags for the book based on titl
 
   Tags should be in {userNativeLanguage}.
   ```
-- **Model:** reuse existing `Ollama:Model` config (`qwen3:8b` per CLAUDE.md). No new model.
+- **Model:** reuse existing `Ollama:Model` config (`gemma4:e4b` per CLAUDE.md). No new model.
 - **Timeout:** 30s (matches existing). On timeout/error, log and skip — never block ingestion.
 - **Validation:** parsed tags must be 1–30 chars, lowercase, alphanumeric+hyphen. Drop invalid silently.
 - **De-duplicate** suggestions against user's existing tags (via `useUserTags()` data).
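The validation and de-duplication rules above can be sketched as follows in TypeScript. Tags are meant to be in the user's native language, so the sketch reads "alphanumeric" as Unicode letters and digits (`\p{L}`, `\p{N}`). That reading is an assumption; an ASCII-only reading would silently drop every non-Latin tag. Lowercasing before validation is also an assumption, since the rule could instead mean rejecting uppercase tags. `normalizeTags` is a hypothetical name.

```typescript
// Unicode-aware "alphanumeric + hyphen", 1 to 30 code points.
// Built with the RegExp constructor so it does not depend on the TS target.
const TAG_PATTERN = new RegExp("^[\\p{L}\\p{N}-]{1,30}$", "u");

/**
 * Clean LLM tag suggestions: trim, lowercase, drop invalid entries silently,
 * and remove duplicates both within the batch and against the user's tags.
 */
function normalizeTags(suggestions: unknown[], existingTags: string[]): string[] {
  const existing = new Set(existingTags.map((t) => t.trim().toLowerCase()));
  const result: string[] = [];
  for (const raw of suggestions) {
    if (typeof raw !== "string") continue; // model returned a non-string
    const tag = raw.trim().toLowerCase();
    if (!TAG_PATTERN.test(tag)) continue;  // invalid: dropped, not an error
    if (existing.has(tag)) continue;       // user already has this tag
    existing.add(tag);                     // also de-duplicates within the batch
    result.push(tag);
  }
  return result;
}

console.log(normalizeTags(["Sci-Fi", "space opera", "History"], ["history"]));
```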