diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7042d8ea..841f8492 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,17 @@
 
 ## [Unreleased]
 
+### Changed
+- **Local LLM model**: switched from `qwen3:8b` to `gemma4:e4b` (Google's
+  Gemma 4 effective-4B, multimodal — text + vision + audio capable). Same
+  `ILlmService` interface, no API changes.
+- **Ollama container**: image pinned to `ollama/ollama:0.23.1` (the floating
+  `latest` tag was still serving 0.22.x which doesn't recognise the
+  `gemma4` family). Memory limits raised from 4G/2G to 12G/8G — `gemma4:e4b`
+  needs ~9.8 GiB RAM to load weights + KV cache. Server has 31 GB total so
+  the headroom is plenty.
+- To roll back: set `Ollama__Model=qwen3:8b` env var or revert this commit.
+
 ## [v0.1.0] — 2026-05-06
 
 ### Headline
diff --git a/CLAUDE.md b/CLAUDE.md
index 4bf129a6..bc506a64 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -199,7 +199,7 @@ Upload EPUB/PDF/FB2 → BookFile (stored) → IngestionJob (queued)
 - **Review entity**: `VocabularyReview` — tracks each answer (isCorrect, responseTimeMs, reviewMode)
 - **5 SRS stages**: New(0) → Recognition(1) → Recall(2) → Context(3) → Mastered(4). Logic in `Application/Vocabulary/SrsEngine.cs`
 - **2 review modes**: `multiple_choice` (all stages, Blitz + Context Cloze), `classic` (flashcard with self-assessment). Typing mode removed.
-- **MC distractors + hint + explanation**: Ollama LLM (`qwen3:8b`) generates 5 distractors + hint + 2-3 sentence explanation (in native language) per word at save time. Stored in `Distractors` (JSON), `Hint` (varchar 500), `Explanation` (varchar 1000). Fallback: random words from user's vocab pool + hardcoded list. Generator: `Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`
+- **MC distractors + hint + explanation**: Ollama LLM (`gemma4:e4b`) generates 5 distractors + hint + 2-3 sentence explanation (in native language) per word at save time. Stored in `Distractors` (JSON), `Hint` (varchar 500), `Explanation` (varchar 1000). Fallback: random words from user's vocab pool + hardcoded list. Generator: `Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`
 - **Ollama**: Docker service (`ollama/ollama`), config: `Ollama:BaseUrl`, `Ollama:Model`, `Ollama:TimeoutSeconds` (default 30s). Fire-and-forget generation via `IServiceScopeFactory` after word save
 - **MC fallback cascade**: definition → translation → blank sentence (if LLM distractors exist) → downgrade to context/typed_recall
 - **Frontend**: `VocabularyPage.tsx` (word list, filters, search, stats), `VocabularyReviewPage.tsx` (review session), components in `components/vocabulary/`
diff --git a/PLAN-elevenreader-parity.md b/PLAN-elevenreader-parity.md
index f367134b..b0d3c0e5 100644
--- a/PLAN-elevenreader-parity.md
+++ b/PLAN-elevenreader-parity.md
@@ -25,7 +25,7 @@ This unblocks everything downstream — copy, pricing, feature priorities all fl
 ## What exists today
 
 **Reader**: tap-to-translate (18+ languages), dictionary popup, highlights, bookmarks, vocab marks
-**SRS**: 5-stage spaced repetition, multiple choice + context cloze, LLM-generated distractors (Ollama qwen3)
+**SRS**: 5-stage spaced repetition, multiple choice + context cloze, LLM-generated distractors (Ollama gemma4:e4b)
 **Catalog**: 146 books, 47 authors, 20 genres, SSG-prerendered SEO pages
 **User uploads**: EPUB + PDF + FB2 → extraction → chapters + metadata enrichment (Ollama)
 **Stats**: sessions, streaks, 20 achievements, heatmap, goals
diff --git a/README.md b/README.md
index 9fbdc3ca..ecebe492 100644
--- a/README.md
+++ b/README.md
@@ -94,7 +94,7 @@ out; only technical vocabulary gets surfaced.
 - Auto-added while reading — sentence context, definition, translation
 - 5 stages (New → Recognition → Recall → Context → Mastered)
 - Capped weekly queue + LLM-generated distractors and hints (Ollama
-  `qwen3:8b`)
+  `gemma4:e4b`)
 - Review modes: multiple choice, classic flashcard
 
 **Library**
@@ -121,7 +121,7 @@ out; only technical vocabulary gets surfaced.
 | Search | PostgreSQL FTS |
 | Web | React 19, Vite, pnpm |
 | Mobile | React Native (Expo 55) |
-| LLM | OpenAI `gpt-5-mini` (explanations + translation) + Ollama `qwen3:8b` (distractors, local) |
+| LLM | OpenAI `gpt-5-mini` (explanations + translation) + Ollama `gemma4:e4b` (distractors, local) |
 | TTS | Edge TTS (WebSocket, no API key) |
 | SSG | Puppeteer prerender, nginx serves static first |
 | Telemetry | OpenTelemetry → Aspire Dashboard |
diff --git a/TODO.md b/TODO.md
index 1513ecac..51447870 100644
--- a/TODO.md
+++ b/TODO.md
@@ -29,11 +29,10 @@
 - Review queue view
 
 ### Open Questions (TBD)
-1. **Ollama model?** - llama3, mistral, qwen, phi? Need to test quality vs speed tradeoff
-2. **Prompt templates storage?** - Options: DB (editable at runtime), config file (version controlled), hardcoded (simplest). Decision depends on how often prompts need tuning
-3. **Rate limiting?** - How many ms/sec between LLM calls? Depends on Ollama performance on server hardware
-4. **Error handling?** - Retry count? Exponential backoff? Mark as failed and skip?
-5. **Rollback storage?** - New `Edition.original_description` field or separate `edition_history` table?
+1. **Prompt templates storage?** - Options: DB (editable at runtime), config file (version controlled), hardcoded (simplest). Decision depends on how often prompts need tuning
+2. **Rate limiting?** - How many ms/sec between LLM calls? Depends on Ollama performance on server hardware
+3. **Error handling?** - Retry count? Exponential backoff? Mark as failed and skip?
+4. **Rollback storage?** - New `Edition.original_description` field or separate `edition_history` table?
 
 ---
 
diff --git a/backend/src/Api/Program.cs b/backend/src/Api/Program.cs
index e013de37..e23f36dc 100644
--- a/backend/src/Api/Program.cs
+++ b/backend/src/Api/Program.cs
@@ -101,7 +101,7 @@ builder.Services.AddTextStackVocabulary(options =>
 {
     options.OllamaBaseUrl = builder.Configuration["Ollama:BaseUrl"] ??
         "http://localhost:11434";
-    options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "qwen3:8b";
+    options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "gemma4:e4b";
     options.OllamaTimeoutSeconds = builder.Configuration.GetValue("Ollama:TimeoutSeconds", 30);
 });
 
diff --git a/backend/src/Api/appsettings.json b/backend/src/Api/appsettings.json
index 9356d358..aec4ad10 100644
--- a/backend/src/Api/appsettings.json
+++ b/backend/src/Api/appsettings.json
@@ -44,7 +44,7 @@
   },
   "Ollama": {
     "BaseUrl": "http://localhost:11434",
-    "Model": "qwen3:8b",
+    "Model": "gemma4:e4b",
     "TimeoutSeconds": 10
   },
   "LLM": {
diff --git a/backend/src/Application/LLM/OllamaLlmService.cs b/backend/src/Application/LLM/OllamaLlmService.cs
index db714731..a1261328 100644
--- a/backend/src/Application/LLM/OllamaLlmService.cs
+++ b/backend/src/Application/LLM/OllamaLlmService.cs
@@ -25,7 +25,7 @@ public OllamaLlmService(
             ?? "http://localhost:11434";
         _model = config["Ollama:Model"]
             ?? Environment.GetEnvironmentVariable("OLLAMA_MODEL")
-            ?? "qwen3:8b";
+            ?? "gemma4:e4b";
         _timeoutSeconds = config.GetValue("Ollama:TimeoutSeconds", 30);
     }
 
diff --git a/backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs b/backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs
index bc0cbe82..386f3977 100644
--- a/backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs
+++ b/backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs
@@ -3,6 +3,6 @@ namespace TextStack.Vocabulary;
 public class VocabularyOptions
 {
     public string OllamaBaseUrl { get; set; } = "http://localhost:11434";
-    public string OllamaModel { get; set; } = "qwen3:8b";
+    public string OllamaModel { get; set; } = "gemma4:e4b";
     public int OllamaTimeoutSeconds { get; set; } = 30;
 }
diff --git a/backend/src/Worker/appsettings.json b/backend/src/Worker/appsettings.json
index fb1d4830..fada083a 100644
--- a/backend/src/Worker/appsettings.json
+++ b/backend/src/Worker/appsettings.json
@@ -11,7 +11,7 @@
   },
   "Ollama": {
     "BaseUrl": "http://localhost:11434",
-    "Model": "qwen3:8b",
+    "Model": "gemma4:e4b",
     "TimeoutSeconds": 30
   },
   "LLM": {
diff --git a/docker-compose.yml b/docker-compose.yml
index 612bfe00..7abbaf79 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -66,7 +66,7 @@ services:
       Explain__CachePath: /data/explain-cache
       Translate__CachePath: /data/translate-cache
       Ollama__BaseUrl: http://ollama:11434
-      Ollama__Model: qwen3:8b
+      Ollama__Model: gemma4:e4b
       OpenAI__ApiKey: ${OPENAI_API_KEY:-}
       OpenAI__Model: ${OPENAI_MODEL:-gpt-4.1-nano}
       Resend__ApiKey: ${RESEND_API_KEY:-}
@@ -228,7 +228,7 @@ services:
   # ===========================================
 
   ollama:
-    image: ollama/ollama
+    image: ollama/ollama:0.23.1
     container_name: textstack_ollama
     volumes:
       - ./data/ollama:/root/.ollama
@@ -242,7 +242,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 4G
+          memory: 12G
         reservations:
-          memory: 2G
+          memory: 8G
 
diff --git a/docs/01-architecture/README.md b/docs/01-architecture/README.md
index 7154cf79..e1e92c53 100644
--- a/docs/01-architecture/README.md
+++ b/docs/01-architecture/README.md
@@ -28,7 +28,7 @@ Modular monolith: single API + Worker, layered architecture, PostgreSQL.
  ▼ │
  ┌───────────────┐ │ ┌───────────────┐
  │ Storage │◄──────┘ │ Ollama │
- │ (bind mount) │ │ gemma3:4b │
+ │ (bind mount) │ │ gemma4:e4b │
  └───────────────┘ └───────────────┘
  ┌───────────────┐
  │LibreTranslate │
diff --git a/docs/02-system/database.md b/docs/02-system/database.md
index b469edc8..ec0a06ae 100644
--- a/docs/02-system/database.md
+++ b/docs/02-system/database.md
@@ -200,7 +200,7 @@ All services: API :8080 | Web :5173 | Admin :81 | DB :5432
 │ │
 │ SRS: New(0) → Recognition(1) → Recall(2) → Context(3) → Mastered(4) │
 │ Modes: multiple_choice | typed_recall | context │
-│ Distractors: Ollama gemma3:4b generates 5 plausible wrong answers │
+│ Distractors: Ollama gemma4:e4b generates 5 plausible wrong answers │
 └─────────────────────────────────────────────────────────────────────────────┘
 
 ┌─────────────────────────────────────────────────────────────────────────────┐
@@ -767,6 +767,6 @@ AssetKind { Cover=0, InlineImage=1 }
 21. **Reading goals** - Daily minutes or books/year with streak tracking (min minutes threshold)
 22. **Achievements** - 20 codes across milestone/streak/time/special; AchievementChecker runs after sessions
 23. **Vocabulary SRS** - 5-stage spaced repetition (New→Recognition→Recall→Context→Mastered)
-25. **LLM distractors** - Ollama gemma3:4b generates MC wrong answers at word save time; stored as JSON
+25. **LLM distractors** - Ollama gemma4:e4b generates MC wrong answers at word save time; stored as JSON
 26. **Fire-and-forget distractors** - IServiceScopeFactory creates scoped DbContext for background generation
 27. **Vocabulary word uniqueness** - unique(user_id, site_id, word, language) prevents duplicates
diff --git a/docs/04-dev/llm-provider-swap.md b/docs/04-dev/llm-provider-swap.md
index 1e7056a2..3c99f9ba 100644
--- a/docs/04-dev/llm-provider-swap.md
+++ b/docs/04-dev/llm-provider-swap.md
@@ -1,6 +1,6 @@
 # LLM Provider Swap Guide
 
-Default: self-hosted **Ollama** (`qwen3:8b`) in Docker. This doc shows how to swap it for a managed LLM API (OpenAI, Anthropic, Groq, etc.) if self-hosting is undesirable.
+Default: self-hosted **Ollama** (`gemma4:e4b`) in Docker. This doc shows how to swap it for a managed LLM API (OpenAI, Anthropic, Groq, etc.) if self-hosting is undesirable.
 
 ## What the LLM does
 
@@ -78,7 +78,7 @@ var response = await client.PostAsJsonAsync($"{_options.BaseUrl}/v1/messages", r
 Remove:
 ```
 Ollama__BaseUrl=http://ollama:11434
-Ollama__Model=qwen3:8b
+Ollama__Model=gemma4:e4b
 ```
 
 Add (example for OpenAI):
diff --git a/docs/05-features/vocabulary-srs.md b/docs/05-features/vocabulary-srs.md
index 6075e017..559ac2d6 100644
--- a/docs/05-features/vocabulary-srs.md
+++ b/docs/05-features/vocabulary-srs.md
@@ -40,7 +40,7 @@ When building MC prompt: definition → translation → blank sentence (if LLM d
 
 MC quiz quality depends on plausible wrong answers. Random words = too easy.
 
-**Solution**: Local Ollama LLM (`gemma3:4b`) generates 5 semantically similar distractors per word.
+**Solution**: Local Ollama LLM (`gemma4:e4b`) generates 5 semantically similar distractors per word.
 
 ### Flow
 1. User saves word in reader → API saves to DB immediately (fast response)
@@ -64,11 +64,11 @@ ollama:
         memory: 4G
 ```
 
-Config: `Ollama:BaseUrl`, `Ollama:Model` (gemma3:4b), `Ollama:TimeoutSeconds` (10)
+Config: `Ollama:BaseUrl`, `Ollama:Model` (gemma4:e4b), `Ollama:TimeoutSeconds` (10)
 
 ### Model Pull
 ```bash
-docker compose exec ollama ollama pull gemma3:4b
+docker compose exec ollama ollama pull gemma4:e4b
 ```
 
 ## API Endpoints
diff --git a/docs/ux-roadmap/17-ai-auto-tags.md b/docs/ux-roadmap/17-ai-auto-tags.md
index 510645b1..8a820a82 100644
--- a/docs/ux-roadmap/17-ai-auto-tags.md
+++ b/docs/ux-roadmap/17-ai-auto-tags.md
@@ -46,7 +46,7 @@ After ingestion completes, Ollama proposes 3–5 tags for the book based on titl
 Tags should be in {userNativeLanguage}.
 ```
 
-- **Model:** reuse existing `Ollama:Model` config (`gemma4:e4b` per CLAUDE.md). No new model.
+- **Model:** reuse existing `Ollama:Model` config (`gemma4:e4b` per CLAUDE.md). No new model.
 - **Timeout:** 30s (matches existing). On timeout/error, log and skip — never block ingestion.
 - **Validation:** parsed tags must be 1–30 chars, lowercase, alphanumeric+hyphen. Drop invalid silently.
 - **De-duplicate** suggestions against user's existing tags (via `useUserTags()` data).
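The rollback path named in the CHANGELOG entry (`Ollama__Model=qwen3:8b`) depends on the lookup order visible in the `OllamaLlmService.cs` hunk: `config["Ollama:Model"]`, then the `OLLAMA_MODEL` environment variable, then the hardcoded default. The following is a minimal shell sketch of that order, not the service's actual code. It assumes the standard .NET environment-variable configuration provider is active (so `Ollama__Model` surfaces as `Ollama:Model` and overrides `appsettings.json`); `resolve_model` and its argument are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch of: config["Ollama:Model"] ?? OLLAMA_MODEL ?? "gemma4:e4b"
# Assumption: the env var Ollama__Model maps to the config key Ollama:Model
# and takes precedence over the value in appsettings.json.
# Simplification: an empty value is treated as unset here, whereas the C# `??`
# operator only falls through on null.

resolve_model() {
  # $1: hypothetical stand-in for the "Model" value read from appsettings.json
  local appsettings_model="${1:-}"
  # Layered config: env override first, then the appsettings value
  local configured="${Ollama__Model:-$appsettings_model}"
  # Fall back to OLLAMA_MODEL, then to the compiled-in default
  printf '%s\n' "${configured:-${OLLAMA_MODEL:-gemma4:e4b}}"
}

# Normal case: the appsettings.json value is used
resolve_model "gemma4:e4b"

# Rollback case from the CHANGELOG: the env override takes effect
# (run in a subshell so the variable does not leak into the caller)
(Ollama__Model=qwen3:8b; resolve_model "gemma4:e4b")
```

If this reading is right, the `?? "gemma4:e4b"` default in code only applies when no `Ollama:Model` key is configured anywhere, since both `appsettings.json` files in this diff set the key explicitly.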