11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,17 @@

## [Unreleased]

+ ### Changed
+ - **Local LLM model**: switched from `qwen3:8b` to `gemma4:e4b` (Google's
+ Gemma 4 effective-4B, multimodal — text + vision + audio capable). Same
+ `ILlmService` interface, no API changes.
+ - **Ollama container**: image pinned to `ollama/ollama:0.23.1` (the floating
+ `latest` tag was still serving 0.22.x which doesn't recognise the
+ `gemma4` family). Memory limits raised from 4G/2G to 12G/8G — `gemma4:e4b`
+ needs ~9.8 GiB RAM to load weights + KV cache. Server has 31 GB total so
+ the headroom is plenty.
+ - To roll back: set `Ollama__Model=qwen3:8b` env var or revert this commit.
+
## [v0.1.0] — 2026-05-06

### Headline
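As a rough check on the memory figures in the changelog entry above, the arithmetic can be sketched in shell. The numbers come from the entry itself; treating GB and GiB as interchangeable is a simplifying assumption, and `headroom` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Back-of-envelope check of the limits quoted in the changelog entry.
# Assumption: GB and GiB are treated as the same unit for a rough estimate.

# headroom A B: prints A - B with one decimal place
headroom() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a - b }'
}

limit=12     # container memory limit (12G)
needed=9.8   # approximate footprint of gemma4:e4b (weights + KV cache)
host=31      # total server memory

echo "headroom inside the container: $(headroom "$limit" "$needed")"
echo "left for other services if the limit is reached: $(headroom "$host" "$limit")"
```

By this estimate roughly 2.2 of the 12G limit stays free once the model is loaded, which is the margin to watch if a larger context window is configured later.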
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -199,7 +199,7 @@ Upload EPUB/PDF/FB2 → BookFile (stored) → IngestionJob (queued)
- **Review entity**: `VocabularyReview` — tracks each answer (isCorrect, responseTimeMs, reviewMode)
- **5 SRS stages**: New(0) → Recognition(1) → Recall(2) → Context(3) → Mastered(4). Logic in `Application/Vocabulary/SrsEngine.cs`
- **2 review modes**: `multiple_choice` (all stages, Blitz + Context Cloze), `classic` (flashcard with self-assessment). Typing mode removed.
- - **MC distractors + hint + explanation**: Ollama LLM (`qwen3:8b`) generates 5 distractors + hint + 2-3 sentence explanation (in native language) per word at save time. Stored in `Distractors` (JSON), `Hint` (varchar 500), `Explanation` (varchar 1000). Fallback: random words from user's vocab pool + hardcoded list. Generator: `Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`
+ - **MC distractors + hint + explanation**: Ollama LLM (`gemma4:e4b`) generates 5 distractors + hint + 2-3 sentence explanation (in native language) per word at save time. Stored in `Distractors` (JSON), `Hint` (varchar 500), `Explanation` (varchar 1000). Fallback: random words from user's vocab pool + hardcoded list. Generator: `Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`
- **Ollama**: Docker service (`ollama/ollama`), config: `Ollama:BaseUrl`, `Ollama:Model`, `Ollama:TimeoutSeconds` (default 30s). Fire-and-forget generation via `IServiceScopeFactory` after word save
- **MC fallback cascade**: definition → translation → blank sentence (if LLM distractors exist) → downgrade to context/typed_recall
- **Frontend**: `VocabularyPage.tsx` (word list, filters, search, stats), `VocabularyReviewPage.tsx` (review session), components in `components/vocabulary/`
2 changes: 1 addition & 1 deletion PLAN-elevenreader-parity.md
@@ -25,7 +25,7 @@ This unblocks everything downstream — copy, pricing, feature priorities all fl
## What exists today

**Reader**: tap-to-translate (18+ languages), dictionary popup, highlights, bookmarks, vocab marks
- **SRS**: 5-stage spaced repetition, multiple choice + context cloze, LLM-generated distractors (Ollama qwen3)
+ **SRS**: 5-stage spaced repetition, multiple choice + context cloze, LLM-generated distractors (Ollama gemma4:e4b)
**Catalog**: 146 books, 47 authors, 20 genres, SSG-prerendered SEO pages
**User uploads**: EPUB + PDF + FB2 → extraction → chapters + metadata enrichment (Ollama)
**Stats**: sessions, streaks, 20 achievements, heatmap, goals
4 changes: 2 additions & 2 deletions README.md
@@ -94,7 +94,7 @@ out; only technical vocabulary gets surfaced.
- Auto-added while reading — sentence context, definition, translation
- 5 stages (New → Recognition → Recall → Context → Mastered)
- Capped weekly queue + LLM-generated distractors and hints (Ollama
- `qwen3:8b`)
+ `gemma4:e4b`)
- Review modes: multiple choice, classic flashcard

**Library**
@@ -121,7 +121,7 @@ out; only technical vocabulary gets surfaced.
| Search | PostgreSQL FTS |
| Web | React 19, Vite, pnpm |
| Mobile | React Native (Expo 55) |
- | LLM | OpenAI `gpt-5-mini` (explanations + translation) + Ollama `qwen3:8b` (distractors, local) |
+ | LLM | OpenAI `gpt-5-mini` (explanations + translation) + Ollama `gemma4:e4b` (distractors, local) |
| TTS | Edge TTS (WebSocket, no API key) |
| SSG | Puppeteer prerender, nginx serves static first |
| Telemetry | OpenTelemetry → Aspire Dashboard |
9 changes: 4 additions & 5 deletions TODO.md
@@ -29,11 +29,10 @@
- Review queue view

### Open Questions (TBD)
- 1. **Ollama model?** - llama3, mistral, qwen, phi? Need to test quality vs speed tradeoff
- 2. **Prompt templates storage?** - Options: DB (editable at runtime), config file (version controlled), hardcoded (simplest). Decision depends on how often prompts need tuning
- 3. **Rate limiting?** - How many ms/sec between LLM calls? Depends on Ollama performance on server hardware
- 4. **Error handling?** - Retry count? Exponential backoff? Mark as failed and skip?
- 5. **Rollback storage?** - New `Edition.original_description` field or separate `edition_history` table?
+ 1. **Prompt templates storage?** - Options: DB (editable at runtime), config file (version controlled), hardcoded (simplest). Decision depends on how often prompts need tuning
+ 2. **Rate limiting?** - How many ms/sec between LLM calls? Depends on Ollama performance on server hardware
+ 3. **Error handling?** - Retry count? Exponential backoff? Mark as failed and skip?
+ 4. **Rollback storage?** - New `Edition.original_description` field or separate `edition_history` table?

---

2 changes: 1 addition & 1 deletion backend/src/Api/Program.cs
@@ -101,7 +101,7 @@
builder.Services.AddTextStackVocabulary(options =>
{
options.OllamaBaseUrl = builder.Configuration["Ollama:BaseUrl"] ?? "http://localhost:11434";
- options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "qwen3:8b";
+ options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "gemma4:e4b";
options.OllamaTimeoutSeconds = builder.Configuration.GetValue("Ollama:TimeoutSeconds", 30);
});

2 changes: 1 addition & 1 deletion backend/src/Api/appsettings.json
@@ -44,7 +44,7 @@
},
"Ollama": {
"BaseUrl": "http://localhost:11434",
- "Model": "qwen3:8b",
+ "Model": "gemma4:e4b",
"TimeoutSeconds": 10
},
"LLM": {
2 changes: 1 addition & 1 deletion backend/src/Application/LLM/OllamaLlmService.cs
@@ -25,7 +25,7 @@ public OllamaLlmService(
?? "http://localhost:11434";
_model = config["Ollama:Model"]
?? Environment.GetEnvironmentVariable("OLLAMA_MODEL")
- ?? "qwen3:8b";
+ ?? "gemma4:e4b";
_timeoutSeconds = config.GetValue("Ollama:TimeoutSeconds", 30);
}
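
The fallback chain in this constructor (configuration value, then the `OLLAMA_MODEL` environment variable, then the built-in default) can be mimicked in shell to see which model name wins. This is a sketch, not the service's code: `resolve_model` is a hypothetical helper, it uses the `Ollama__Model` environment variable from `docker-compose.yml` as a stand-in for the `Ollama:Model` key, and it treats an empty value as unset, which C#'s `??` operator does not.

```bash
#!/usr/bin/env bash
# Sketch of the lookup order: config key -> OLLAMA_MODEL -> default.
# Assumption: Ollama__Model stands in for the Ollama:Model config key.
resolve_model() {
  if [ -n "${Ollama__Model:-}" ]; then
    printf '%s\n' "$Ollama__Model"
  elif [ -n "${OLLAMA_MODEL:-}" ]; then
    printf '%s\n' "$OLLAMA_MODEL"
  else
    printf '%s\n' "gemma4:e4b"
  fi
}

resolve_model
```

Setting `Ollama__Model=qwen3:8b` before the call prints `qwen3:8b`, which corresponds to the rollback path described in the changelog. Since `appsettings.json` also sets `Ollama:Model`, the later fallbacks likely apply only when that key is absent.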

@@ -3,6 +3,6 @@ namespace TextStack.Vocabulary;
public class VocabularyOptions
{
public string OllamaBaseUrl { get; set; } = "http://localhost:11434";
- public string OllamaModel { get; set; } = "qwen3:8b";
+ public string OllamaModel { get; set; } = "gemma4:e4b";
public int OllamaTimeoutSeconds { get; set; } = 30;
}
2 changes: 1 addition & 1 deletion backend/src/Worker/appsettings.json
@@ -11,7 +11,7 @@
},
"Ollama": {
"BaseUrl": "http://localhost:11434",
- "Model": "qwen3:8b",
+ "Model": "gemma4:e4b",
"TimeoutSeconds": 30
},
"LLM": {
8 changes: 4 additions & 4 deletions docker-compose.yml
@@ -66,7 +66,7 @@ services:
Explain__CachePath: /data/explain-cache
Translate__CachePath: /data/translate-cache
Ollama__BaseUrl: http://ollama:11434
- Ollama__Model: qwen3:8b
+ Ollama__Model: gemma4:e4b
OpenAI__ApiKey: ${OPENAI_API_KEY:-}
OpenAI__Model: ${OPENAI_MODEL:-gpt-4.1-nano}
Resend__ApiKey: ${RESEND_API_KEY:-}
@@ -228,7 +228,7 @@ services:
# ===========================================

ollama:
- image: ollama/ollama
+ image: ollama/ollama:0.23.1
container_name: textstack_ollama
volumes:
- ./data/ollama:/root/.ollama
@@ -242,7 +242,7 @@ services:
deploy:
resources:
limits:
- memory: 4G
+ memory: 12G
reservations:
- memory: 2G
+ memory: 8G

2 changes: 1 addition & 1 deletion docs/01-architecture/README.md
@@ -28,7 +28,7 @@ Modular monolith: single API + Worker, layered architecture, PostgreSQL.
▼ │
┌───────────────┐ │ ┌───────────────┐
│ Storage │◄──────┘ │ Ollama │
- │ (bind mount) │ │ gemma3:4b
+ │ (bind mount) │ │ gemma4:e4b
└───────────────┘ └───────────────┘
┌───────────────┐
│LibreTranslate │
4 changes: 2 additions & 2 deletions docs/02-system/database.md
@@ -200,7 +200,7 @@ All services: API :8080 | Web :5173 | Admin :81 | DB :5432
│ │
│ SRS: New(0) → Recognition(1) → Recall(2) → Context(3) → Mastered(4) │
│ Modes: multiple_choice | typed_recall | context │
- │ Distractors: Ollama gemma3:4b generates 5 plausible wrong answers
+ │ Distractors: Ollama gemma4:e4b generates 5 plausible wrong answers │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
@@ -767,6 +767,6 @@ AssetKind { Cover=0, InlineImage=1 }
21. **Reading goals** - Daily minutes or books/year with streak tracking (min minutes threshold)
22. **Achievements** - 20 codes across milestone/streak/time/special; AchievementChecker runs after sessions
23. **Vocabulary SRS** - 5-stage spaced repetition (New→Recognition→Recall→Context→Mastered)
- 25. **LLM distractors** - Ollama gemma3:4b generates MC wrong answers at word save time; stored as JSON
+ 25. **LLM distractors** - Ollama gemma4:e4b generates MC wrong answers at word save time; stored as JSON
26. **Fire-and-forget distractors** - IServiceScopeFactory creates scoped DbContext for background generation
27. **Vocabulary word uniqueness** - unique(user_id, site_id, word, language) prevents duplicates
4 changes: 2 additions & 2 deletions docs/04-dev/llm-provider-swap.md
@@ -1,6 +1,6 @@
# LLM Provider Swap Guide

- Default: self-hosted **Ollama** (`qwen3:8b`) in Docker. This doc shows how to swap it for a managed LLM API (OpenAI, Anthropic, Groq, etc.) if self-hosting is undesirable.
+ Default: self-hosted **Ollama** (`gemma4:e4b`) in Docker. This doc shows how to swap it for a managed LLM API (OpenAI, Anthropic, Groq, etc.) if self-hosting is undesirable.

## What the LLM does

@@ -78,7 +78,7 @@ var response = await client.PostAsJsonAsync($"{_options.BaseUrl}/v1/messages", r
Remove:
```
Ollama__BaseUrl=http://ollama:11434
- Ollama__Model=qwen3:8b
+ Ollama__Model=gemma4:e4b
```

Add (example for OpenAI):
6 changes: 3 additions & 3 deletions docs/05-features/vocabulary-srs.md
@@ -40,7 +40,7 @@ When building MC prompt: definition → translation → blank sentence (if LLM d

MC quiz quality depends on plausible wrong answers. Random words = too easy.

- **Solution**: Local Ollama LLM (`gemma3:4b`) generates 5 semantically similar distractors per word.
+ **Solution**: Local Ollama LLM (`gemma4:e4b`) generates 5 semantically similar distractors per word.

### Flow
1. User saves word in reader → API saves to DB immediately (fast response)
@@ -64,11 +64,11 @@ ollama:
memory: 4G
```

- Config: `Ollama:BaseUrl`, `Ollama:Model` (gemma3:4b), `Ollama:TimeoutSeconds` (10)
+ Config: `Ollama:BaseUrl`, `Ollama:Model` (gemma4:e4b), `Ollama:TimeoutSeconds` (10)

### Model Pull
```bash
- docker compose exec ollama ollama pull gemma3:4b
+ docker compose exec ollama ollama pull gemma4:e4b
```
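
After the pull it may help to confirm the model is actually present before the API starts requesting it. The sketch below scans the table printed by `ollama list` for an exact name in the first column; `has_model` is a hypothetical helper, and it assumes the first line of that output is a header row.

```bash
#!/usr/bin/env bash
# has_model NAME: reads `ollama list` output on stdin and succeeds
# only if NAME appears in the first column (header row skipped).
has_model() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Usage against the compose service (-T disables TTY allocation so the pipe works):
#   docker compose exec -T ollama ollama list | has_model gemma4:e4b && echo "model present"
```

Matching the whole first column rather than a substring keeps `gemma4:e4b` from being confused with other tags of the same family.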

## API Endpoints
2 changes: 1 addition & 1 deletion docs/ux-roadmap/17-ai-auto-tags.md
@@ -46,7 +46,7 @@ After ingestion completes, Ollama proposes 3–5 tags for the book based on titl

Tags should be in {userNativeLanguage}.
```
- - **Model:** reuse existing `Ollama:Model` config (`qwen3:8b` per CLAUDE.md). No new model.
+ - **Model:** reuse existing `Ollama:Model` config (`gemma4:e4b` per CLAUDE.md). No new model.
- **Timeout:** 30s (matches existing). On timeout/error, log and skip — never block ingestion.
- **Validation:** parsed tags must be 1–30 chars, lowercase, alphanumeric+hyphen. Drop invalid silently.
- **De-duplicate** suggestions against user's existing tags (via `useUserTags()` data).
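
The validation rule in the list above (1 to 30 characters, lowercase, alphanumeric plus hyphen, invalid entries dropped silently) can be sketched as a small filter. `is_valid_tag` and `filter_tags` are hypothetical names; the sketch assumes ASCII-only tags and allows a leading or trailing hyphen, since the rule as written does not say otherwise.

```bash
#!/usr/bin/env bash
# is_valid_tag TAG: succeeds if TAG is 1-30 characters from [a-z0-9-].
# Assumption: ASCII only; the spec may intend Unicode letters as well.
is_valid_tag() {
  local LC_ALL=C
  [[ "$1" =~ ^[a-z0-9-]{1,30}$ ]]
}

# filter_tags: reads one candidate per line, prints only the valid ones.
filter_tags() {
  local tag
  while IFS= read -r tag; do
    if is_valid_tag "$tag"; then
      printf '%s\n' "$tag"
    fi
  done
}

printf '%s\n' "sci-fi" "Bad Tag" "history" | filter_tags
```

Here `Bad Tag` is dropped without any error output, matching the "drop invalid silently" behaviour. If tags in the user's native language need non-ASCII letters, the character class would have to be widened.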