diff --git a/.~lock.Vasyl-Personal-Projects.pdf# b/.~lock.Vasyl-Personal-Projects.pdf# new file mode 100644 index 00000000..7163a43e --- /dev/null +++ b/.~lock.Vasyl-Personal-Projects.pdf# @@ -0,0 +1 @@ +,blissful-funny-gauss,claude,14.05.2026 16:12,file:///sessions/blissful-funny-gauss/.config/libreoffice/4; \ No newline at end of file diff --git a/.~lock.Vasyl-Resume.pdf# b/.~lock.Vasyl-Resume.pdf# new file mode 100644 index 00000000..7163a43e --- /dev/null +++ b/.~lock.Vasyl-Resume.pdf# @@ -0,0 +1 @@ +,blissful-funny-gauss,claude,14.05.2026 16:12,file:///sessions/blissful-funny-gauss/.config/libreoffice/4; \ No newline at end of file diff --git a/.~lock.textstack-gemma4-submission-package.pdf# b/.~lock.textstack-gemma4-submission-package.pdf# new file mode 100644 index 00000000..62f5cd38 --- /dev/null +++ b/.~lock.textstack-gemma4-submission-package.pdf# @@ -0,0 +1 @@ +,blissful-funny-gauss,claude,08.05.2026 15:15,file:///sessions/blissful-funny-gauss/.config/libreoffice/4; \ No newline at end of file diff --git a/AI-ENGINEER-ROADMAP.md b/AI-ENGINEER-ROADMAP.md new file mode 100644 index 00000000..cee97bca --- /dev/null +++ b/AI-ENGINEER-ROADMAP.md @@ -0,0 +1,388 @@ +# AI Engineer — Career Roadmap + +**Owner:** Vasyl Vdovychenko +**Created:** 2026-05-13 +**Goal:** Position as AI Engineer and start interviewing in ~6 weeks + +--- + +## Part 1 — Current Profile Audit + +### Claims I removed from LinkedIn About (and why) + +| Removed | Reason | +|---------|--------| +| **"Hybrid search"** (RAG bullet) | Hybrid = vector + BM25 / keyword. My `rag-chatbot-dotnet` uses vector retrieval only. | +| **"Multi-agent workflows"** (Agent bullet) | No 2+ agents collaborating in my repos. Have single agents with tools, not multi-agent. | +| **"Response distillation"** (Production AI infra) | Distillation = training a smaller model on outputs of a larger one. I do model **selection** (E4B → E2B), which is different. 
| +| **"Kubernetes"** (Full-stack) | Use Docker Compose + Cloudflare Tunnel, not K8s. No K8s manifests in any repo. | +| **"Anthropic Claude"** (LLM integration) | `.env.example` has `CLAUDE_API_KEY` placeholder, but I haven't verified actual Claude SDK calls in code. **Don't claim until verified.** | + +### Final defensible LinkedIn About (every word backed by code) + +``` +AI Engineer with 10+ years in software engineering. Building production AI systems — +RAG pipelines, agent architectures, LLM orchestration, and observability for ML workloads. + +Currently a Senior Software Engineer at Pinnacle focused on SDK and testing tooling; +pursuing AI engineering through personal open-source work. Looking to make this my +full-time focus. + +What I build: + +• RAG pipelines — Pinecone vector retrieval, OpenAI embeddings, document chunking, + context-grounded generation via Semantic Kernel + +• Agent architectures — ReAct loops, tool orchestration, MCP server implementation, + function calling, single-agent RAG workflows + +• LLM integration — OpenAI, Ollama, Semantic Kernel; model selection, + quantization tradeoffs, prompt design + +• Production AI infrastructure — cost optimization ($0.002/load test cycle), + multi-tier caching, fall-back routing, model selection + +• Observability for ML systems — distributed tracing (OpenTelemetry + Aspire), + load testing at burst (63K req, p95 20.5ms validated) + +• Full-stack delivery — ASP.NET Core, React, React Native (Expo), Postgres, + Docker Compose, CI/CD + +Open to: AI Engineer · ML Engineer · LLM Engineer · AI Infrastructure Engineer · +AI Platform Engineer · Applied AI Engineer. + +Background: 10+ years building scalable cloud systems on .NET / Azure stack. +``` + +### Headline (final) + +``` +AI Engineer | RAG · Agents · LLM Infrastructure | 10+ years in software engineering +``` + +--- + +## Part 2 — AI Engineer Skill Matrix + +What an "ideal AI Engineer" portfolio typically has, and where I stand. 
+ +Legend: ✅ have it · ⚠️ partial · ❌ missing + +### LLM Integration + +| Skill | Status | Evidence | +|-------|--------|----------| +| OpenAI API | ✅ | rag-chatbot, textstack, AiAgents | +| Anthropic Claude API | ⚠️ | .env exists, real SDK call to verify | +| Google Gemini | ⚠️ | .env exists, real SDK call to verify | +| Ollama (open-source models) | ✅ | textstack (Gemma 4) | +| Multi-provider abstraction | ✅ | AI_PROVIDER switch in AiAgents | + +### RAG (Retrieval-Augmented Generation) + +| Skill | Status | Evidence | +|-------|--------|----------| +| Vector databases (Pinecone/Qdrant/Chroma) | ✅ | Pinecone in rag-chatbot | +| OpenAI embeddings | ✅ | rag-chatbot | +| Document chunking | ✅ | rag-chatbot (Wikipedia indexer) | +| Hybrid search (vector + BM25) | ❌ | — | +| Re-ranking (Cohere, etc.) | ❌ | — | +| Long-context strategies | ❌ | — | +| RAG evaluation (RAGAS) | ❌ | — | + +### Agents + +| Skill | Status | Evidence | +|-------|--------|----------| +| Function calling | ✅ | FunctionRegistry.cs in AiAgents | +| Tool orchestration | ✅ | ConsoleAgent (weather tool) | +| ReAct loop | ✅ | AiAgents docs + code | +| Agent memory | ✅ | Memory systems in AiAgents docs | +| **MCP server** | ✅ | **Real McpServer (tools + prompts + resources)** | +| MCP client | ❌ | Built server only, not client | +| Multi-agent collaboration | ❌ | — | +| Agent evaluation | ❌ | — | + +### Local LLM / On-device + +| Skill | Status | Evidence | +|-------|--------|----------| +| Ollama | ✅ | textstack | +| WebLLM (browser inference) | ✅ | ReplyMate | +| Quantization knowledge (Q4, GGUF) | ✅ | VRAM math post | +| GPU offload tuning | ✅ | VRAM math post (2.5× measured) | +| llama.cpp directly | ❌ | — | +| vLLM / TGI / Triton | ❌ | — | + +### Production AI Infrastructure + +| Skill | Status | Evidence | +|-------|--------|----------| +| Caching strategies | ✅ | textstack disk cache + IndexedDB | +| Cost optimization | ✅ | $0.002/load test (measured) | +| Fall-back routing | ✅ | MC fallback 
cascade | +| Rate limiting | ✅ | textstack rate-limit middleware | +| Async inference / queues | ✅ | textstack worker | +| A/B testing models | ❌ | — | +| Shadow deployment | ❌ | — | + +### Observability / MLOps + +| Skill | Status | Evidence | +|-------|--------|----------| +| Distributed tracing OTel | ✅ | xUnitOTel + textstack | +| Load testing | ✅ | LoadSurge (63K req validated) | +| Latency budgets | ✅ | Load test report | +| Cost / token tracking | ⚠️ | Partial, not explicit dashboard | +| LLM eval suites | ❌ | — | +| LangSmith / Helicone | ❌ | — | +| Prompt versioning | ❌ | — | + +### ML Fundamentals + +| Skill | Status | Evidence | +|-------|--------|----------| +| Embeddings | ✅ | rag-chatbot | +| Transformer arch (engineer-level) | ⚠️ | NVIDIA course in progress | +| Quantization formats (GGUF/AWQ) | ⚠️ | Knows via Ollama, not deep | +| Tokenizers (BPE) | ⚠️ | Concept, not hands-on | +| Fine-tuning / LoRA / PEFT | ❌ | — | +| RLHF / DPO | ❌ | — | + +### Frameworks + +| Skill | Status | Evidence | +|-------|--------|----------| +| Semantic Kernel | ✅ | rag-chatbot | +| Microsoft.Extensions.AI | ✅ | rag-chatbot | +| LangChain | ❌ | — | +| LlamaIndex | ❌ | — | +| DSPy | ❌ | — | +| AutoGen / CrewAI | ❌ | — | +| Haystack | ❌ | — | + +### Deployment + +| Skill | Status | Evidence | +|-------|--------|----------| +| Docker / Docker Compose | ✅ | every repo | +| Azure (cloud) | ✅ | Pinnacle background | +| AWS / GCP | ⚠️ | Familiar, not visible in repos | +| Kubernetes | ❌ | use compose | +| GPU cluster mgmt | ❌ | — | +| Model serving (vLLM, Triton) | ❌ | — | + +### Safety / Eval + +| Skill | Status | Evidence | +|-------|--------|----------| +| Prompt injection defense | ⚠️ | textstack SeoPromptSanitizer (basic) | +| Guardrails | ❌ | — | +| Red-teaming | ❌ | — | +| RAGAS / eval frameworks | ❌ | — | + +### Languages + +| Lang | Status | Notes | +|------|--------|-------| +| C# / .NET | ✅✅✅ | 12 years | +| TypeScript | ✅ | textstack frontend, ReplyMate | +| 
**Python** | ❌ | **Biggest gap for AI roles** | +| SQL | ✅ | Postgres | + +--- + +## Part 3 — 6-Week Learning Plan + +**Goal:** Close the top 3 most-demanded gaps — Python, LangChain/LlamaIndex, RAG evaluation — before flipping LinkedIn Open-to-Work toggle. + +### Cadence assumption + +Day job: ~40h/week. Available learning time: ~8-10h/week (evenings + weekends). + +Each week has one **shippable artifact** (a repo, doc, or LinkedIn-ready milestone). + +--- + +### Week 1 (May 13 – May 19) — Python AI Foundations + +**Goal:** Stop being a "C# guy who knows AI" — establish Python AI workflow. + +**Tasks:** +- [ ] Set up Python 3.12 + uv + ruff + ipython on dev machine +- [ ] Install OpenAI Python SDK, run hello-world chat completion +- [ ] Install Anthropic Python SDK, run hello-world (covers the Claude gap honestly) +- [ ] Re-implement `rca` (your LLM-powered test-log analyzer) in Python — single file, ~200 LoC +- [ ] Add it to GitHub as `rca-py` with a clear README + +**Deliverable:** `github.com/mrviduus/rca-py` — Python port with both OpenAI and Claude support. + +**Why:** This single repo lets me legitimately add "Python · Anthropic Claude" to my LinkedIn skills. + +**Resources:** +- Anthropic SDK quickstart: https://docs.anthropic.com/en/api/getting-started +- OpenAI Python SDK: https://github.com/openai/openai-python + +--- + +### Week 2 (May 20 – May 26) — LangChain RAG + +**Goal:** Build the same thing my `rag-chatbot-dotnet` does, but in LangChain. This is the most-searched framework on LinkedIn. 
+ +**Tasks:** +- [ ] Read LangChain "Get started" + "RAG tutorial" +- [ ] Build a RAG chatbot in Python + LangChain that mirrors my rag-chatbot-dotnet (Wikipedia → Pinecone → OpenAI) +- [ ] Add chunking with overlap (`RecursiveCharacterTextSplitter` from `langchain_text_splitters`; this is recursive character splitting, not semantic chunking) +- [ ] Add streaming responses +- [ ] Document the architecture in README with a diagram + +**Deliverable:** `github.com/mrviduus/rag-chatbot-langchain` — Python equivalent with proper chunking strategy documented. + +**Why:** Recruiter searches for "LangChain" return 10× more results than "Semantic Kernel". This single repo makes me searchable. + +**Resources:** +- LangChain RAG tutorial: https://python.langchain.com/docs/tutorials/rag/ +- Pinecone + LangChain: https://docs.pinecone.io/integrations/langchain + +--- + +### Week 3 (May 27 – Jun 2) — Hybrid Search + Re-ranking + +**Goal:** Close the "hybrid search" gap honestly — actually build it. + +**Tasks:** +- [ ] Add BM25 retriever to the LangChain rag-chatbot (from `langchain_community.retrievers`) +- [ ] Combine BM25 + vector via `EnsembleRetriever` (50/50 weight) +- [ ] Add Cohere or local re-ranker on top-N results +- [ ] A/B test: pure vector vs hybrid+rerank on 20 hand-written queries — log win rate +- [ ] Write up findings as a dev.to post: "Hybrid search measured: I tested vector vs BM25 vs ensemble" + +**Deliverable:** +- Updated rag-chatbot-langchain with hybrid + rerank +- A dev.to post with measurement table + +**Why:** Now "hybrid search" is REAL on my profile. Plus another technical post for portfolio. + +**Resources:** +- LangChain EnsembleRetriever: https://python.langchain.com/docs/how_to/ensemble_retriever/ +- Cohere rerank: https://docs.cohere.com/docs/reranking + +--- + +### Week 4 (Jun 3 – Jun 9) — RAG Evaluation with RAGAS + +**Goal:** Add LLM-eval skill — a senior-marker most engineers don't have. 
+ +**Tasks:** +- [ ] Install RAGAS (`pip install ragas`) +- [ ] Build a 30-question eval dataset for the rag-chatbot (manual, ~2 hours) +- [ ] Run RAGAS metrics: faithfulness, answer relevancy, context precision, context recall +- [ ] Generate a metrics report (markdown table) +- [ ] Optionally: add a CI workflow (`.github/workflows/ragas-eval.yml`) that runs on every PR + +**Deliverable:** +- RAGAS eval suite in rag-chatbot-langchain repo +- README section showing the metrics +- (stretch) GitHub Actions running RAGAS on every commit + +**Why:** "LLM evaluation" with concrete metrics is what separates senior from mid AI engineer candidates. + +**Resources:** +- RAGAS: https://docs.ragas.io/en/stable/ +- LangSmith for tracing (optional): https://docs.smith.langchain.com/ + +--- + +### Week 5 (Jun 10 – Jun 16) — Multi-Agent Collaboration + +**Goal:** Close the "multi-agent" gap honestly — build it. + +**Tasks:** +- [ ] Pick CrewAI or AutoGen (CrewAI is friendlier; AutoGen is Microsoft so synergies with my .NET background) +- [ ] Build a 3-agent system: **Planner → Researcher → Critic** + - Planner breaks down a question into sub-tasks + - Researcher uses RAG (your existing LangChain pipeline) to gather facts + - Critic reviews the synthesis and asks for clarifications +- [ ] Pick a non-trivial task: "Plan a 3-day technical conference agenda from a PDF of past schedules" +- [ ] Document the agent topology with a diagram + +**Deliverable:** `github.com/mrviduus/multi-agent-research` — working 3-agent pipeline + +**Why:** Now "multi-agent workflows" is REAL. Also opens conversation in interviews about agent design tradeoffs. + +**Resources:** +- CrewAI: https://docs.crewai.com/ +- AutoGen: https://microsoft.github.io/autogen/ + +--- + +### Week 6 (Jun 17 – Jun 23) — Polish + LinkedIn Flip + +**Goal:** Update profile to reflect new skills and start interviewing. 
+ +**Tasks:** +- [ ] Update LinkedIn About to add new bullets: + - Python in LLM Integration + - Hybrid search + RAGAS in RAG Pipelines + - Multi-agent in Agent Architectures +- [ ] Add Featured items: rca-py, rag-chatbot-langchain, multi-agent-research +- [ ] Add Skills: Python, LangChain, LlamaIndex (if used), RAGAS, CrewAI / AutoGen +- [ ] Reorder Top Skills: LLM, RAG, AI Engineering, **Python**, Agents +- [ ] **Flip "Open to Work" toggle ON** (Recruiters only visibility) +- [ ] Add 5 AI Engineer job titles in Open To Work +- [ ] Write a LinkedIn post: "What I built in 6 weeks pivoting to AI Engineering" — links to all 3 new repos +- [ ] Connect with 10-15 AI engineers in Toronto / Waterloo / remote-Canada + +**Deliverable:** Profile that reads as senior AI Engineer with proof; recruiter messages should start arriving within 1-2 weeks. + +--- + +## Part 4 — Top 3 Gaps to Close (TL;DR) + +In priority order: + +1. **Python** — Week 1. Without this, "AI Engineer" doesn't read as believable for most roles. +2. **LangChain or LlamaIndex** — Week 2. Recruiters search for these about 10× more often than for Semantic Kernel. +3. **LLM evaluation (RAGAS)** — Week 4. Senior-marker that most candidates miss. + +Everything else (multi-agent, hybrid search, rerank) is layered on top across weeks 3 and 5. 
+ +--- + +## Part 5 — Nice-to-Have (After Week 6) + +If the interview pipeline is slow or I want extra ammo: + +| Skill | Why | Time | +|-------|-----|------| +| Real Anthropic Claude integration code | Closes the .env-only gap | 1 evening | +| Vector DB diversity (add Qdrant or Chroma) | Shows I can pick, not just use one | 1 weekend | +| Fine-tuning experience (LoRA) | One HuggingFace training run = resume signal | 1 weekend | +| Kubernetes basics (deploy textstack to k3s) | Required for some enterprise roles | 1 week | +| LangSmith / Helicone observability | LLM-specific tracing tools | 1 evening | + +--- + +## Part 6 — Success Metrics + +How I know this worked: + +- [ ] All 3 new repos live on GitHub with README + topics +- [ ] LinkedIn About updated with backed claims (every word defensible) +- [ ] Open To Work flipped on (recruiters only) +- [ ] LinkedIn search for "AI Engineer Canada" surfaces my profile in top 50 +- [ ] First recruiter outreach within 2 weeks of Open To Work flip +- [ ] First technical screen by week 8 (2 weeks after flip) + +--- + +## Decision Log + +| Date | Decision | Reason | +|------|----------|--------| +| 2026-05-13 | Skip $125 NVIDIA cert payment | Senior engineers don't need "Fundamentals" cert; portfolio speaks louder | +| 2026-05-13 | Headline: "AI Engineer \| RAG · Agents · LLM Infrastructure \| 10+ years in software engineering" | Capability-focused, no project URL, no "Local LLM" niche | +| 2026-05-13 | About: capability-led, not project-led | Senior engineer About reads as skills, not portfolio site | +| 2026-05-13 | Don't flip Open to Work yet | 1 month buffer to add Python + LangChain + RAGAS + multi-agent before recruiter spam | +| 2026-05-13 | Add textstack as separate Experience entry (Self-employed, AI Engineer) | Recruiter search filters by Experience title — needs that entry to be findable | +| 2026-05-13 | "Share profile updates" already OFF | Profile changes don't broadcast to colleagues | diff --git 
a/PLAN-ai-portfolio.md b/PLAN-ai-portfolio.md new file mode 100644 index 00000000..e05b65fd --- /dev/null +++ b/PLAN-ai-portfolio.md @@ -0,0 +1,226 @@ +# TextStack — AI Portfolio Roadmap + +**Fixed**: 2026-05-15 · **Target**: pre-Oct 2026 launch · **Mode**: AI-engineering portfolio + product differentiator + +## Why this plan + +The project goal is twofold: (a) ship a product with paying customers by Oct 2026, (b) build a serious AI-engineering portfolio. Existing AI surfaces (Explain, Translate, Distractor/Hint/Explanation gen via Ollama, SEO generation via Claude CLI, prompt injection sanitizer, immutable replay for SEO jobs) are already production-grade and underused as portfolio material. This plan sequences the next moves without sacrificing the Oct 2026 deadline. + +**Hard rule**: nothing here ships before mobile feature-parity on Google Play. Without users, AI features are demos. + +--- + +## Sequence + +| # | Step | Duration | Why now | +|---|------|----------|---------| +| 1 | **Mobile feature-parity + Google Play launch** | 3–4 weeks | Without mobile, no paying customers, no real users for AI features | +| 2 | **Observability + eval on existing AI** | 1 week | Free portfolio uplift; required before adding new AI | +| 3 | **Podcast generation (MVP)** | 1 week | Killer differentiator, viral-friendly, simple stack, uses existing Edge TTS | +| 4 | **RAG "Ask this book"** | 2–3 weeks | Deep AI feature with pgvector + hybrid retrieval | +| 5 | **Podcast voice upgrade (optional)** | 1 day | Swap Edge TTS → ElevenLabs or OpenAI `tts-1-hd` for quality | + +--- + +## Step 1 · Mobile Google Play launch *(unchanged — already in flight)* + +Out of scope for this doc. See existing mobile track in `PLAN-presale-8w.md`. + +--- + +## Step 2 · Observability + eval on existing AI + +Goal: turn "I shipped 5 AI features" into "I shipped 5 AI features with eval + observability". This is what separates mid+ from junior in interviews. 
+ +**What to build:** + +- Log every LLM call (Explain, Translate, Distractor, Hint, Explanation, BookMetadata, SEO) with: input, output, model, latency, token cost, IP/user, cache hit/miss. +- Admin page `AI Quality`: + - Recent calls table, filter by surface (explain / translate / …) + - Manual rating buttons: good / bad / needs-fix + - Cost dashboard (per surface, per day) +- Eval dataset (~100 examples per surface): + - Inputs + expected concepts the output should include + - Run on every prompt change (CLI: `dotnet run --project tools/AiEval`) + - Outputs: pass rate, regression diff vs previous run +- LLM-as-judge for soft criteria (faithfulness, helpfulness) — use Claude as judge over gpt-5-mini outputs + +**Acceptance**: a prompt change is blocked from merging unless the eval suite passes. Interview talking point: "every AI surface has a regression-tested prompt". + +--- + +## Step 3 · Podcast generation (MVP) + +Goal: "Listen to DDIA Chapter 5 as a 20-min podcast". Killer differentiator, no one else does this for the technical-books niche. Plays into "Finish English technical books" positioning. + +### Architecture + +``` +backend/src/Domain/Entities/ + PodcastGenerationJob.cs // queued / processing / ready / failed + Podcast.cs // EditionId, Mp3Path, Duration, ScriptJson, CreatedAt + +backend/src/Worker/Services/ + PodcastWorkerService.cs // polls queue, runs pipeline + +backend/src/Application/Podcast/ + ScriptGenerator.cs // LLM → JSON dialogue + PodcastSynthesizer.cs // multi-voice TTS + FFmpeg stitch + +backend/src/Api/Endpoints/ + PodcastEndpoints.cs // GET /api/books/{slug}/podcast(.mp3|/script) +``` + +### Pipeline + +1. **Chunk content** — collect `Chapter.PlainText` per edition. For large books: summarize each chapter to 500–800 words via gpt-5-mini → "book brief" ~5–10K words. +2. 
**Generate script** — prompt gpt-5-mini / Claude to produce a 20-min dialogue: + - Host (curious, asks questions) + Expert (knowledgeable, explains) + - Output strict JSON: `[{speaker: "host"|"expert", text: string, pause_after_ms: number}]` + - Aim for ~3000–5000 words of dialogue +3. **Synthesize each line** — call existing `EdgeTtsService` per line: + - Host = `en-US-AriaNeural`, Expert = `en-US-GuyNeural` + - Parallelize per line, cache per SHA256(line + voice) +4. **Stitch with FFmpeg** — concat the synthesized lines with 300–500ms pauses, normalize with `loudnorm`, output mp3 ~30–50MB. +5. **Store & serve** — + - `data/storage/books/{editionId}/podcast.mp3` + - `data/storage/books/{editionId}/podcast-script.json` + - Endpoint streams mp3 with `Range` header support +6. **Reader integration** — `🎧 Listen` button on book detail and reader; player + transcript with click-to-jump. +7. **Mobile** — RN audio player with lock-screen controls (`expo-av` or `react-native-track-player`). + +### Cost per podcast (30 min) + +- LLM script gen: ~$0.05 (gpt-5-mini) +- TTS: $0 (Edge TTS) +- Storage: 30–50MB +- For full 1500-book corpus pre-generation: ~$75 + ~75GB disk + +### Acceptance + +- Generate podcast for one technical book (e.g. DDIA) end-to-end +- Plays smoothly on web + mobile +- Lock-screen controls work on Android +- Admin can re-trigger generation + +### Marketing payoff + +Short demo video for Twitter/Dev.to: "Listen to DDIA Chapter 5 as a podcast — generated on the fly, free, with TextStack". This is the post that gets reshared. + +--- + +## Step 4 · RAG "Ask this book" + +Goal: user reading DDIA can tap `Ask` in reader → chat that answers from (a) chapters they've read so far, (b) their own highlights/notes across all books. With citations and jump-to-chapter. + +### Constraints / rules + +- **Hand-rolled in C# with Npgsql.** No LangChain / LlamaIndex / agents. Two SQL queries + one prompt. Showing you understand RAG is more impressive than importing it. 
+- **Spoiler-safe**: only retrieve from chapters with `reading_progress >= chapter_end`. +- **Private corpus per user**: user highlights + notes are part of retrieval (unique angle nobody else has). + +### Stack + +- **pgvector** on existing Postgres (one migration, no new infra) +- **Embeddings**: `nomic-embed-text` via existing Ollama (free, local, portfolio bonus). Fallback to `text-embedding-3-small` if Ollama unavailable. +- **Chunking**: paragraph-level with 50–100 word window, 1 sentence overlap. Store `embedding`, `chapter_id`, `paragraph_index`, `text`. +- **Hybrid search**: existing Postgres FTS + cosine similarity, combined via **Reciprocal Rank Fusion** (`RRF score = sum(1 / (k + rank_i))`, k=60). Blog-post-worthy talking point. +- **LLM answer**: gpt-5-mini, streamed via SSE. + +### Schema additions + +```sql +CREATE EXTENSION IF NOT EXISTS vector; + +CREATE TABLE chapter_embeddings ( + id BIGSERIAL PRIMARY KEY, + chapter_id UUID NOT NULL REFERENCES chapters(id) ON DELETE CASCADE, + paragraph_ix INT NOT NULL, + text TEXT NOT NULL, + embedding VECTOR(768) NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +CREATE INDEX ON chapter_embeddings USING hnsw (embedding vector_cosine_ops); + +CREATE TABLE highlight_embeddings ( + id BIGSERIAL PRIMARY KEY, + highlight_id UUID NOT NULL REFERENCES highlights(id) ON DELETE CASCADE, + user_id UUID NOT NULL, + embedding VECTOR(768) NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +``` + +### Retrieval flow + +``` +1. Embed user question via Ollama nomic-embed-text +2. Two retrievals (parallel): + a) chapter_embeddings WHERE chapter_id IN (already_read_chapters_for_user_in_book) + ORDER BY embedding <=> query LIMIT 20 + b) Postgres FTS on chapter.plain_text scoped to same chapters +3. RRF combine → top 5 chunks +4. Same retrieval against highlight_embeddings WHERE user_id = current_user + → top 3 personal highlights +5. 
Prompt gpt-5-mini with: question + 5 chunks + 3 highlights + instruction + "answer only from provided context, cite chapter+paragraph" +6. Stream response via SSE; client renders citations as jump-to-chapter links +``` + +### Eval setup + +- 30 hand-crafted Q&A pairs against DDIA +- Metrics: `recall@5` (did right chunk show up?), `faithfulness` (LLM-as-judge: does answer rely on retrieved context?), `latency p95` +- Run on every prompt or retrieval change + +### Acceptance + +- Ask answers grounded questions from DDIA with chapter citations +- Spoilers blocked (verified by test) +- Personal highlights surface when relevant +- Eval suite green +- Streaming UI on web + mobile + +--- + +## Step 5 · Podcast voice upgrade (optional, post-launch) + +Swap `ITtsService` Edge implementation → ElevenLabs or OpenAI `tts-1-hd` behind a flag for podcast generation only (regular TTS stays on Edge — it's free and fine for in-reader use). + +- ElevenLabs: ~$1.50 per 30-min podcast, NotebookLM-level quality +- OpenAI `tts-1-hd`: ~$0.15 per 30-min podcast, decent quality + +Trigger only when worth it — e.g. for featured books, or as a paid tier. + +--- + +## What's explicitly NOT in this plan + +- **LangChain / LlamaIndex / agent frameworks** — hand-rolled is more impressive in 2026. +- **General "chat with any book"** — breaks "deep reading" positioning, weakens the spoiler-safe story. +- **Multi-modal (images, video)** — out of scope. +- **Voice cloning of the user / custom narrators** — fun, but adds zero portfolio weight. +- **Fine-tuning custom models** — wrong layer for this project. + +--- + +## Portfolio talking points (collected for resume / interviews) + +After this plan is done: + +1. **Production AI system in C#/.NET** with 7+ LLM surfaces, multi-model architecture (OpenAI + Ollama + Claude CLI), prompt injection defense, audit trail with immutable replay. +2. **Observability + eval pipeline** — every prompt change gated by regression tests with LLM-as-judge. +3. 
**Hand-rolled RAG** with pgvector, hybrid retrieval via RRF, private per-user corpus, spoiler-safe scoping. +4. **Audio AI pipeline** — content summarization → dialogue generation → multi-voice synthesis → FFmpeg post-processing, all from existing infra. +5. **Real users on real product** — Google Play app, paying customer target. + +This is the resume of someone who builds AI in production, not someone who imports it. + +--- + +## Open questions to answer before Step 3 starts + +- Pre-generate podcasts for the 15–20 curated AI-engineering corpus, or on-demand per book? +- Per-chapter podcasts (short, scoped) vs whole-book podcasts (long, big-picture)? — probably both, start with whole-book. +- Free for all users, or gated to logged-in / paid? +- Transcript-as-SEO: serve `podcast-script.json` rendered as HTML for SEO crawlers? (likely yes — free SEO win) diff --git a/SEO_FIX_TASK.md b/SEO_FIX_TASK.md new file mode 100644 index 00000000..c27f91db --- /dev/null +++ b/SEO_FIX_TASK.md @@ -0,0 +1,182 @@ +# SEO Index Drop — Investigation Task for Claude Code + +**Context for Claude Code**: Vasyl ran a Cowork analysis on 2026-05-19 against Ahrefs Site Audit, Google Search Console, and the live site. Index dropped sharply after **2026-05-12**. This document is the handoff — it lists symptoms, the root cause hypothesis, and the exact files/commands to verify and fix. + +## Symptoms (observed, not assumed) + +1. **GSC `Page indexing` (textstack.app, last update 2026-05-14)** + - Indexed: **331** + - Not indexed: **3,320** across 11 reasons + - Top reasons: + - `Excluded by 'noindex' tag` — 2,112 (mostly reader/library — intentional) + - `Crawled — currently not indexed` — **639** (Google quality demotion) + - `Not found (404)` — **149** + - `Page with redirect` — 108 + - `Soft 404` — **83** + - `Server error (5xx)` — 76 + - `Duplicate, Google chose different canonical` — 65 + +2. **Ahrefs Site Audit (crawl 2026-05-19)** + - `404 page` — **130**, all are author URLs e.g. 
`/en/authors/william-makepeace-thackeray/`, `/en/authors/d-h-lawrence/`, `/en/authors/arnold-bennett/`, `/en/authors/margaret-oliphant/`, `/en/authors/ambrose-bierce/`, `/en/authors/kenneth-grahame/`, `/en/authors/ethel-voynich/`, `/en/authors/j-j-connington/`, `/en/authors/thomas-de-quincey/`, `/en/authors/harry-harrison/` (and 120 more) + - All 130 are linked from `/en/books/...` detail pages + - `Page has broken JavaScript` — **1,815** referencing `/assets/index-Dj8T4aeH.js` (status was 404 during Ahrefs crawl, **200 now** → bundle hash changed during deploy and stale SSG HTML still referenced old name) + - `Duplicate pages without canonical` — 7 + - `Noindex page` — 1,412 (consistent with reader/library noindex routes; not the issue) + +3. **Live HTTP behavior verified from browser fetch (2026-05-19)** + - `https://textstack.app/en/books/dracula/` → 200, **`X-SEO-Render: spa`** — but `CLAUDE.md` says this URL MUST return `X-SEO-Render: ssg`. + - Same for `/en/authors/jane-austen/`, `/en/authors/`, `/en/genres/`. Every URL tested returned `spa` even with `User-Agent: Googlebot`. + - **Note**: browser `fetch()` may not propagate custom `User-Agent` headers — re-test with `curl -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"` from a host that can reach the production server, NOT from inside Docker. + +4. 
**Sitemap vs internal links mismatch** + - `https://textstack.app/sitemaps/authors.xml` → only **56 authors** + - But book detail pages link to **130+ author URLs** that are NOT in the sitemap + - `william-makepeace-thackeray` is reachable in the browser (SPA renders it fine — there's an Author entity in the DB) but is not in the authors sitemap + +## Root-cause hypothesis (in priority order) + +### H1 — SSG dist is stale or missing for many authors (HIGH confidence) + +`infra/nginx/textstack.conf` routes bots through: +```nginx +location ~ ^/en/authors/[^/]+/?$ { + add_header X-SEO-Render $seo_render_tag always; + try_files $ssg_file @spa; # $ssg_file = /ssg$uri/index.html for bots +} + +location @spa { + if ($is_bot) { + return 404 '...'; # ← HARD 404 by design when SSG file missing + } + ... +} +``` + +If `data/ssg/en/authors/william-makepeace-thackeray/index.html` does not exist, Ahrefs/Googlebot get a real HTTP 404. This is **by design** (comment says "prevents Google Soft 404") but combined with author URLs being linked from book pages while NOT having SSG generated, it bleeds 404s. 
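The H1 check can also be scripted rather than run one `ls` at a time. The following is a minimal sketch, assuming a readable copy of the SSG output directory and a list of author slugs collected from book pages. The function name `missing_ssg_pages`, its arguments, and the layout `<ssg_root>/en/authors/<slug>/index.html` (inferred from the `$ssg_file` comment in the nginx snippet) are illustrative assumptions, not code from the repo.

```python
from pathlib import Path


def missing_ssg_pages(ssg_root: Path, author_slugs: list[str]) -> list[str]:
    """Return the slugs that have no prerendered author page under ssg_root.

    Assumed layout (hypothetical): <ssg_root>/en/authors/<slug>/index.html
    """
    missing = []
    for slug in author_slugs:
        page = ssg_root / "en" / "authors" / slug / "index.html"
        # nginx would fall through to @spa (hard 404 for bots) for these slugs
        if not page.is_file():
            missing.append(slug)
    return missing


if __name__ == "__main__":
    # Example slugs taken from the Ahrefs 404 list and the sitemap; the path is a placeholder.
    slugs = ["william-makepeace-thackeray", "jane-austen"]
    for slug in missing_ssg_pages(Path("data/ssg"), slugs):
        print(f"no SSG file for author: {slug}")
```

If the output roughly matches the 130 author URLs Ahrefs reported as 404, that would support H1; an empty result would point toward H2 or H3 instead.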
+ +**Verify on the server**: +```bash +# SSH into prod, then: +ls /home/vasyl/projects/onlinelib/textstack/data/ssg/en/authors/ | wc -l +ls /home/vasyl/projects/onlinelib/textstack/data/ssg/en/authors/william-makepeace-thackeray/index.html +# Also compare to authors that ARE in the sitemap: +ls /home/vasyl/projects/onlinelib/textstack/data/ssg/en/authors/jane-austen/index.html + +# Check ssg-worker is running +docker compose ps ssg-worker +docker compose logs --tail 200 ssg-worker + +# Check the periodic rebuild worker (Api hosts it: SsgPeriodicRebuildWorker) +docker compose logs api 2>&1 | grep -i ssg | tail -50 +``` + +Then check what changed in git between 2026-05-11 and 2026-05-19: +```bash +cd /Users/vasylvdovychenko/projects/textstack/textstack +git log --since=2026-05-11 --until=2026-05-19 --oneline +git log --since=2026-05-11 --until=2026-05-19 -- apps/web/scripts/prerender.mjs backend/src/Api/Services/SsgPeriodicRebuildWorker.cs infra/nginx/textstack.conf +``` + +### H2 — Book detail pages link to unpublished authors (HIGH confidence) + +`apps/web/sitemaps/authors.xml` only lists 56 published authors, but `BookDetailPage.tsx` (or wherever book→author links render) emits links to **all** authors associated with editions — including unpublished ones. Those orphan author URLs get crawled, hit `@spa`, and return 404 to bots. + +**Investigate**: +```bash +# Find where book pages render author links +grep -rn "authors/" apps/web/src/pages/ apps/web/src/components/ | grep -v "node_modules" | head -30 + +# Find sitemap generation logic (likely in Api) +grep -rn "sitemap" backend/src/Api/Endpoints/ | head -20 + +# Compare: which editions have authors that aren't in the sitemap? 
+# SQL on prod: +docker compose exec db psql -U app books -c " +SELECT a.slug, a.name, e.status AS edition_status +FROM authors a +JOIN edition_authors ea ON ea.author_id = a.id +JOIN editions e ON e.id = ea.edition_id +WHERE e.status = 'Published' + AND a.slug NOT IN ( + SELECT a2.slug FROM authors a2 + JOIN edition_authors ea2 ON ea2.author_id = a2.id + JOIN editions e2 ON e2.id = ea2.edition_id + WHERE e2.status = 'Published' + GROUP BY a2.slug + -- Whatever filter the sitemap uses + ); +" +# (Refine the query against the actual sitemap query in backend code.) +``` + +### H3 — Stale SSG bundle reference from a deploy on/around 2026-05-12 (MEDIUM) + +Ahrefs found 1,815 pages referencing `/assets/index-Dj8T4aeH.js` (404 during their crawl, 200 now). Vite emits hashed filenames; each rebuild changes the hash. If `make rebuild-ssg` runs from an old `apps/web/dist/` without rebuilding the SPA first, the prerendered HTML pins an old hash. On the next SPA rebuild the old `index-*.js` is gone → bots see broken JS → soft-404. + +**Investigate**: +```bash +git log --since=2026-05-11 --oneline -- Makefile apps/web/scripts/prerender.mjs .github/workflows/deploy.yml + +# Look at the deploy.yml sequence — does it build apps/web BEFORE rebuilding SSG? +cat .github/workflows/deploy.yml +``` + +## Suggested fix order (verify each before moving on) + +1. **`make rebuild-ssg`** on prod. Single command. Should regenerate every SSG file from current DB + current `apps/web/dist`. After it finishes: + ```bash + curl -sI -H "User-Agent: Googlebot" https://textstack.app/en/books/dracula/ | grep -i x-seo-render + # Expect: x-seo-render: ssg + curl -sI -H "User-Agent: Googlebot" https://textstack.app/en/authors/jane-austen/ | grep -i x-seo-render + # Expect: x-seo-render: ssg + ``` + If this alone restores `ssg`, the root cause was either a missed rebuild or a deploy-order bug. + +2. 
**Stop emitting orphan author links.** In whatever component renders the author chip/link on book pages, only render a link when the author has a published page (or check the same condition the sitemap uses). Otherwise render plain text. File to inspect first: `apps/web/src/pages/BookDetailPage.tsx`, and any author-link sub-component. + +3. **Add publish gating for authors that mirrors what the sitemap uses.** Decide on a single source of truth (probably an `IsPublished` flag on the Author entity OR an `EXISTS(published edition)` check), then use it in three places: (a) sitemap generator, (b) SSG prerender list in `apps/web/scripts/prerender.mjs`, (c) book→author link rendering. + +4. **Re-examine `@spa` hard-404 for bots.** The comment claims it prevents soft-404, which is correct for a bot hitting a *truly* nonexistent URL. But it's currently firing for *valid* URLs that just don't have SSG yet. Two safer options: + - Make `@spa` for bots return properly rendered minimal HTML with the page's H1 + canonical (i.e., serve the SPA shell but with server-injected `<title>`, `<h1>`, `<meta description>`, and breadcrumb, which is enough for Google not to call it a Soft 404). + - Or, ensure SSG covers 100% of indexable URLs by tying `SsgPeriodicRebuildWorker` to the sitemap query. + +5. **Fix the deploy race.** In `.github/workflows/deploy.yml`, confirm the sequence is: build `apps/web` → `docker compose up` → THEN trigger SSG rebuild. If SSG runs against an older `apps/web/dist`, you'll keep getting stale JS-bundle references. + +6. **Submit to IndexNow + GSC re-validate.** After fixes deploy: `make rebuild-ssg`, then in GSC click "Validate fix" on each of the failed reasons (Not found 404, Soft 404, Crawled-not-indexed). + +## Quick health check Claude Code should run first + +```bash +cd /Users/vasylvdovychenko/projects/textstack/textstack + +# 1. What deployed between May 11–19? +git log --since=2026-05-11 --until=2026-05-19 --stat | head -200 + +# 2. Does the local dist look fresh?
+ls -lah apps/web/dist/assets/ | head -20 +grep -r "index-" apps/web/dist/index.html + +# 3. What does prerender.mjs use as its URL list? +sed -n '1,80p' apps/web/scripts/prerender.mjs + +# 4. Where do book pages link to authors? +grep -rn "authors/" apps/web/src/pages/BookDetailPage.tsx apps/web/src/components/ | head +``` + +## Files most likely involved + +| Concern | File | +|---|---| +| SSG prerender list | `apps/web/scripts/prerender.mjs` | +| Periodic rebuild trigger | `backend/src/Api/Services/SsgPeriodicRebuildWorker.cs` | +| nginx 404 behavior for bots | `infra/nginx/textstack.conf` (`location @spa`) | +| Sitemap generation | `backend/src/Api/Endpoints/` (search for `sitemap`) | +| Book → author link | `apps/web/src/pages/BookDetailPage.tsx` and any `AuthorLink`/`AuthorChip` component | +| Deploy order | `.github/workflows/deploy.yml`, `Makefile` (`rebuild-ssg` target) | + +## Done criteria + +- `curl -sI -H "User-Agent: Googlebot" https://textstack.app/en/books/dracula/` returns `X-SEO-Render: ssg` +- Ahrefs re-crawl shows `404 page` ≤ 5 (some 404s are legitimate) +- GSC "Validate fix" passes on Not found (404) and Soft 404 +- Sitemap count of authors matches the count of author URLs linked from indexable pages diff --git a/SEO_INDEX_DROP_REPORT.md b/SEO_INDEX_DROP_REPORT.md new file mode 100644 index 00000000..e13bd31e --- /dev/null +++ b/SEO_INDEX_DROP_REPORT.md @@ -0,0 +1,51 @@ +# Report: why the textstack.app index dropped + +Analysis date: 2026-05-19. Sources: Ahrefs Site Audit (crawl 19-05-2026), Google Search Console (Page indexing, last update 14-05-2026), direct HTTP requests to textstack.app, and the project sources in `/Users/vasylvdovychenko/projects/textstack/textstack`. + +## The main conclusion in short + +Bots receive a **real HTTP 404** on dozens of valid author pages, because those pages are linked from book pages but are never prerendered into SSG, and nginx by design returns a hard 404 to bots on any SSG miss.
Google sees this, marks the pages as Not found / Soft 404, and also demotes the book pages that link to them. The same mechanism accounts for the 639 pages in the "Crawled - currently not indexed" category. + +## What Google Search Console shows + +As of May 14, only **331 pages** are in the index and **3,320 pages are not indexed**. Broken down by reason, the picture is as follows. The largest category is 2,112 pages `Excluded by 'noindex' tag`: the reader, library, highlights, and other private pages, which is as intended. Next come 639 pages `Crawled - currently not indexed`: Google visited, looked, and decided not to index them, which is usually a signal of low quality or a poor noise-to-content ratio. Then there are 149 pages `Not found (404)`, 108 `Page with redirect`, 83 `Soft 404`, 76 `Server error (5xx)`, and about a hundred canonical duplicates in total. Soft 404 and Crawled-not-indexed are exactly the symptom of a bot seeing an "empty" or "broken" page. + +## What Ahrefs showed + +The latest Site Audit crawl found **130 pages with a 404 status**, and all 130 are author URLs such as `/en/authors/william-makepeace-thackeray/`, `/en/authors/d-h-lawrence/`, `/en/authors/arnold-bennett/`, and so on. Ahrefs reached them from book pages: the "First found at" column shows `https://textstack.app/en/books/...` in every row. In addition, Ahrefs flags 1,815 pages as `Page has broken JavaScript`, and the details show that they all reference `/assets/index-Dj8T4aeH.js`. At crawl time this file returned 404 to Ahrefs; it now returns 200. This is the trace of a deploy during which Vite rebuilt the bundle with a new hash while the SSG HTML stayed pinned to the old filename. Ahrefs also counted 7 pages without a canonical and 182 pages that link to broken targets. + +## What the server returns now + +I fetched live pages from a browser. Every URL (`/en/books/dracula/`, `/en/authors/jane-austen/`, `/en/authors/`, `/en/genres/`) returns `200 OK` and the header `X-SEO-Render: spa`.
According to your own `CLAUDE.md`, the normal value for a book, author, or genre page is `ssg`. Getting `spa` means one of two things: either nginx did not recognize the User-Agent as a bot (a browser fetch does not always pass a custom User-Agent through), or the SSG files really are missing on disk and `try_files` falls through to `@spa`. This needs to be rechecked against prod with `curl -H "User-Agent: Googlebot"`, without a browser. It is the first command to run in order to see which nginx branch a bot ends up in. + +The sitemap `/sitemaps/authors.xml` has **56 authors**, while Ahrefs found **130 author URLs** that book pages link to. The difference of 70+ is the set of "dangling" authors: they exist as entities in the DB (the page opens in a browser, and the SPA renders it from the API), but they are not marked as "published" anywhere, so they end up in neither the sitemap, nor the prerender URL list, nor the authoritative selection, and there is no SSG file for them on disk. + +## Root cause and why it kills the index + +`infra/nginx/textstack.conf` contains the following logic. For every indexable path there is a `try_files $ssg_file @spa` block, where `$ssg_file` expands to `/ssg{uri}/index.html` for bots and to `/nonexistent` for humans. If the file exists, the bot gets static HTML and the header `X-SEO-Render: ssg`. If the file does not exist, both fall through to `@spa`, which contains this branch: + +```nginx +if ($is_bot) { + return 404 '<!DOCTYPE html>... 404 — Page Not Found ...'; +} +``` + +The comment next to it explains the intent honestly: "This prevents Google Soft 404", meaning that an honest 404 is better than an SPA shell without content. The logic is right for nonexistent URLs, but it also fires for **existing** pages that simply did not get an SSG file. So `/en/authors/thackeray/` is valid (the SPA renders it), but if it is not in `data/ssg/`, the bot gets a hard 404. Google then drops it from the index and also lowers the ranking of the book pages that link to it.
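
The routing described above can be modeled as a small decision function. This is a minimal Python sketch of the behavior as the text describes it, not a translation of the actual nginx config. `route_request`, `ssg_path`, and the `Response` shape are hypothetical names introduced for illustration, and the header value on the 404 branch is left unset because the source does not state it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    status: int
    seo_render: Optional[str]  # X-SEO-Render value, None when not modeled


def ssg_path(uri: str) -> str:
    """File a bot request would be served from, following /ssg{uri}/index.html."""
    return f"/ssg{uri.rstrip('/')}/index.html"


def route_request(uri: str, is_bot: bool, ssg_files: set[str]) -> Response:
    """Model of `try_files $ssg_file @spa` plus the bot branch inside `@spa`."""
    if is_bot and ssg_path(uri) in ssg_files:
        # Prerendered HTML exists, so the bot gets static content.
        return Response(200, "ssg")
    if is_bot:
        # SSG miss: the bot branch in @spa returns a hard 404,
        # even when the URL is valid for human visitors.
        return Response(404, None)
    # Humans never match an SSG file and always get the SPA shell.
    return Response(200, "spa")
```

With only `jane-austen` prerendered, a bot request for `/en/authors/william-makepeace-thackeray/` yields 404 in this model while the same request from a browser yields 200, which is the mismatch the report describes.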
+ +Exactly what broke around May 12 I cannot determine from this data: I have no access to the git log or to the docker compose logs from production. But the window matches: on May 12 there was a deploy during which Vite rebuilt the bundle (the old name was `index-Dj8T4aeH.js`, the new one is already different), SSG did not catch up, and for some time it served HTML that referenced the deleted JS file. That was enough for Google to mark a batch of pages as Soft 404, and they dropped out of the index. + +## What to do (in order) + +First, confirm from prod with `curl -sI -H "User-Agent: Googlebot" https://textstack.app/en/books/dracula/` and `curl -sI -H "User-Agent: Googlebot" https://textstack.app/en/authors/jane-austen/`. If both return `X-SEO-Render: ssg`, the problem is limited to authors missing from the sitemap, and the changes in item 2 are enough. If both return `spa` or 404, SSG is down entirely, and you first need `make rebuild-ssg` + an investigation of `ssg-worker` and `SsgPeriodicRebuildWorker`. + +Next, resolve the "dangling" authors. The most transparent option is for the book-page renderer not to link to an author unless that author is in the same selection as the sitemap. The alternative is to widen the SSG prerender selection so that it covers every author who has at least one published edition. Whichever of the two paths you choose, there must be a single source of truth: the sitemap, `prerender.mjs`, and the `AuthorLink` component must all ask the same predicate, "is this author public?". + +The third change is to revisit `@spa` for bots. Right now it returns 404 for everything that was unlucky with SSG. You can either serve the SPA shell with server-injected `<title>`, `<h1>`, `<meta description>`, and breadcrumb (enough to avoid a Soft 404), or guarantee that SSG covers 100% of indexable URLs so that `@spa` remains only for truly nonexistent paths. + +Finally, in `.github/workflows/deploy.yml`, make sure the step order is: build `apps/web` → `docker compose up` → **then** SSG rebuild.
If SSG runs against an old `apps/web/dist/`, you will keep getting stale references to the JS bundle. + +After the fixes: run `make rebuild-ssg`, then in GSC click "Validate fix" on the Not found (404), Soft 404, and Crawled-not-indexed categories. Recovery will take from a couple of days to a couple of weeks, depending on how quickly Google recrawls. + +## Technical brief for Claude Code + +A detailed plan with concrete verification commands, SQL queries, and a list of files is already in the repository: `SEO_FIX_TASK.md`. Start Claude Code in the project root and say: "read `SEO_FIX_TASK.md` and start with the Quick health check, then work through H1". diff --git a/Vasyl-Personal-Projects.docx b/Vasyl-Personal-Projects.docx new file mode 100644 index 00000000..412afd66 Binary files /dev/null and b/Vasyl-Personal-Projects.docx differ diff --git a/Vasyl-Personal-Projects.pdf b/Vasyl-Personal-Projects.pdf new file mode 100644 index 00000000..757094dd Binary files /dev/null and b/Vasyl-Personal-Projects.pdf differ diff --git a/Vasyl-Resume-v2.pdf b/Vasyl-Resume-v2.pdf new file mode 100644 index 00000000..ef34bca5 Binary files /dev/null and b/Vasyl-Resume-v2.pdf differ diff --git a/Vasyl-Resume-v3.pdf b/Vasyl-Resume-v3.pdf new file mode 100644 index 00000000..d84759e2 Binary files /dev/null and b/Vasyl-Resume-v3.pdf differ diff --git a/Vasyl-Resume.docx b/Vasyl-Resume.docx new file mode 100644 index 00000000..b5c52148 Binary files /dev/null and b/Vasyl-Resume.docx differ diff --git a/Vasyl-Resume.pdf b/Vasyl-Resume.pdf new file mode 100644 index 00000000..ef34bca5 Binary files /dev/null and b/Vasyl-Resume.pdf differ diff --git a/_linkedin-migration/MIGRATE.md b/_linkedin-migration/MIGRATE.md new file mode 100644 index 00000000..06ec3b54 --- /dev/null +++ b/_linkedin-migration/MIGRATE.md @@ -0,0 +1,288 @@ +# LinkedIn project — migration bundle + +Prepared in the **textstack** Cowork project on 2026-05-15 (during the daily-x-reply-game scheduled-task session).
Targets the new **linkedin** Cowork project that Vasyl created. + +## How to use this bundle + +**Vasyl:** open the **linkedin** project in Cowork, assign it a workspace folder (suggested: `/Users/vasylvdovychenko/projects/linkedin/` — create it first via Finder or `mkdir -p`), then start a new chat in that project and paste this single message: + +> Read `/Users/vasylvdovychenko/projects/textstack/textstack/_linkedin-migration/MIGRATE.md` and execute the migration plan inside it exactly. Confirm each step as you go. At the end, summarize what was created and what's still pending my action. + +The rest of this file is instructions for the linkedin-project Claude. Do not paste anything below this line — just point Claude at this file. + +--- + +# Claude (linkedin project): execute these steps + +You are running inside the freshly created **linkedin** Cowork project for Vasyl Vdovychenko. Your job is to set up this project for personal-brand work (LinkedIn comment-game + future career content) based on the artifacts and decisions made in the sibling **textstack** project. + +## Context + +This project's purpose: **build Vasyl's personal brand**. Audience: AI engineers, .NET dev community, recruiters, hiring managers. The neighbouring **textstack** project handles product PR for TextStack. These two projects should stay separate to avoid context-bleed — the worst-case mistake is folding TextStack-promotional references into LinkedIn content, which tanks reach with the LinkedIn audience. + +Vasyl is a solo developer in Toronto. Primary languages C#/.NET 10 and JavaScript/TypeScript. AI engineering interest. He's also the founder of TextStack (textstack.app, open-source AGPL-3.0 reader for English technical books with local-LLM features) — but in **this** project, TextStack is **not** the subject. Mention it only when asked directly. 
+ +## Step 1 — Write three memory files + +Your memory directory exists at this Cowork space's path (use the absolute path the Cowork harness exposes — typically under `~/Library/Application Support/Claude/local-agent-mode-sessions/.../spaces/<this-space-id>/memory/`). Write these three files using the Write tool. If the memory directory doesn't exist yet, write to the path; if there's no MEMORY.md, create one too. + +### File 1: `user_vasyl.md` + +```markdown +--- +name: User profile - Vasyl +description: Solo developer in Toronto, primary languages C#/.NET 10 and JS/TS, building personal brand on LinkedIn alongside running TextStack as a product side-project. +type: user +--- + +Vasyl is a solo developer based in Toronto / Eastern Time. Email: mrviduus@gmail.com. + +**Background:** +- Primary languages: C# (.NET 10) and JavaScript / TypeScript +- AI engineering interest — building local-LLM features in production, observability and eval discipline +- Non-native English speaker (Ukrainian background based on conversational Russian/Ukrainian style) +- 10+ years dev experience per LinkedIn headline; current title "AI Engineer | RAG · Agents · LLM Infrastructure" + +**Communication style:** prefers Russian/Ukrainian for casual conversation, English for work output (LinkedIn comments, posts, articles). Wants direct, honest feedback — not yes-man responses. Values pragmatism: when given a choice between "ideal but slow" and "good enough but fast", leans toward the second unless it would burn a one-shot opportunity. + +**This project's scope:** personal brand on LinkedIn and adjacent career-side surfaces (eventually maybe newsletter, conference pitches). Product-side work (TextStack PR, X reply-game) lives in the sibling **textstack** Cowork project. 
+``` + +### File 2: `feedback_linkedin_personal_brand.md` + +```markdown +--- +name: LinkedIn is personal-brand only, no TextStack PR +description: On LinkedIn, comments and posts build Vasyl's personal brand only — no TextStack production-numbers references, no "we" framing tying back to the product. Different from X strategy. +type: feedback +--- + +On LinkedIn, all comments and posts should build Vasyl's **personal brand only**. Do not weave in TextStack production numbers, "we shipped X" framing, or anything that reads as product PR — even when it would technically fit the topic. + +**Why:** Vasyl flagged this directly on 2026-05-15 when reviewing a LinkedIn comment draft that referenced TextStack. His exact words: "в linkedin мы не делаем грязного пр для text stack только персональный бренд." LinkedIn audience (recruiters, hiring managers, AI/.NET decision-makers) responds to authority and perspective, not to founder-led product mentions; the latter looks promotional and tanks reach. Career-side visibility is the LinkedIn goal, not TextStack adoption. + +**How to apply:** +- LinkedIn comments: speak as a senior AI engineer with opinions and war stories, but never name TextStack or cite TextStack-specific numbers. +- LinkedIn posts (when those exist): same rule. +- This is the **opposite** of the X reply-game (lives in the textstack project), where 1 reply per session IS allowed to drop TextStack prod numbers. +- If a comment would only work with the TextStack reference, drop the comment entirely rather than rewriting around it — speak generically about CPU-only deployment, local LLM tradeoffs, etc., without naming the product. +- For founder/product war stories on LinkedIn, frame them generically: "shipped a side project where..." instead of "shipped TextStack with...". 
+``` + +### File 3: `project_linkedin_scope.md` + +```markdown +--- +name: LinkedIn project scope +description: This Cowork project covers LinkedIn comment-game + personal-brand career content (newsletter, conference pitches later). Sibling textstack project handles product PR. Created 2026-05-15. +type: project +--- + +This project's scope is **personal brand on LinkedIn and adjacent career surfaces**. Specifically: + +- LinkedIn comment-game routine (Mon/Wed/Fri, scheduled task `daily-linkedin-comment-game`) +- Eventual: LinkedIn posts in Vasyl's own voice (senior AI engineer authority) +- Eventual: dev.to articles framed for career credibility (separate from TextStack-founder voice) +- Eventual: conference talk pitches, recruiter/hiring outreach material + +**Out of scope (lives in sibling textstack project):** +- TextStack codebase, marketing, SSG/SEO +- X reply-game (TextStack-promotional) +- dev.to articles in founder voice +- TextStack AI portfolio roadmap (mobile, podcast MVP, RAG, etc.) + +**Why two projects:** Context-bleed risk. Mixing personal-brand and product-PR memory in one project led to TextStack references creeping into LinkedIn drafts. Vasyl flagged this on 2026-05-15 and we split. + +**Workspace folder:** assigned during this project's creation. +``` + +### Then: `MEMORY.md` index + +```markdown +- [User profile - Vasyl](user_vasyl.md) — solo dev in Toronto, C#/.NET + JS/TS, AI engineering +- [LinkedIn = personal brand only, no TextStack PR](feedback_linkedin_personal_brand.md) — never name TextStack on LinkedIn; opposite of X strategy +- [LinkedIn project scope](project_linkedin_scope.md) — comment-game + career content; product work stays in textstack project +``` + +## Step 2 — Copy the LinkedIn playbook into this workspace + +The full playbook (target tribe, tone calibration, output format, constraints) was already written in the textstack project. 
Copy it as-is into this project's workspace: + +```bash +mkdir -p <workspace>/docs/marketing/linkedin-routine +cp /Users/vasylvdovychenko/projects/textstack/textstack/docs/marketing/linkedin-routine/README.md <workspace>/docs/marketing/linkedin-routine/README.md +``` + +(Replace `<workspace>` with this project's assigned workspace folder.) + +After copying, edit the copied README to remove the line **"## TextStack reference rules — LinkedIn ≠ X"** intro paragraph that says "This is the **opposite** of the X reply-game" → reword to "TextStack must not be mentioned in this project at all" — the new project's perspective is that TextStack doesn't exist here. The rules themselves (zero mentions, no URLs) stay the same. + +## Step 3 — Create a CLAUDE.md for this workspace + +Write `<workspace>/CLAUDE.md` with the following content: + +```markdown +# CLAUDE.md — linkedin project + +This Cowork project covers **Vasyl's personal brand on LinkedIn and adjacent career surfaces**. + +## Scope + +- LinkedIn comment-game (Mon/Wed/Fri scheduled task `daily-linkedin-comment-game`) +- LinkedIn posts in personal voice (when ready) +- Career-side dev.to articles, newsletter drafts, conference pitches, recruiter material + +## What this project is NOT + +This project is **not** for TextStack work. TextStack-related codebase, marketing, X reply-game, SEO, and founder-voice content all live in the sibling **textstack** Cowork project at `/Users/vasylvdovychenko/projects/textstack/textstack/`. + +**Critical rule:** never mention TextStack by name, never cite TextStack production numbers (gemma4:e2b, 30GB VPS, p95 ~20ms, etc.), never include github.com/mrviduus/textstack or textstack.app URLs in any output of this project. If a topic would only work with a TextStack reference, drop the topic entirely. 
+ +## Key files + +- `docs/marketing/linkedin-routine/README.md` — playbook (target tribe, tone, output format) +- `docs/marketing/linkedin-routine/YYYY-MM-DD.md` — daily drafts, saved by the scheduled task +- `docs/marketing/campaign-tracker.md` — cumulative LinkedIn log (create on first run) + +## Scheduled task + +`daily-linkedin-comment-game` runs Mon/Wed/Fri at 10:00 AM Toronto time. It drafts 3-5 LinkedIn comments for Vasyl to review and post manually — never posts autonomously, never sends connection requests autonomously. + +## Voice + +Speak as a senior AI engineer with opinions and war stories — peer-to-peer with the LinkedIn audience (AI/.NET decision-makers, recruiters, hiring managers). Not founder-voice. Not motivational. No LinkedIn-AI shibboleths ("Great post!", "Couldn't agree more!", inspirational closing questions). +``` + +## Step 4 — Register the scheduled task + +Use the `create_scheduled_task` MCP tool with these parameters: + +- **taskId:** `daily-linkedin-comment-game` +- **description:** `LinkedIn comment-game routine for Vasyl Vdovychenko — Mon/Wed/Fri scan target accounts, draft 3-5 substantive comments, save for user approval. Personal-brand only — no TextStack PR.` +- **cronExpression:** `0 10 * * 1,3,5` +- **prompt:** see the full prompt below + +Full scheduled-task prompt (copy verbatim): + +``` +LinkedIn comment-game routine for Vasyl Vdovychenko. Mon/Wed/Fri, ~30 min. Goal: build a niche professional audience (AI engineers + .NET / dev community + recruiters/hiring managers) through substantive comments on others' posts. + +# CRITICAL — personal brand only + +This project does NOT promote TextStack. Never name TextStack, never cite TextStack production numbers (gemma4:e2b, 30 GB CPU VPS, p95 ~20ms, 63k-request load test, etc.), never include github.com/mrviduus/textstack or textstack.app URLs. Speak as a senior AI engineer with opinions and war stories — peer-to-peer authority, not founder-voice product PR. 
If a comment would only work with a TextStack reference, drop the comment entirely. + +If you find yourself wanting to write "we shipped X" or "in production we run Y" — rewrite to generic framing ("a side project I shipped..." / "deployments I've seen in the wild...") or pick a different post to comment on. + +# EXECUTION STEPS + +## 1. Pre-flight check + +If Chrome MCP is unavailable, write a single-line markdown file at `<workspace>/docs/marketing/linkedin-routine/YYYY-MM-DD.md` (today's date in Toronto/EDT timezone) saying "Chrome MCP unavailable — skipped today's LinkedIn session" and exit. + +If Chrome MCP works: +- Open a tab via tabs_context_mcp with createIfEmpty=true +- Navigate to https://www.linkedin.com/feed/ +- Verify the logged-in account is Vasyl Vdovychenko. If not logged in or wrong account, write the file with "LinkedIn session not authenticated — skipped" and exit cleanly. + +## 2. Read the playbook + +Read `<workspace>/docs/marketing/linkedin-routine/README.md` for the current target tribe, tone calibration, and constraints. The playbook is the source of truth; this prompt is the procedural shell. + +## 3. Scan target accounts (10 min) + +Direct profile scans of ~6-8 target accounts from the playbook's Tier A (.NET ecosystem) and Tier B (AI engineering) lists. For each: + +- Navigate to `https://www.linkedin.com/in/[handle]/recent-activity/all/` (or to their profile → "See all activity" → "Posts") +- Scan top 3-5 posts. 
Identify candidates meeting ALL of: + - Posted in the last 48 hours + - Has under ~100 comments (your comment still has surface area) + - Contains a claim, real question, opinion, counter-perspective, metric, or debugging story + - Topic relevant to: AI engineering, local LLM, .NET / TypeScript / React Native, OSS maintenance, indie dev, build-in-public + - NOT pure promo / launch announcement / motivational content / political content + +If fewer than 5 candidates from direct profile scans, supplement by scrolling the home feed for 5 minutes. Skip if it's all promotional / motivational. + +## 4. Draft comments (12 min) + +For each chosen post, draft a 200-600 character comment (2-4 sentences) per the playbook's tone rules: + +- First sentence stands alone (LinkedIn collapses by default) +- Add concrete value: data point, counter-perspective, real-world experience, or thoughtful question +- War stories OK but framed generically — never "TextStack" or "our 30GB VPS"; use "a side project I shipped where..." or "a deployment I worked on..." instead +- NO "Great post!", NO "Couldn't agree more!", NO hashtags in comments, NO sign-offs like "Cheers, Vasyl", NO motivational platitudes, NO obvious AI tells +- No emoji unless the parent post's tone explicitly invites it +- Peer-to-peer, technical, direct + +## 5. Save drafts to file (3 min) + +Save the day's drafts to `<workspace>/docs/marketing/linkedin-routine/YYYY-MM-DD.md`. Format: + +```markdown +# LinkedIn comment-game drafts — [DATE] + +Generated by daily-linkedin-comment-game scheduled task. +N candidates selected. Pending user review and per-comment approval before posting. + +--- + +## Candidate 1 — [Name] ([Title at Company]) + +**Source post:** [LinkedIn URL] +**Posted:** [N hours ago] +**Excerpt:** "..." (2-3 lines of the original) + +**Draft comment:** +> [comment text — 200-600 chars] + +**Why this adds value:** [one line of reasoning] + +--- + +[continue for 2-5 candidates] +``` + +## 6. 
Check reciprocity (3 min) + +Visit Vasyl's recent comments and notifications. Find replies to prior session comments. Draft continued replies under a "## Continued conversations" section in the same day's file. Note any new connection requests in a "## Pending invitations" section — do not accept autonomously. + +## 7. Cumulative tracking + summary (2 min) + +Append a one-line summary to `<workspace>/docs/marketing/campaign-tracker.md` under `## LinkedIn comment-game log` (create the section if it doesn't exist). + +Output a 4-line plain-text summary back to user: + +``` +LinkedIn comment-game ready for [DATE]. +N candidates drafted, M continued conversations, K pending invitations. +Top pick: [Name] — [one-line reason]. +File: docs/marketing/linkedin-routine/YYYY-MM-DD.md +``` + +# CONSTRAINTS — never violate + +- NEVER post a comment autonomously. Each requires explicit per-message user approval. +- NEVER accept or send connection requests autonomously. +- NEVER react / like autonomously. +- NEVER engage with political, religious, geopolitical, or controversy content. +- NEVER name TextStack or cite TextStack production numbers. +- NEVER include github.com/mrviduus/textstack or textstack.app URLs. +- NEVER use emojis unless parent post explicitly invites it. +``` + +(Note for the linkedin-project Claude creating this task: replace `<workspace>` literals in the prompt with the actual workspace folder path before calling `create_scheduled_task`. Do not leave the literal `<workspace>` in the registered prompt.) + +## Step 5 — Confirm and report + +After completing steps 1-4: + +1. List the memory files you wrote (paths + names) +2. Confirm the playbook was copied +3. Confirm CLAUDE.md was written +4. Confirm the scheduled task was registered (taskId + cron + nextRunAt) +5. 
Note any failures or items that need Vasyl's manual action (e.g., workspace folder selection, LinkedIn login state) + +# Optional follow-ups (not part of migration — do not do these unless Vasyl asks) + +- Update the playbook's Tier B list — Vasyl mentioned @mudler_it (LocalAI maintainer) is worth adding +- Establish a `docs/career/` folder for CV, recruiter outreach, conference pitches (empty for now) +- Set up a separate scheduled task for weekly LinkedIn analytics check if Vasyl wants metric tracking diff --git a/alternativeto-submission.md b/alternativeto-submission.md new file mode 100644 index 00000000..a7dafbb5 --- /dev/null +++ b/alternativeto-submission.md @@ -0,0 +1,184 @@ +# AlternativeTo.net submission package + +AlternativeTo has Domain Authority 78. Listing TextStack as alternative to popular apps gives you 3-5 backlinks from one of the highest-authority software-discovery sites. Each "alternative to X" page has its own SEO real estate. + +## Prerequisites (you do this once) + +1. Go to https://alternativeto.net/ and click "Sign Up" (top right) +2. Create account with mrviduus@gmail.com — verify email +3. Login + +## Submission flow + +After login, go to https://alternativeto.net/profile/submit-software/ (or the homepage will show a "Submit a new app" button). + +The form will ask for the fields below. Copy-paste from each section. + +--- + +## Field-by-field copy-paste + +### Application name +``` +TextStack +``` + +### Tagline / short description (~150 chars max) +``` +Reader for technical books with LLM-powered context-aware term explanations and a capped weekly spaced repetition queue. +``` + +### Long description (~500 chars) +``` +TextStack is a reading tool for developers tackling dense technical books like Designing Data-Intensive Applications, ML papers, or distributed systems texts. 
Tap any unfamiliar term, get a 2-3 sentence LLM-powered explanation that's aware of the book's domain — "attention" in an ML book returns the ML meaning, not the everyday one. Surfaced terms enter a capped weekly spaced repetition queue (no infinite Anki backlog). Self-hosted, AGPL-3.0, supports EPUB / PDF / FB2 uploads. +``` + +### Website URL +``` +https://textstack.app +``` + +### License / Cost +``` +Free, Open Source +``` + +(AlternativeTo accepts AGPL-3.0 as Open Source.) + +### License identifier (if asked) +``` +AGPL-3.0 +``` + +### Application Type / Category +Pick all that apply (the form usually allows multiple): + +- **eBook Reader** (primary) +- **Read It Later Tool** +- **Language Learning Tool** (secondary — for the SRS angle) +- **Bookmark Manager** (secondary — for highlights) + +If only one is allowed, pick **eBook Reader**. + +### Platforms +Check all: + +- ☑ Web / Online +- ☑ Self-Hosted (if available as option) +- ☑ Android +- ☑ iPhone / iOS (TestFlight, but list it — App Store coming) +- ☑ Linux (self-hostable on any Docker host) +- ☑ Mac (via Docker) +- ☑ Windows (via Docker) + +### Tags (free-form, comma-separated) +``` +ebook reader, technical books, spaced repetition, srs, llm, vocabulary, self-hosted, agpl, open source, kindle alternative, lingq alternative, readwise alternative, ai engineering, learning tool, dotnet, react, expo +``` + +### Source code URL +``` +https://github.com/mrviduus/textstack +``` + +### License URL +``` +https://github.com/mrviduus/textstack/blob/main/LICENSE +``` + +### Logo +Upload `docs/assets/hero.png` from your repo, OR generate a square 512×512 logo. AlternativeTo prefers square logos with the app icon, not the full hero banner. + +If you don't have a square logo yet: +- Make one from the textstack.app favicon (top-left corner of every page) +- Or use a screenshot of the reader's icon on iOS/Android + +### Screenshots (upload 3-5) +Take screenshots of: +1. 
Reading view with a term tapped showing the explanation popup +2. Vocabulary SRS review (flashcard or multiple choice) +3. Library page showing the book grid +4. Stats page (heatmap calendar) +5. Mobile app screenshot + +--- + +## After basic submission — list as alternative to existing apps + +Once your TextStack page is live (~24h moderation review usually), go to each of these pages and click "+ Add as alternative" → select TextStack: + +### Primary alternatives (high-traffic pages) + +1. **Readwise** — https://alternativeto.net/software/readwise/ + - Search "Readwise" → click into its page → scroll to alternatives → "Add Alternative" + - Why: Readwise users want highlight management; TextStack's highlights + SRS are similar + - Note in your submission: "Open-source AGPL alternative with built-in LLM term explanations for technical books" + +2. **LingQ** — search "LingQ" on AlternativeTo, click into page + - Why: LingQ is for language learning via reading, TextStack adapts the model for technical learning + - Note: "For developers reading technical books rather than natural-language learners" + +3. **Calibre** — https://alternativeto.net/software/calibre/ + - Why: Calibre is the canonical self-hosted ebook library; TextStack is the modern reader-first cousin + - Note: "Modern reader with LLM term explanations and SRS, AGPL-3.0" + +### Secondary alternatives (nice to have) + +4. **Anki** — https://alternativeto.net/software/anki/ + - Why: TextStack's SRS engine + capped weekly queue is direct alternative for "Anki for reading vocabulary" + - Note: "SRS that auto-fills from books you read; capped queue prevents Anki-style backlog spiral" + +5. **Google Translate** — https://alternativeto.net/software/google-translate/ (translation feature) + - Note: "Translation built into a reader, with sentence-level context" + +6. **Pocket** / **Instapaper** — read-it-later tools + - Note: "Reader-first alternative; books not articles" + +7. 
**Wikipedia** — for term lookup + - Note: "Inline LLM explanations tied to the book's domain instead of generic Wikipedia" + +### Self-hosted-specific alternatives + +8. **Calibre Web** — https://alternativeto.net/software/calibre-web/ + - Why: most direct self-hosted competitor + - Note: "Reader-first; capped weekly SRS for vocabulary; LLM term explanations" + +9. **Kavita** — modern self-hosted reader + - Note: "Targeted at technical books; SRS + LLM explanations" + +10. **Komga** — self-hosted media server for comics/books + - Note: "Reading-focused alternative with vocabulary builder" + +--- + +## Why this gives you 5+ backlinks from one submission + +- The base TextStack page itself: 1 backlink with full description + Source Code + Website +- Each "alternative to" suggestion creates a backlink ON the alternative's page (when approved) +- 10 alternatives = 10 additional backlinks +- All from DA-78 alternativeto.net domain +- Listings tend to stick (rarely removed unless project dies) + +Total realistic: **3-8 backlinks live within 2-3 weeks of submission** (not all alternatives get approved, but most do). + +--- + +## Time estimate + +- Account creation: 5 min +- Initial app submission: 10 min +- Suggesting 5-8 alternatives: 15-20 min +- Total: ~30-40 min + +Wait time for moderation: 24-72 hours. + +--- + +## Pro tip — submit screenshots EARLY + +AlternativeTo apps with screenshots get ~3× more clicks than text-only listings. Don't skip this step. If you don't have polished screenshots ready: + +1. Open https://textstack.app on desktop (Chrome window 1280×800) +2. macOS Cmd+Shift+4 → press space → click window for clean screenshot of just the browser content +3. Save 4-5 screenshots covering: reader, term explanation, SRS review, library, stats +4. 
Upload all when submitting diff --git a/awesome-selfhosted-pr.md b/awesome-selfhosted-pr.md new file mode 100644 index 00000000..71acf5f7 --- /dev/null +++ b/awesome-selfhosted-pr.md @@ -0,0 +1,198 @@ +# PR to awesome-selfhosted (main, FOSS-only) + +Now eligible since you switched to AGPL-3.0 (OSI-approved). + +--- + +## ⚠️ IMPORTANT: 4-month blocker + +awesome-selfhosted has a strict rule in CONTRIBUTING: + +> "Any software project you are adding was first released more than 4 months ago." + +You currently have **no tagged releases** (GitHub shows "No releases published"). If you open the PR right now, the maintainers will close it with an automatic canned reply: + +> "Currently, this project has a release, but it is not yet 4 months old. Once the first release is four months old, feel free to resubmit." + +**What to do right now:** + +1. Create a release tag **today** to start the 4-month timer: + + ```bash + cd /Users/vasylvdovychenko/projects/textstack/textstack + git tag -a v0.1.0 -m "Initial public release under AGPL-3.0 + + First tagged release. Project entered AGPL-3.0 with PR #201. + See CHANGELOG.md for details." + git push origin v0.1.0 + ``` + +2. On GitHub → Releases → "Draft a new release" → select the `v0.1.0` tag → publish with release notes. + +3. Submission date for awesome-selfhosted: around **2026-09-04** (4 months from today). + +In the meantime, submit to other awesome lists (see the section below); they accept entries right away. + +--- + +## Where to submit (when the time comes) + +**Most important**: the PR goes to **`awesome-selfhosted/awesome-selfhosted-data`**, NOT to the main repo `awesome-selfhosted/awesome-selfhosted`. The main repo is generated automatically from the data repo. + +PR URL: https://github.com/awesome-selfhosted/awesome-selfhosted-data + +--- + +## Format: YAML, not markdown + +You need to create a new file `software/textstack.yml` in the data repo.
Contents: + +```yaml +# software name +name: "TextStack" + +# URL of the software project's homepage +website_url: "https://textstack.app" + +# URL where the full source code of the program can be downloaded +source_code_url: "https://github.com/mrviduus/textstack" + +# description, shorter than 250 characters, sentence case +description: "Reader for technical books with LLM-powered context-aware term explanations and a capped weekly spaced repetition queue (alternative to Kindle Word Wise, LingQ)." + +# license identifiers — see licenses.yml in the data repo for the full list +licenses: + - AGPL-3.0 + +# languages/platforms — see platforms/ directory in data repo for the full list +platforms: + - C# + - Nodejs + - Docker + +# tags (categories) — see tags/ directory in data repo for full list +# IMPORTANT: pick the most fitting tag — in single-page mode software appears under the FIRST tag +# Likely candidates: "Note-taking & Editors" — verify exact name in tags/ directory before submitting +tags: + - Note-taking and Editors + +# software depends on a third-party service outside user's control +# TRUE because OpenAI API is required for the explanations feature +depends_3rdparty: true + +# link to an interactive demo +demo_url: "https://textstack.app" +``` + +### What to check in the YAML before submitting + +1. **Tag name**: open https://github.com/awesome-selfhosted/awesome-selfhosted-data/tree/master/tags and find a suitable one. "Note-taking and Editors" is my best guess, but check the exact file name. If there is a `Books` or `Reading` tag, that one is better. + +2. **License identifier**: `AGPL-3.0` must be listed in `licenses.yml`. Check at https://github.com/awesome-selfhosted/awesome-selfhosted-data/blob/master/licenses.yml + +3. **Platform names**: `C#`, `Nodejs`, `Docker` must exist in the `platforms/` directory. If `C#` has a different name in their system (for example, `CSharp` or `dotnet`), change it. + +4.
**Description**: + - ✅ Under 250 characters (yours is ~160) + - ✅ Sentence case (capital letter only at the start) + - ✅ Does NOT mention "open-source", "free", "self-hosted"; these are implicit + - ✅ "Alternative to X, Y" at the end: present + +--- + +## What does NOT qualify (checklist) + +awesome-selfhosted rejects a submission if: +- ❌ Software depends on a specific cloud provider: OK, you run on any VPS +- ❌ Desktop/mobile/CLI app requiring a separate server: OK, you are server-side +- ❌ Library/SDK requiring app code: OK, you are an end-user app +- ❌ PaaS/platform: OK +- ❌ Generic container/deployment tool: OK +- ❌ Dockerization of an existing app: OK, original work + +TextStack passes all criteria. + +--- + +## Steps for the PR (once 4 months have passed) + +1. Go to https://github.com/awesome-selfhosted/awesome-selfhosted-data +2. Open the `software/` folder +3. Click "Add file" → "Create new file" +4. File name: `textstack.yml` (kebab-case) +5. Paste the YAML above (with verified tag/license/platform names) +6. At the bottom: "Commit changes" → "Create a new branch for this commit and start a pull request" +7. PR title: `add TextStack` +8. PR body, kept minimal: + + ``` + Adding TextStack — a self-hosted reader for technical books that gives + LLM-powered context-aware explanations of unknown terms. + + - Demo: https://textstack.app + - Source: https://github.com/mrviduus/textstack + - License: AGPL-3.0 + - First release: v0.1.0 (released YYYY-MM-DD, more than 4 months ago) + - Documentation: https://github.com/mrviduus/textstack#readme + + Checklist: + - [x] Submit one item per issue + - [x] Searched existing issues and PRs + - [x] Not already listed in awesome-sysadmin or related + - [x] Actively maintained (commits in last week) + - [x] First release more than 4 months ago + - [x] Working installation instructions in README + ``` + +9. Submit and wait ~2-4 weeks for review. + +--- + +## What to do during these 4 months: parallel awesome lists + +These have no 4-month rule, so you can submit right away: + +### 1.
awesome-readinglists / awesome-books +Search on GitHub: `topic:awesome topic:books`. Often community-maintained; the format is plain markdown. + +### 2. awesome-dotnet-applications +- URL: https://github.com/quozd/awesome-dotnet#applications (or https://github.com/oneapptiger/awesome-dotnet-core) +- Markdown format, no 4-month rule +- Category: end-user applications written in .NET +- TextStack quality: AGPL + active development = a good fit + +### 3. awesome-llm-apps +- URL: https://github.com/Shubhamsaboo/awesome-llm-apps +- Category: LLM applications in production +- TextStack uses OpenAI for explanations, so it fits + +### 4. awesome-dotnet (general) +- URL: https://github.com/quozd/awesome-dotnet +- Markdown format, "Applications" section + +### 5. awesome-react-native (to showcase the mobile app) +- URL: https://github.com/jondot/awesome-react-native +- Section: Apps and Examples + +### 6. awesome-aspnet-core +- URL: https://github.com/Kahbazi/awesome-aspnetcore-mvc +- Section: Applications + +--- + +## Timeline strategy + +**Now (today)**: +- Create the `v0.1.0` release tag → starts the 4-month timer for awesome-selfhosted +- Submit a PR to awesome-llm-apps → a list that accepts quickly, AGPL is OK + +**In a week**: +- Submit a PR to awesome-dotnet (Applications section) + +**In 2 weeks**: +- Submit a PR to awesome-react-native (apps section) + +**2026-09-04 (in 4 months)**: +- Submit the YAML to awesome-selfhosted-data (the main prize) + +By September you will have 3-4 backlinks from awesome lists, with the main awesome-selfhosted listing on the way.
diff --git a/backend/src/Application/Common/Interfaces/IFileStorageService.cs b/backend/src/Application/Common/Interfaces/IFileStorageService.cs index 7e7c52da..2c83d33b 100644 --- a/backend/src/Application/Common/Interfaces/IFileStorageService.cs +++ b/backend/src/Application/Common/Interfaces/IFileStorageService.cs @@ -5,6 +5,8 @@ public interface IFileStorageService Task<string> SaveFileAsync(Guid entityId, string fileName, Stream content, CancellationToken ct = default); Task<string> SaveUserFileAsync(Guid userId, Guid userBookId, string fileName, Stream content, CancellationToken ct = default); Task<Stream?> GetFileAsync(string path, CancellationToken ct = default); + /// <summary>Cheap existence probe — does NOT open the file.</summary> + Task<bool> ExistsAsync(string path, CancellationToken ct = default); Task DeleteFileAsync(string path, CancellationToken ct = default); Task DeleteUserBookDirectoryAsync(Guid userId, Guid userBookId, CancellationToken ct = default); Task DeleteUserDirectoryAsync(Guid userId, CancellationToken ct = default); diff --git a/backend/src/Application/UserBooks/UserBookService.cs b/backend/src/Application/UserBooks/UserBookService.cs index 8a99e3ae..5375a084 100644 --- a/backend/src/Application/UserBooks/UserBookService.cs +++ b/backend/src/Application/UserBooks/UserBookService.cs @@ -338,6 +338,12 @@ public async Task<IReadOnlyList<UserBookListDto>> GetBooksAsync(Guid userId, Can if (bookFile is null) return (false, "No source file found"); + // Verify the backing file is actually still on disk. Without this + // guard the worker would happily queue a job that's destined to + // fail at extraction time and leave the book stuck in Processing. 
+ if (!await storage.ExistsAsync(bookFile.StoragePath, ct)) + return (false, "Source file is missing from storage"); + // Create new ingestion job var job = new UserIngestionJob { diff --git a/backend/src/Extraction/TextStack.Extraction/Extractors/FrontMatterFilter.cs b/backend/src/Extraction/TextStack.Extraction/Extractors/FrontMatterFilter.cs index 178214d4..81482665 100644 --- a/backend/src/Extraction/TextStack.Extraction/Extractors/FrontMatterFilter.cs +++ b/backend/src/Extraction/TextStack.Extraction/Extractors/FrontMatterFilter.cs @@ -19,6 +19,39 @@ public static class FrontMatterFilter @"sommaire|inhaltsverzeichnis|índice|indice|sumário)\s*$", RegexOptions.IgnoreCase | RegexOptions.Compiled); + // A TOC entry line is "Chapter Title ... 47" — text + leader dots/ellipsis + // + a page number at the end. The leader can be ASCII "....", a run of + // spaced dots ". . . .", or "…" (U+2026, sometimes repeated). + private static readonly Regex TocLeaderLine = new( + @"(\.{3,}|(?:\.\s+){2,}\.|…(?:\s*…)*)\s*\d{1,4}\s*$", + RegexOptions.Compiled); + + // Back-matter titles that look like a TOC content-wise (leader dots + page + // numbers) but are legitimate reading content. Used to veto a positive + // LooksLikeTableOfContentsBody result. Index and Glossary in particular + // are the classic false-positive cases: "JavaScript ............ 47, 89". + // Top European languages covered explicitly; lookalike scripts (Russian / + // Ukrainian, German, French, Spanish, Italian, Portuguese). 
+ private static readonly Regex BackMatterTitle = new( + @"^\s*(" + + @"index|glossary|bibliography|references|notes|" + + @"abbreviations|colophon|appendix|" + + // ru + @"индекс|глоссарий|библиография|примечания|приложение|" + + // uk + @"індекс|глосарій|бібліографія|примітки|додаток|" + + // de + @"glossar|literaturverzeichnis|anmerkungen|bibliographie|anhang|" + + // fr + @"glossaire|références|annexe|" + + // es / pt + @"índice|glosario|bibliografía|notas|referencias|apéndice|anexo|" + + @"glossário|bibliografia|referências|apêndice|" + + // it + @"indice|glossario|riferimenti|appendice" + + @")\s*$", + RegexOptions.IgnoreCase | RegexOptions.Compiled); + public static bool IsTableOfContents(string? title) { if (string.IsNullOrWhiteSpace(title)) return false; @@ -26,4 +59,41 @@ public static bool IsTableOfContents(string? title) var normalized = Regex.Replace(title, @"\s*\d+\s*$", "").Trim(); return TocTitle.IsMatch(normalized); } + + /// <summary> + /// True when the title matches a known back-matter section that we must + /// NOT drop even if its content-shape looks like a TOC (Index, Glossary, + /// Bibliography, etc.). + /// </summary> + public static bool IsKnownBackMatter(string? title) + { + if (string.IsNullOrWhiteSpace(title)) return false; + return BackMatterTitle.IsMatch(title.Trim()); + } + + /// <summary> + /// Content-level TOC detection — used when the bookmark title doesn't + /// match (e.g. page-split fallback labels the chapter "Pages 1–15"). + /// Takes the per-paragraph text BEFORE HTML conversion / typography + /// processing — that's where the structure we need still exists. (After + /// the pipeline, plainText is `\s+`-collapsed and there are no paragraph + /// boundaries left to count.) A chapter where ≥40% of substantive + /// paragraphs end in a leader-dot run plus a page number is + /// overwhelmingly likely to be a TOC. Threshold kept conservative; real + /// reading chapters almost never have 40% of paragraphs ending in "...47". 
+ /// </summary> + public static bool LooksLikeTableOfContentsBody(IEnumerable<string>? paragraphTexts) + { + if (paragraphTexts is null) return false; + + var significant = paragraphTexts + .Where(s => !string.IsNullOrWhiteSpace(s)) + .Select(s => s.Trim()) + .Where(s => s.Length >= 4) + .ToList(); + if (significant.Count < 5) return false; + + var leaderLines = significant.Count(s => TocLeaderLine.IsMatch(s)); + return leaderLines * 100 >= significant.Count * 40; + } } diff --git a/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfPageTextExtractor.cs b/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfPageTextExtractor.cs index 3e82d1fb..8bf8a0b0 100644 --- a/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfPageTextExtractor.cs +++ b/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfPageTextExtractor.cs @@ -43,7 +43,12 @@ public static class PdfPageTextExtractor // paragraph (the "•" loses its line-break role when only y-gap is used). private static readonly HashSet<string> BulletGlyphs = new(StringComparer.Ordinal) { - "•", "●", "▪", "■", "◦", "○", "▫", "◆", "‣", "⁃", "►", "❖" + // Body bullets + "•", "●", "▪", "■", "◦", "○", "▫", "◆", "◇", "❖", "❍", + // Triangular / pointers + "‣", "⁃", "►", "▶", "▸", "▻", "➤", "➔", "➢", + // Checkmarks & stars used as list markers in modern books + "★", "☆", "✓", "✔", "✗", "✘", }; private static readonly Regex PageNumberPattern = new(@"^\d{1,4}$", RegexOptions.Compiled); @@ -237,14 +242,31 @@ private static List<List<List<Word>>> GroupLinesIntoParagraphs(List<List<Word>> var paragraphGapThreshold = baselineGap * ParagraphGapMultiplier; // Modal left margin: rounded so micro-jitter (sub-point alignment - // differences) doesn't disqualify a margin from being modal. + // differences) doesn't disqualify a margin from being modal. 
The + // single-threshold "modal ≥50%" guard flips behaviour on borderline + // pages (45% vs 55%); use a *dominance ratio* instead — modal must + // be at least 2.5× more common than the second-most-common margin. + // For a real 2-column page the top two are ~40%/~40% (ratio ≈1), + // for a single-column body it's ~85%/~5% (ratio ≈17). The cutoff + // sits well away from typical values on either side. var leftEdges = lines .Where(l => l.Count > 0) .Select(l => Math.Round(l.Min(w => w.BoundingBox.Left))) .ToList(); - var baseLeft = leftEdges.Count > 0 - ? leftEdges.GroupBy(e => e).OrderByDescending(g => g.Count()).First().Key - : 0.0; + double? baseLeft = null; + if (leftEdges.Count > 0) + { + var grouped = leftEdges + .GroupBy(e => e) + .OrderByDescending(g => g.Count()) + .ToList(); + var modal = grouped[0]; + var runnerUp = grouped.Count > 1 ? grouped[1].Count() : 0; + // Trust the modal margin when it dominates (single-column) OR is + // effectively the only margin (single-paragraph page). + if (runnerUp == 0 || modal.Count() >= runnerUp * 2.5) + baseLeft = modal.Key; + } var paragraphs = new List<List<List<Word>>>(); var currentParagraph = new List<List<Word>> { lines[0] }; @@ -257,7 +279,8 @@ private static List<List<List<Word>>> GroupLinesIntoParagraphs(List<List<Word>> var isYGapBreak = gap > paragraphGapThreshold; var isBulletBreak = StartsWithBulletGlyph(lines[i]); - var isIndentBreak = StartsWithIndent(lines[i], baseLeft); + var isIndentBreak = baseLeft.HasValue + && StartsWithIndent(lines[i], baseLeft.Value); if (isYGapBreak || isBulletBreak || isIndentBreak) { @@ -311,10 +334,19 @@ internal static bool StartsWithBulletGlyph(List<Word> line) internal static bool IsBulletPrefix(string? text) { if (string.IsNullOrEmpty(text)) return false; - if (BulletGlyphs.Contains(text)) return true; - // Some PDFs glue the bullet to the first word ("•You're"). 
- var firstChar = text[0].ToString(); - return BulletGlyphs.Contains(firstChar); + // Both checks operate on the FIRST CHARACTER so glued forms like + // "•You're" or "☑Item" are handled the same as standalone "•" / "☑". + var firstChar = text[0]; + var firstStr = firstChar.ToString(); + if (BulletGlyphs.Contains(firstStr)) return true; + // Generalization: first char in Unicode category "Symbol, Other" + // (So) — covers geometric shapes and dingbats from custom textbook + // fonts that aren't in our hardcoded set. Po (Punctuation Other) is + // deliberately excluded — it contains † ‡ § ¶ ※ which are footnote- + // reference markers, not paragraph starts. + return System.Globalization.CharUnicodeInfo.GetUnicodeCategory(firstChar) + == System.Globalization.UnicodeCategory.OtherSymbol + && !NoisePunctuation.Contains(firstStr); } private static string GetDominantFontName(List<Word> words) diff --git a/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfToHtmlConverter.cs b/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfToHtmlConverter.cs index 67948911..2ce7ff99 100644 --- a/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfToHtmlConverter.cs +++ b/backend/src/Extraction/TextStack.Extraction/Extractors/Pdf/PdfToHtmlConverter.cs @@ -13,7 +13,10 @@ public static (string Html, string PlainText) ConvertPages( IReadOnlyList<(int PageNumber, List<PdfTextElement> Elements)> pages) { var htmlBuilder = new StringBuilder(); - var plainBuilder = new StringBuilder(); + // (The plainText return comes from HtmlCleaner.Clean's pipeline — we + // used to also build it locally here in parallel and discard the + // result; dropped to avoid the implication that the local copy was + // somehow authoritative.) 
foreach (var (_, elements) in pages) { @@ -41,8 +44,6 @@ public static (string Html, string PlainText) ConvertPages( htmlBuilder.Append($"<p>{inline}</p>"); break; } - - plainBuilder.AppendLine(element.Text); } } diff --git a/backend/src/Extraction/TextStack.Extraction/Extractors/PdfTextExtractor.cs b/backend/src/Extraction/TextStack.Extraction/Extractors/PdfTextExtractor.cs index 1566ba1b..463dcc8d 100644 --- a/backend/src/Extraction/TextStack.Extraction/Extractors/PdfTextExtractor.cs +++ b/backend/src/Extraction/TextStack.Extraction/Extractors/PdfTextExtractor.cs @@ -141,14 +141,18 @@ private static ExtractionResult ExtractFromDocument( var chapter = chapters[chapterIdx]; var chapterNumber = chapterIdx + 1; + // TOC almost always sits in the front half of the book; Index / + // Glossary in the back. The guard prevents an Italian/Spanish + // "Indice/Índice" — same word means "Index" at the back and + // "TOC" at the front — from being mis-dropped when it's the Index. + var isFrontHalf = chapterIdx * 2 < chapters.Count; // Drop TOC chapters at extraction time. PDF TOCs come out as one // dense run of leader-dotted entries and we already build the - // reader-side TOC from the chapter list itself. Guard: never drop - // the only chapter — a single-chapter book literally titled - // "Contents" would otherwise vanish entirely (paranoid edge case - // raised in PR #244 bug report). - if (chapters.Count > 1 && FrontMatterFilter.IsTableOfContents(chapter.Title)) + // reader-side TOC from the chapter list itself. Two guards: + // • single-chapter book (don't disappear the only content); + // • chapter is in the front half (see ambiguous-word note above). + if (chapters.Count > 1 && isFrontHalf && FrontMatterFilter.IsTableOfContents(chapter.Title)) { warnings.Add(new ExtractionWarning( ExtractionWarningCode.ContentFiltered, @@ -201,6 +205,33 @@ private static ExtractionResult ExtractFromDocument( // genuine cross-chapter repetition like author bylines. 
var filteredPageElements = FilterRunningHeaders(pageElements); + // Content-level TOC drop happens BEFORE HTML conversion so: + // (a) we don't waste the HtmlCleaner pipeline on a chapter + // we're about to throw away, and + // (b) the detector works on raw paragraph texts — decoupled + // from PdfToHtmlConverter's markup choices. Three guards: + // • single-chapter book (don't disappear the only content); + // • chapter is in the back half (Index/Glossary live there + // and look exactly like TOC by content); + // • bookmark title matches a known back-matter section + // (Index, Glossary, Bibliography, …). + if (chapters.Count > 1 + && isFrontHalf + && !FrontMatterFilter.IsKnownBackMatter(chapter.Title)) + { + var paragraphTexts = filteredPageElements + .SelectMany(pe => pe.Elements) + .Where(e => e.Type == TextElementType.Paragraph) + .Select(e => e.Text); + if (FrontMatterFilter.LooksLikeTableOfContentsBody(paragraphTexts)) + { + warnings.Add(new ExtractionWarning( + ExtractionWarningCode.ContentFiltered, + $"Skipped Table of Contents chapter (content-detected): {chapter.Title}")); + continue; + } + } + // Convert to HTML var (html, plainText) = PdfToHtmlConverter.ConvertPages(filteredPageElements); if (string.IsNullOrWhiteSpace(plainText)) diff --git a/backend/src/Infrastructure/Services/LocalFileStorageService.cs b/backend/src/Infrastructure/Services/LocalFileStorageService.cs index e5474a7e..b691d254 100644 --- a/backend/src/Infrastructure/Services/LocalFileStorageService.cs +++ b/backend/src/Infrastructure/Services/LocalFileStorageService.cs @@ -103,6 +103,12 @@ public Task DeleteUserDirectoryAsync(Guid userId, CancellationToken ct = default return Task.FromResult<Stream?>(stream); } + public Task<bool> ExistsAsync(string path, CancellationToken ct = default) + { + var fullPath = GetFullPath(path); + return Task.FromResult(File.Exists(fullPath)); + } + public Task DeleteFileAsync(string path, CancellationToken ct = default) { var fullPath = 
GetFullPath(path); diff --git a/blog-draft-expo-google-play-2026.md b/blog-draft-expo-google-play-2026.md new file mode 100644 index 00000000..4fd9d27b --- /dev/null +++ b/blog-draft-expo-google-play-2026.md @@ -0,0 +1,448 @@ +--- +title: "Publishing an Expo App to Google Play in 2026: Four Gates Nobody Warned Me About" +slug: expo-google-play-android-developer-verification-2026 +date: 2026-05-14 +tags: [expo, react-native, android, google-play, eas-build] +status: draft +description: > + Android Developer Verification, package-name pre-registration, a token-file + marathon, and an Expo config plugin to survive prebuild. The pieces every + outdated tutorial leaves out — written the night I finally got TextStack + into Internal Testing. +--- + +> **TL;DR** +> +> Publishing a first Expo Android app to Google Play in 2026 is no longer +> "build AAB, click upload, done." Google rolled out **Android Developer +> Verification** ahead of its September 2026 mandate, and four undocumented +> (or under-documented) gates now sit between your EAS build and Internal +> Testing: +> +> 1. **Your dev account is probably on the wrong Google login.** Try `u/1`, +> `u/2` in the Play Console URL before you assume you need to register. +> 2. **Package names must be pre-registered** before `Create app` will +> accept them — and a "Draft" registration is not enough. +> 3. **Proving ownership requires an APK** (not the AAB you already have) +> with a specific token file in `assets/` and the same signing +> fingerprint. +> 4. **`expo prebuild` wipes `android/`**, so the token file vanishes. +> Solution: a tiny Expo config plugin using `withDangerousMod`. +> +> If you skim nothing else, jump to [The config plugin +> that fixes it](#the-config-plugin-that-fixes-it). + +--- + +## What I was trying to ship + +I'm working on [TextStack](https://textstack.app) — a reader for dense +technical books where you tap any term and get a domain-aware, +native-language explanation. 
The web app has been live for a while; what +I needed today was the **mobile companion** out the door, even if only to +four friends in Internal Testing. + +The mobile app is Expo 55 (React Native 0.83), TypeScript, file-based +Expo Router. Build pipeline: **EAS Build** producing a signed AAB. Total +"this should be easy" estimate: 30 minutes from `eas build` to a tester +installing on their phone. + +Real time: **about four hours and seven EAS builds.** + +Below is the map I wish I had at hour zero. + +--- + +## Gate 1: The dev account on a different Google login + +I opened `play.google.com/console` while signed into my main Google +account (`mrviduus@gmail.com`) and got the **"To get started, choose an +account type"** signup flow. Confusing — I knew I'd registered as a Play +developer years ago. + +The trick: Google Play Console keys the dev-account lookup off the +**account index in the URL**, not whatever's the active session. The +account index is the `u/N` segment: + +``` +https://play.google.com/console/u/0/developers ← mrviduus@gmail.com +https://play.google.com/console/u/1/developers ← vasyl.vdov@gmail.com ✓ +``` + +If you have multiple Google accounts signed into Chrome, Play Console +shows whichever one matches `u/N` — not necessarily the one most recently +used elsewhere. **Try `u/0`, `u/1`, `u/2` before assuming you need to +register.** + +Identity verification, by the way, is a separate one-time step that +takes 1–3 days for individuals and requires a government ID. Mine had +been done weeks earlier — easy to forget until you're staring at the +signup page wondering what went wrong. + +--- + +## Gate 2: Package names must be pre-registered + +After finding the right dev account, I clicked **Create app**, filled in +the form (`TextStack`, `app.textstack.mobile`, en-US, App, Free, accept +the three declarations — more on that below), clicked submit, and got: + +> You can't use this package name because it hasn't been registered. 
+ +This wasn't in any 2023–2024 tutorial I'd seen. Old guides go "fill the +form, accept ToS, submit, done." In 2025 Google quietly added a new +left-nav item: **Android developer verification**. From the in-product +banner: + +> Starting in September 2026, all Android apps must be registered by +> verified developers in order to be installable on certified Android +> devices in select regions. + +The rollout is gradual but already enforced for new dev accounts. So +even though the public deadline is months away, you can't create your +first app until you've pre-registered the package name and proved you +own the signing key. + +### The two-step proof of ownership + +Open **Android developer verification → Register package name**. Enter +`app.textstack.mobile` and a friendly name. The package now sits in +**Draft** state with two unlocked tasks: + +1. **Select an eligible public key** — pick the SHA-256 fingerprint that + Google should associate with this package +2. **Sign and upload an APK** — prove you actually have the matching + private key + +#### Step 2.1: Selecting the eligible key + +I clicked **Select key** expecting an empty list (this is a new dev +account, after all). Instead, a fingerprint was already there: + +``` +9B:CC:0E:FF:68:26:AE:3C:23:FA:95:12:AC:4F:43:BD:CD:29:8D:60:CC:F7:C3:EC:28:A3:38:4C:42:9A:E1:D2 +``` + +That's the **EAS-managed upload key** fingerprint, automatically +populated because EAS had already produced a build artifact under this +package name. Google's ingestion pipeline records the signature of every +APK/AAB it sees, even via EAS infrastructure. Click the radio button, +**Add key**. + +#### Step 2.2: Sign and upload an APK + +Here's where it gets weird. The dialog says **"Sign and upload an +APK"** — and yes, it literally means **APK, not AAB**. The HTML file +input has `accept=".apk"`. EAS production builds default to AAB, so I +needed a separate APK build. 
+ +Critically, the APK must contain a **unique token file** that proves the +APK was built specifically for ownership verification on your account: + +> 1. Copy the snippet below (unique to your account) +> 2. In your IDE, open the app's source tree +> 3. Inside the `assets` folder, create a file named exactly +> `adi-registration.properties` +> 4. Paste the snippet into the file +> 5. Build a release APK signed with the private key matching the +> fingerprint above +> 6. Upload it here + +The snippet looks like a base32-style nonce: `DP5ACMZ5E2B4MAAAAAAAAAAAAA` +(26 chars, account-specific). The full Google sample is at +[github.com/android/security-samples/.../AndroidDeveloperVerificationAPKSigningExample](https://github.com/android/security-samples/tree/main/AndroidDeveloperVerificationAPKSigningExample). + +This is where I lost three hours. + +--- + +## Gate 3: Three declarations, not two + +Side trip — most tutorials show **two** declarations on the Create app +form: + +- Play App Signing Terms of Service +- US export laws + +In 2026 there are **three**. The new one is at the top: + +- **Developer Program Policies** — "Confirm app meets the Developer + Program Policies" + +Miss it and the form re-renders with all three boxes scrolled out of +view and a single red line under the missed one. Easy to chase your own +tail looking for an "invisible" error. Scroll back to the top. + +--- + +## Gate 4: The token-file marathon + +This is the rabbit hole. Four EAS builds — each ~15 minutes — +before Play Console accepted the APK. + +### Build 4: token file missing + +EAS profile `preview` already produces an APK by default: + +```json +"preview": { + "distribution": "internal", + "android": { "buildType": "apk" } +} +``` + +So I ran `eas build -p android --profile preview`. After 15 minutes I +had a signed APK in `~/Downloads`. Dropped it into Play Console. 
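A local sanity check on the token file's bytes is cheap compared with a 15-minute build. The sketch below is a hypothetical helper, not part of Play Console or EAS tooling. It assumes the file must hold exactly the 26-character snippet mentioned above and nothing else, with no trailing newline or other whitespace; both the function name `checkTokenBytes` and that strictness are assumptions for illustration.

```javascript
// Hypothetical pre-upload check; not an official Google or Expo API.
// Assumption: the token file must contain exactly the snippet
// (26 bytes for this account) with no trailing whitespace.
function checkTokenBytes(buf, expectedLength = 26) {
  const problems = [];

  // A wrong byte count means an extra or missing character somewhere.
  if (buf.length !== expectedLength) {
    problems.push(`expected ${expectedLength} bytes, got ${buf.length}`);
  }

  // Editors and string concatenation often append \n, \r, space, or tab.
  const last = buf.length > 0 ? buf[buf.length - 1] : undefined;
  if (last === 0x0a || last === 0x0d || last === 0x20 || last === 0x09) {
    problems.push('file ends with whitespace or a newline');
  }

  // An empty array means no problem was detected by these two checks.
  return problems;
}
```

One way to use it would be to extract `assets/adi-registration.properties` from the built APK, read it into a `Buffer`, and pass that to `checkTokenBytes`. An empty result only means the length and trailing byte look right; it does not prove the token matches the one Play Console issued.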
+ +> **The uploaded APK does not have the required token file.** + +I'd created `apps/mobile/android/app/src/main/assets/adi-registration.properties` +on disk before building, but the file wasn't in the APK. Why? + +```bash +git ls-files apps/mobile/android/ # → empty +tail -3 apps/mobile/.gitignore +# generated native folders +# /ios +# /android +``` + +The `android/` directory is gitignored, meaning **EAS regenerates it +on every build via `expo prebuild`**, and anything I dropped in there +gets wiped before the actual `gradle assembleRelease`. + +### Build 5: trailing newline (27 bytes) + +To survive prebuild, I wrote my first Expo config plugin (full version +below) and added it to `app.json`. New build: + +> **The uploaded APK has an invalid token file.** + +Different error, but still failing. I copied the APK into my workspace +and unzipped the asset: + +```bash +unzip -p textstack-v1.0.0-build5-adi.apk \ + assets/adi-registration.properties | xxd +``` + +``` +00000000: 4450 3541 434d 5a35 4532 4234 4d41 4141 DP5ACMZ5E2B4MAAA +00000010: 4141 4141 4141 4141 4141 0a AAAAAAAAAA. +``` + +That `0a` at the end is `\n`. My plugin had `ADI_SNIPPET + '\n'`, which +produces a 27-byte file. Google's sample file +(`raw.githubusercontent.com/.../adi-registration.properties`) is **26 +bytes, no trailing newline**. Removed the `\n`. Rebuild. + +### Build 6: typo in the snippet (still 27 bytes) + +> **The uploaded APK has an invalid token file.** + +Same error. Confused, I unzipped again: + +``` +00000000: 4450 3541 434d 5a35 4532 4234 4d41 4141 DP5ACMZ5E2B4MAAA +00000010: 4141 4141 4141 4141 4141 41 AAAAAAAAAAA +``` + +Still 27 bytes. No newline this time: the **snippet itself** was +27 chars. I'd visually copied `DP5ACMZ5E2B4M` + "14 A's" instead of +13 A's. The trailing-A count is the kind of thing your eyes glide over. + +How did I finally verify the snippet correctly?
I clicked the **copy +icon** next to the snippet in the Play Console dialog, then ran: + +```bash +pbpaste | xxd +pbpaste | wc -c +``` + +``` +00000000: 4450 3541 434d 5a35 4532 4234 4d41 4141 DP5ACMZ5E2B4MAAA +00000010: 4141 4141 4141 4141 4141 AAAAAAAAAA + 26 +``` + +There it is, byte-by-byte. Updated the plugin to **exactly 26 chars, +no newline**. One more build. + +### Build 7: pass + +> ✓ textstack-v1.0.0-build7-preview.apk + +Green checkmark. Submit. Status flips to **In review** — Google's email +confirmation arrives within a few hours. + +**Total cost of the token-file marathon: about 75 minutes of waiting on +EAS, plus the time to debug between attempts.** The lesson, in a sentence: +**verify the bytes inside the APK before clicking upload, every time.** + +--- + +## The config plugin that fixes it + +Save as `apps/mobile/plugins/with-adi-registration.js` and add +`"./plugins/with-adi-registration"` to your `app.json` `plugins` array. +Once Google verifies ownership, the plugin can be removed — the token +file only matters at verification time. + +```js +// Expo config plugin: writes assets/adi-registration.properties into the +// generated android/app/src/main/assets/ folder during `expo prebuild`. +// Required by Google Play "Android developer verification" to prove +// ownership of the package name. The snippet is unique to the Play +// Console account and is checked at upload time inside the +// Sign and upload an APK flow. + +const { withDangerousMod } = require('expo/config-plugins'); +const fs = require('fs'); +const path = require('path'); + +// 26-char token from Play Console "Sign and upload an APK" → +// "Copy the snippet". VERIFY this byte-for-byte with +// `pbpaste | wc -c` (must be 26). Easy to get wrong by visual copy — +// one extra/missing A and Google rejects with "invalid token file" +// after a 15-min build cycle. 
+const ADI_SNIPPET = 'DP5ACMZ5E2B4MAAAAAAAAAAAAA'; + +module.exports = function withAdiRegistration(config) { + return withDangerousMod(config, [ + 'android', + async (config) => { + const assetsDir = path.join( + config.modRequest.platformProjectRoot, + 'app', 'src', 'main', 'assets' + ); + fs.mkdirSync(assetsDir, { recursive: true }); + // Google compares the file content byte-for-byte — no trailing + // newline, no BOM, no surrounding whitespace. + fs.writeFileSync( + path.join(assetsDir, 'adi-registration.properties'), + ADI_SNIPPET, + 'utf8' + ); + return config; + }, + ]); +}; +``` + +The `withDangerousMod` hook runs after `expo prebuild` regenerates the +native folder, so files it writes survive into the actual gradle build. +Two minutes of plugin code, hours of pain avoided. + +--- + +## Sanity-check the APK before every upload + +Don't trust the build. Verify the bytes: + +```bash +APK=$(ls -t ~/Downloads/textstack-v*-preview.apk | head -1) + +# Should be 26 +unzip -p "$APK" assets/adi-registration.properties | wc -c + +# Should match the Play Console snippet exactly +unzip -p "$APK" assets/adi-registration.properties | xxd + +# Diff against the snippet you copied +diff <(unzip -p "$APK" assets/adi-registration.properties) \ + <(printf 'DP5ACMZ5E2B4MAAAAAAAAAAAAA') \ + && echo MATCH || echo MISMATCH +``` + +If `MATCH` and 26 bytes — drag the APK into Play Console. If +`MISMATCH` — fix the plugin, don't burn another build. + +--- + +## Once verification submits, the rest is normal + +After clicking **Submit** on the verification dialog, the package name +moves to **In review** status. Google's docs say up to 48 hours; mine +let me proceed with **Create app immediately** (the form stopped +red-X-ing `app.textstack.mobile`), and the email confirmation came +later. Your mileage may vary. + +From there, the Internal Testing setup is straightforward: + +1. **Create app**: Fill the same form (name, package, en-US, App, Free, + accept the three declarations). 
Submit. +2. **Internal testing → Create new release**: Drop the **production + AAB** (yes, AAB this time — different artifact than what you used + for verification) into the upload zone. Wait for Google's + distribution optimization (couple of minutes). +3. **Release notes**: The textarea expects XML language tags: + ```xml + <en-US> + TextStack 1.0.0 — Initial internal release. + • Browse public-domain books + • Offline reading, dictionary, translation, TTS + • Spaced-repetition vocabulary builder + </en-US> + ``` +4. **Testers tab → Create email list**: Comma-separated emails, Enter to + commit, Save. +5. **Releases → Save and publish**: One warning ("no deobfuscation file") + is informational — Expo doesn't run R8 by default. Click through. +6. **Copy opt-in URL** from "How testers join your test" and send to + testers. Each tester opens the URL on their Android phone (signed + into the same Google account that's in the email list), clicks + accept, and the app appears in the Play Store usually within an hour. + +Done. Real users on real phones from one EAS production AAB. + +--- + +## What I'd tell yesterday-me + +- **Check every Google account index** (`u/0`, `u/1`, `u/2`) in the + Play Console URL before assuming you don't have a dev account. +- **Pre-register the package name** as a first action, not when the + Create app form starts refusing. +- **Read every error literally.** "Does not have the required token + file" and "has an invalid token file" are different bugs. The first + is missing-file, the second is wrong-bytes. +- **Always verify bytes inside the APK** before uploading anything to + Play Console. `unzip -p ... | wc -c` should match what you expect. + This single habit would have saved me three 15-minute build cycles. +- **EAS prebuild wipes `android/`.** Any custom file you need in the + release artifact requires a config plugin. `withDangerousMod` is the + right hook for the simple cases. 
+- **Copy via the copy icon, not your eyes.** A 26-char string with + thirteen identical letters at the end will defeat your visual + counting every time. + +--- + +## What's next for TextStack + +Internal Testing is just the first track. To get the app onto the public +Play Store I still need to finish: + +- **App content**: privacy policy URL, data safety form, target + audience declarations +- **Store listing**: short and full descriptions, 8 phone screenshots, + a 1024×500 feature graphic, the 512×512 icon +- A **closed test** with broader feedback before requesting production + access + +Each is its own small adventure. I'll write those up if they turn out +to have hidden gates of their own. + +In the meantime, if you build dense technical books and want to try the +reader on the web first, the sample chapters are at +[textstack.app](https://textstack.app) — no signup. The mobile app is +in Internal Testing and rolling outward. + +Spent the evening. Got the app live. Wrote it down so the next person +won't. + +--- + +*Vasyl Vdovychenko — building [TextStack](https://textstack.app), +writing at [vasyl.blog](https://vasyl.blog), shouting on Twitter at +[@Rexetdeus](https://twitter.com/Rexetdeus).* diff --git a/blog-draft-textstack-ddia-reader-2026-05.md b/blog-draft-textstack-ddia-reader-2026-05.md new file mode 100644 index 00000000..341b7fa1 --- /dev/null +++ b/blog-draft-textstack-ddia-reader-2026-05.md @@ -0,0 +1,84 @@ +--- +title: "I gave up on Designing Data-Intensive Applications three times. So I built a reader to finish it." +date: 2026-05-19 +tags: [textstack, open-source, side-project, indie-hacking, language-learning, reading] +canonical_url: https://vasyl.blog/2026/05/19/i-gave-up-on-ddia-three-times/ +--- + +# I gave up on Designing Data-Intensive Applications three times. So I built a reader to finish it. + +I gave up on *Designing Data-Intensive Applications* three times. + +The third time, I built software to finish it. 
Six months later, that software is [TextStack](https://textstack.app) — open source, AGPL-3.0, free to use, free to self-host. + +This is the story of why, what I built, and what three weeks of real usage data taught me. + +## The friction + +The problem wasn't the math. It was vocabulary. + +Page 256 of DDIA uses "phantom" as a database isolation anomaly. The dictionary tells me it's a ghost. Google tells me it's a Rolls-Royce model. Kindle's Word Wise — same. + +Every chapter has 5-10 words like that. "Trip" doesn't mean a journey; in transactions context it means a specific kind of read-write conflict. "Lease" isn't a rental agreement; it's a distributed-systems timing primitive. "Fence" doesn't keep cattle in; it's a memory ordering constraint. + +For a native English speaker who works on databases, this is friction. For me — Ukrainian as a first language, English as a second — each lookup broke my concentration so completely that by the time I figured out what the word meant in context, I'd forgotten what the paragraph was about. + +I gave up on chapter 7 three times. The third time, I sat down and asked: what if the reader knew what book it was reading? + +## What I built + +TextStack is a reader that knows what book it's reading. + +Tap any word, and you get a 2-3 sentence explanation in the book's domain. "Phantom" in DDIA returns the database isolation meaning, not the ghost. "Trip" in a distributed systems book returns the read-write conflict meaning, not the journey. + +The rest of what's in there: + +- **Upload your own books** — EPUB, PDF, FB2 supported. No need to find them on TextStack's library; bring your own. +- **Vocabulary SRS** — words you tap get added to a spaced-repetition system with 5 stages (Recognition → Recall → Context → Mastered). I built it because Kindle's Vocabulary Builder is read-only; you can't actually drill the words you saved. +- **Built-in dictionary** — Free Dictionary API with phonetic pronunciation. 
+- **Translation via OpenAI** — for when a word makes no sense even in context. +- **Text-to-speech via Edge TTS** — through a direct WebSocket connection to Microsoft's service, no API key required. Two-layer caching so the same paragraph doesn't get re-synthesized. +- **Full-text search** across all your uploaded books — PostgreSQL FTS, not vector embeddings, because precision matters more than fuzziness for finding "the chapter where Kleppmann talks about lineage". + +Stack: ASP.NET Core (.NET 10) + PostgreSQL backend, React + React Native (Expo) frontend. Self-host with `docker compose up`, or try the hosted version at [textstack.app](https://textstack.app) without signup. + +Source code: [github.com/mrviduus/textstack](https://github.com/mrviduus/textstack). + +## Three weeks of clean data + +I ran TextStack quietly for six months. April 23rd I noticed Google Analytics showing 7,000 sessions a month with 1-second average engagement — turns out a directory had listed the site and was sending bot traffic. I removed it and the numbers normalized. + +Here's what three weeks of clean data looks like: + +- **25 unique users.** 19 new, 9 returning. +- **32 minutes** average engagement time per user. +- **8.2 sessions** per active user. +- **44 Google clicks** in the broader 3-month window. Position 60+ on most queries — I'm fighting Project Gutenberg, Standard Ebooks, and Goodreads for the same long-tail "free books" searches, and TextStack is six months old with no backlinks. + +Of those 25 users, most are people I shared the project with directly — friends, dev acquaintances, Twitter followers who clicked through. The nine organic strangers come from the US, Ireland, Pakistan, Colombia. Tiny absolute numbers, but globally distributed in exactly the demographic pattern I expected: non-native English speakers reading technical books in English. + +The engagement metric is what keeps me going. 32 minutes per user is not a "quick visit" pattern. 
The people who find TextStack actually use it. + +## What this taught me + +Three things I didn't expect when I started: + +**Niche audiences are real but hard to find.** My target is non-native English speakers reading technical books in English. Globally there are probably millions of them. But they're not concentrated anywhere — not one country, not one subreddit, not one Slack. Finding them one at a time is the actual hard problem, harder than the product. + +**Engagement metrics matter more than acquisition metrics at this stage.** I spent six months obsessing over SEO when I had 25 users. The SEO matters eventually, but you cannot bootstrap distribution from 25 users to 50,000 through SEO alone. You need a small group of people who love the product enough to talk about it before SEO compounds. + +**Open source attracts different people than free SaaS.** When I switched from BUSL to AGPL three weeks ago, the conversation around TextStack changed. The people who showed up after AGPL were more technical, asked about self-hosting, wanted to read the code. The free-tier-vs-paid mental model didn't apply. That changed how I think about distribution. + +## What's next + +Three months of focus, in order: + +1. **Get to 100 real users who chose TextStack over alternatives.** Not 100 sign-ups — 100 people who came back at least three times. Through community engagement, build-in-public on Twitter, posts on Indie Hackers and Dev.to. Direct conversation, not paid acquisition. +2. **Improve the metadata pages so they actually rank.** SEO backfill through Claude is already generating descriptions, themes, and FAQs for each book and author. Need to verify the quality and scale to all ~400 indexable pages. +3. **Build the chapter-by-chapter analysis layer** — unique value that doesn't compete with Project Gutenberg's text. If I have summaries, themes, and a tap-to-explain layer, I'm not duplicating their work; I'm adding to it. 
+ +If you've ever quit a technical book, I'd love to hear what made you put it down. If it was vocabulary, TextStack might help — try it at [textstack.app](https://textstack.app) and tell me what didn't work. If it was something else, that's even more useful — comment below or [find me on Twitter](https://twitter.com/Rexetdeus). + +--- + +*TextStack is open source under AGPL-3.0. Source: [github.com/mrviduus/textstack](https://github.com/mrviduus/textstack). Live at [textstack.app](https://textstack.app).* diff --git a/blog-final-expo-google-play-2026.md b/blog-final-expo-google-play-2026.md new file mode 100644 index 00000000..262df767 --- /dev/null +++ b/blog-final-expo-google-play-2026.md @@ -0,0 +1,295 @@ +# Four Hidden Gates Between Your Expo Build and Google Play in 2026 + +Real time from `eas build` to my first tester on Google Play: **four hours and seven builds**. Google rolled out **Android Developer Verification** ahead of its September 2026 mandate, and the path from a fresh EAS-built AAB to an Internal Testing release no longer looks like the tutorials. Below is the map I wish I'd had at hour zero. + +## TL;DR + +Four undocumented (or under-documented) gates now sit between your Expo build and Internal Testing: + +1. **Your dev account is probably on the wrong Google login.** Try `u/1`, `u/2` in the Play Console URL before assuming you need to re-register. +2. **Package names must be pre-registered** before *Create app* will accept them — a "Draft" registration is **not** enough. +3. **Proving ownership requires an APK** (not the AAB you already have) containing a specific token file in `assets/`, signed with the same key. +4. **`expo prebuild` wipes `android/`**, so the token file vanishes from your APK. Solution: a tiny Expo config plugin using `withDangerousMod`. + +If you skim nothing else, the [config plugin below](#the-config-plugin-that-fixes-it) is the load-bearing piece. 
The single habit that would have saved me three 15-minute build cycles: **verify the bytes inside the APK before clicking upload, every time.** + +--- + +## What I was shipping + +I work on [TextStack](https://textstack.app) — a reader for dense technical books where you tap any term and get a domain-aware, native-language explanation. The web app has been live for a while; what I needed today was the **mobile companion** in front of four friends in Internal Testing. + +Stack: Expo 55, React Native 0.83, TypeScript, file-based Expo Router. Build pipeline: **EAS Build** producing a signed AAB. "This should be easy" estimate: 30 minutes from `eas build` to a tester installing on their phone. + +Real time: four hours, seven builds. + +--- + +## Gate 1 — The dev account on a different Google login + +I opened `play.google.com/console` while signed into my main Google account and got the **"To get started, choose an account type"** signup flow. Confusing — I'd registered as a Play developer years ago. + +The trick: Play Console keys the dev-account lookup off the **account index in the URL**, not whichever account is most active in the browser: + +``` +play.google.com/console/u/0/developers ← first Google account +play.google.com/console/u/1/developers ← second Google account ✓ +play.google.com/console/u/2/developers ← third +``` + +If you have multiple Google accounts signed into Chrome, Play Console renders whichever matches `u/N`. **Try every index before assuming you need to register.** + +Identity verification, by the way, is a separate one-time step (1–3 days for individuals, government ID required). Mine had been done weeks earlier — easy to forget you've already done it when you're staring at a signup page. 
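
The index check from Gate 1 can be sketched as a small Node script that prints one candidate URL per signed-in account, so each can be opened and checked by hand. This is only an illustration: `playConsoleUrls` is a hypothetical helper, the `https://` prefix is assumed, and the default of three accounts is arbitrary.

```javascript
// Hypothetical helper (not part of Play Console or Expo): builds the
// Play Console URL for each Google account index, following the
// u/0, u/1, u/2 pattern described above.
function playConsoleUrls(maxAccounts = 3) {
  const urls = [];
  for (let index = 0; index < maxAccounts; index += 1) {
    // Assumption: the console is served over https at this path.
    urls.push(`https://play.google.com/console/u/${index}/developers`);
  }
  return urls;
}

// Print the candidates; open each one until the developer account appears.
for (const url of playConsoleUrls()) {
  console.log(url);
}
```

The script only generates the URLs. Whether a given index shows a developer account still has to be confirmed in the browser, since that depends on which accounts are signed in.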
+ +--- + +## Gate 2 — Package names must be pre-registered + +After finding the right dev account I clicked **Create app**, filled in the form (app name, package, en-US, App, Free, accept the three declarations — more on those below), submitted, and got: + +> You can't use this package name because it hasn't been registered. + +This was in **no** 2023–2024 tutorial I'd seen. The old flow was "fill the form, accept ToS, submit, done." In 2025 Google quietly added a new left-nav item: **Android developer verification**. From the in-product banner: + +> Starting in September 2026, all Android apps must be registered by verified developers in order to be installable on certified Android devices in select regions. + +The rollout is gradual but already enforced for new dev accounts. Even though the public deadline is months away, you cannot create your first app until you've pre-registered the package name and proved you own the signing key. + +### The two-step proof of ownership + +Open **Android developer verification → Register package name**. Enter the package and a friendly name. The package now sits in **Draft** state with two unlocked tasks: + +1. **Select an eligible public key** — pick the SHA-256 fingerprint Google should associate with this package. +2. **Sign and upload an APK** — prove you actually have the matching private key. + +#### Selecting the eligible key + +I clicked **Select key** expecting an empty list (new dev account, after all). Instead, a fingerprint was already there: + +``` +XX:XX:XX:XX:XX:XX:XX:XX:…:XX:XX:XX:XX (your EAS upload key) +``` + +That's the **EAS-managed upload key** fingerprint, automatically populated because EAS had already produced a build artifact under this package name. Google's ingestion records signatures of every APK/AAB it sees, even via EAS infrastructure. Click the radio button, **Add key**. + +#### Sign and upload an APK — yes, APK, not AAB + +The dialog reads **"Sign and upload an APK"** — and literally means it. 
The HTML file input has `accept=".apk"`. EAS production builds default to AAB, so you need a **separate APK build** for this step. + +Crucially, the APK must contain a **unique token file** that proves it was built specifically for ownership verification on your account: + +1. Copy the snippet from the dialog (a ~26-char base32-style nonce, account-specific). +2. In your app's source tree, create `assets/adi-registration.properties` containing exactly that snippet — no trailing newline, no BOM, no whitespace. +3. Build a release APK signed with the private key matching the fingerprint above. +4. Upload it. + +Google's reference sample: [`android/security-samples/AndroidDeveloperVerificationAPKSigningExample`](https://github.com/android/security-samples/tree/main/AndroidDeveloperVerificationAPKSigningExample). + +This is the step that consumed my evening. + +--- + +## Gate 3 — Three declarations, not two + +A side trip — most tutorials show **two** declarations on the Create app form: + +- Play App Signing Terms of Service +- US export laws + +In 2026 there are **three**. The new one is on top: + +- **Developer Program Policies** — *"Confirm app meets the Developer Program Policies"* + +Miss it and the form re-renders with all three boxes scrolled out of view and one red line under the missed one. Easy to chase your tail looking for an "invisible" error. Scroll to the top. + +--- + +## Gate 4 — The token-file marathon + +This is where I lost three hours. Four builds, ~15 minutes each, before Play Console accepted the APK. + +EAS profile `preview` already produces an APK: + +```json +"preview": { + "distribution": "internal", + "android": { "buildType": "apk" } +} +``` + +So I ran `eas build -p android --profile preview`, waited 15 minutes, dropped the APK into Play Console. 
Each attempt failed differently: + +| Build | Result | Diagnosis | +|---|---|---| +| #4 | "does not have the required token file" | File missing entirely | +| #5 | "has an invalid token file" | File present, **27 bytes** (trailing `\n`) | +| #6 | "has an invalid token file" | **27 bytes** again — typo, 14 trailing A's instead of 13 | +| #7 | ✓ accepted | **26 bytes**, exact match | + +The lesson rephrased: read every error literally. *"Does not have"* and *"has an invalid"* are different bugs. + +### Why my file wasn't in the APK (build #4) + +I'd created `apps/mobile/android/app/src/main/assets/adi-registration.properties` on disk before building. But the file wasn't in the APK. Why? + +``` +$ cat apps/mobile/.gitignore +… +# generated native folders +/ios +/android +``` + +The `android/` directory is **gitignored** — meaning EAS regenerates it on every build via `expo prebuild`. Anything you drop in there gets wiped before `gradle assembleRelease` ever runs. + +### Why "26 bytes" mattered (builds #5 and #6) + +After my first config plugin landed the file in the APK, the error changed to *"invalid"*. I copied the APK into a working directory and unzipped the asset: + +```bash +unzip -p textstack-v1.0.0-build5-adi.apk \ + assets/adi-registration.properties | xxd +``` + +``` +00000000: <hex>… 0a …last byte 0x0a (newline) +``` + +My plugin had `ADI_SNIPPET + '\n'` — a 27-byte file. The Google sample file in the reference repo is **26 bytes, no trailing newline**. Removed the `\n`, rebuilt. + +Build #6 was still 27 bytes — this time the **snippet itself** was 27 chars. I'd visually copied the snippet, and miscounted the trailing A's by one. (Try counting "AAAAAAAAAAAAA" vs "AAAAAAAAAAAAAA" in a single glance. Your eyes won't.) The right way: + +```bash +# Click the copy icon in Play Console next to the snippet, then: +pbpaste | wc -c # → 26 (or whatever the spec says) +pbpaste | xxd # confirms exact bytes +``` + +26 bytes, byte-for-byte. Updated the plugin. 
Build #7 passed. + +**Cost of the marathon: ~75 minutes of waiting on EAS plus the debugging in between.** A single `unzip -p | wc -c` would have caught both mistakes before the upload. + +--- + +## The config plugin that fixes it + +Save as `apps/mobile/plugins/with-adi-registration.js` and add `"./plugins/with-adi-registration"` to your `app.json` `plugins` array. The `ADI_SNIPPET` constant is account-specific — copy yours from the Play Console dialog and verify with `pbpaste | wc -c` before building. + +```js +// Expo config plugin: writes assets/adi-registration.properties into the +// generated android/app/src/main/assets/ folder during expo prebuild. +// Required by Google Play "Android developer verification" to prove +// ownership of the package name. The snippet is unique to the Play +// Console account and is checked at upload time inside the +// Sign and upload an APK flow. + +const { withDangerousMod } = require('expo/config-plugins'); +const fs = require('fs'); +const path = require('path'); + +// Account-specific token from Play Console: +// "Sign and upload an APK" → "Copy the snippet". +// VERIFY this byte-for-byte (`pbpaste | wc -c`) — visual copy fails +// because the trailing-A run defeats human counting. +const ADI_SNIPPET = 'YOUR_ACCOUNT_SPECIFIC_SNIPPET_HERE'; + +module.exports = function withAdiRegistration(config) { + return withDangerousMod(config, [ + 'android', + async (config) => { + const assetsDir = path.join( + config.modRequest.platformProjectRoot, + 'app', 'src', 'main', 'assets' + ); + fs.mkdirSync(assetsDir, { recursive: true }); + // Google compares byte-for-byte — no trailing newline, no BOM, + // no surrounding whitespace. + fs.writeFileSync( + path.join(assetsDir, 'adi-registration.properties'), + ADI_SNIPPET, + 'utf8' + ); + return config; + }, + ]); +}; +``` + +`withDangerousMod` runs after `expo prebuild` regenerates the native folder, so files it writes survive into the gradle build. 
Two minutes of plugin code, hours of pain avoided. + +**Once Google verifies ownership the plugin can be removed** — the token file only matters during verification. + +--- + +## Sanity-check the APK before every upload + +Don't trust the build. Verify the bytes: + +```bash +APK=$(ls -t ~/Downloads/your-app-v*-preview.apk | head -1) + +# Should be exactly the expected length +unzip -p "$APK" assets/adi-registration.properties | wc -c + +# Should match the Play Console snippet exactly +unzip -p "$APK" assets/adi-registration.properties | xxd + +# Diff against the snippet you literally copied +SNIPPET="$(pbpaste)" +diff <(unzip -p "$APK" assets/adi-registration.properties) \ + <(printf '%s' "$SNIPPET") \ + && echo MATCH || echo MISMATCH +``` + +If MATCH and the byte count matches — drag the APK into Play Console. If MISMATCH — fix the plugin, don't burn another 15-minute build. + +--- + +## Once verification submits, the rest is normal + +After clicking **Submit** on the verification dialog, the package name moves to **In review**. Google's docs say up to 48 hours; in my case the *Create app* form stopped red-X-ing the package name immediately, and the email confirmation arrived later. YMMV. + +From there, Internal Testing is the well-trodden path: + +1. **Create app**: Same form (name, package, language, App, Free, three declarations). Submit. +2. **Internal testing → Create new release**: Drop the **production AAB** (yes, AAB this time — different artifact than what you used for verification) into the upload zone. Wait for Google's distribution optimization (a couple of minutes). +3. **Release notes**: The textarea expects XML language tags: + ```xml + <en-US> + Your release notes here. + • Bullet 1 + • Bullet 2 + </en-US> + ``` +4. **Testers tab → Create email list**: Comma-separated emails, Enter to commit, Save. Bind the list to the track. +5. 
**Releases → Save and publish**: One warning ("no deobfuscation file") is informational — Expo doesn't run R8 by default. Click through the publish confirmation. +6. **Copy the opt-in URL** from "How testers join your test" and send it to testers. Each opens the URL on their Android phone (signed into the Google account in the email list), clicks accept, and the app appears in the Play Store usually within an hour. + +Done. Real users on real phones from one EAS production AAB. + +--- + +## What I'd tell yesterday-me + +- **Check every Google account index** (`u/0`, `u/1`, `u/2`) in the Play Console URL before assuming you don't have a dev account. +- **Pre-register the package name** as a first action, not when *Create app* starts rejecting it. +- **Read every error literally.** "Does not have the required token file" and "has an invalid token file" are different bugs. +- **Always verify bytes inside the APK** before uploading anything to Play Console. `unzip -p ... | wc -c` is the single habit that saves the most time. +- **EAS prebuild wipes `android/`.** Any custom file in the release artifact requires a config plugin. `withDangerousMod` is the right hook for simple file writes. +- **Copy via the copy icon, not your eyes.** A 26-character string with thirteen identical letters at the end will defeat your visual counting every time. + +--- + +## What's next + +Internal Testing is one track. To reach the public Play Store I still owe: + +- **App content**: privacy policy URL, data safety form, target audience declarations. +- **Store listing**: short and full descriptions, eight phone screenshots, a 1024×500 feature graphic, a 512×512 icon. +- A **closed test** with broader feedback before requesting production access. + +Each is its own small adventure. I'll write those up if they turn out to have hidden gates of their own. 
+ +If you build dense technical books and want to try the reader on the web first, the sample chapters are at [textstack.app](https://textstack.app) — no signup. The mobile app is in Internal Testing and rolling outward. + +Spent the evening. Got the app live. Wrote it down so the next person won't have to. diff --git a/blog-post-agpl-relicense.md b/blog-post-agpl-relicense.md new file mode 100644 index 00000000..a8a1dd39 --- /dev/null +++ b/blog-post-agpl-relicense.md @@ -0,0 +1,191 @@ +--- +title: "Why I relicensed TextStack from BUSL-1.1 to AGPL-3.0" +date: 2026-05-04 +tags: [textstack, open-source, licensing, indie-dev] +canonical_url: https://vasyl.blog/2026/05/04/why-i-relicensed-textstack-from-busl-to-agpl/ +--- + +# Why I relicensed TextStack from BUSL-1.1 to AGPL-3.0 + +I picked BUSL-1.1 for [TextStack](https://textstack.app) three weeks ago. +Three weeks later I changed my mind. Here's the reasoning, for any solo +dev about to make the same call. + +## The original choice + +When I open-sourced TextStack, I copied the license file from a project I +admired. That project was on Business Source License 1.1. So mine became +BUSL-1.1. + +I did vaguely understand what BUSL was: source-available, not OSI-approved. +You can read the code, fork it, run it for yourself — you just can't host +it as a commercial service competing with the licensor. The license +auto-converts to a real open-source license (Apache-2.0 in my case) after +four years. + +The mental model was clear: *I'm protecting my future ability to +monetize. If TextStack ever takes off, I don't want AWS to fork it and +host a clone for $5/month, undercutting my path to one paying customer.* + +That's a reasonable concern. The pattern that motivated companies like +MongoDB, Sentry, MariaDB, CockroachDB, and Elastic to move to BUSL or +SSPL is real: a hyperscaler can take your open-source code, productize +it, and outscale you on infra costs. + +So why did I change my mind? 
+ +## The cost I didn't price in + +The week I shipped TextStack, I tried to submit it to +[awesome-selfhosted](https://github.com/awesome-selfhosted/awesome-selfhosted). +It got rejected. The list is FOSS-only. BUSL doesn't qualify. + +OK, fine — I went looking for the non-free fork. Submitted there instead. +Then I started preparing other awesome-list submissions: awesome-dotnet, +awesome-react-native, etc. Most of them either explicitly require an +OSI-approved license, or implicitly do (the maintainers don't want +non-FOSS clutter). + +Then I noticed a pattern in the issues and discussions on TextStack's +GitHub repo: people would peek at the license badge, see "BUSL-1.1", and +tab away. Nobody opened an issue saying "I'd contribute but don't like +the license." Of course they didn't. They just didn't show up. + +The brand cost is harder to measure but real: "source-available" reads +as "trying to have it both ways" to anyone steeped in OSS culture. +Whether that's fair is beside the point. It's the perception. + +I started adding it up: + +- Locked out of the most-trafficked self-hosted directory +- Awkward conversations every time I introduced the project ("wait, + it's not actually open source?") +- Contributor friction (CLAs aside, devs avoid licenses they don't + recognize) +- Branding "source-available" in a market where competitors say + "open-source" — even when the practical difference for self-hosters + is zero + +Versus what I was protecting: a hyperscaler taking my niche reading tool +for developers learning AI engineering and hosting a $5/month clone. + +That scenario is approximately fictional. AWS isn't building "AWS Book +Reader" any time soon. The realistic risk is closer to zero, and I was +paying real costs every day to insure against it. + +## What about AGPL? + +The third option I'd been ignoring: **GNU Affero General Public License +v3.0**. + +AGPL is OSI-approved open source. 
It also has a copyleft clause — §13 — +that says: if you modify the software and run it as a network service, +you must publish your modifications under the same license. That's the +"AWS hosts a clone" defense, expressed through copyleft instead of +through licensing restrictions. + +It's strictly weaker than BUSL against a determined competitor — they +could fork TextStack, modify it, publish their fork, and host that. But +the friction is high enough that nobody bothers for projects below a +certain size. And the "publish your fork" requirement makes it hard for +a closed-source SaaS to compete: their differentiator becomes public. + +Look at who uses AGPL successfully: +- **[Plausible Analytics](https://plausible.io)** — competes with + Google Analytics, profitable, AGPL-3.0 +- **[PostHog](https://posthog.com)** — $100M+ revenue, AGPL-3.0 +- **[Cal.com](https://cal.com)** — competes with Calendly, AGPL-3.0 +- **[Mastodon](https://joinmastodon.org)** — federated social, + AGPL-3.0 +- **[Pixelfed](https://pixelfed.org)** — federated photos, AGPL-3.0 +- **[Nextcloud](https://nextcloud.com)** — self-hosted file sync, + AGPL-3.0 +- **[Bitwarden](https://bitwarden.com)** — password manager (until + acquisition), AGPL-3.0 + +These aren't fringe projects. They're successful indie SaaS companies +that monetize via hosted offerings while keeping the source open. The +business model: AGPL for the community, dual-license for commercial +customers who can't or won't comply with AGPL §13. + +That's the model I want. + +## What changed in TextStack + +- `LICENSE`: BUSL-1.1 → AGPL-3.0 +- README badge updated +- COPYRIGHT.md rewritten as plain-English summary of AGPL rights +- CONTRIBUTING.md gained a small CLA: contributions are AGPL-3.0, but + contributors grant me the right to relicense their commits for the + purpose of dual licensing. This preserves the ability to offer + commercial terms even after others contribute. 
+ +Old commits stay BUSL-1.1 (a license can't be revoked retroactively). +Everything from `v0.1.0` onwards is AGPL-3.0. + +The +[commit](https://github.com/mrviduus/textstack/commit/main) is one +chore: relicense, no functional changes. + +## The dual-licensing payoff + +Here's the second-order benefit I didn't appreciate when I picked BUSL: + +With AGPL-3.0, if a company wants to embed TextStack in proprietary +software, or run it as a hosted commercial service without publishing +their modifications, they can buy a commercial license from me. + +With BUSL-1.1, that path was already closed. BUSL is itself the +commercial-restricted license. There's no "upgrade to non-restricted" +to sell. + +So AGPL gives me **both** the community license **and** the monetization +path. BUSL gave me only one. + +I don't have any commercial customers yet. My +[goal](https://github.com/mrviduus/textstack#roadmap-6-month) is one by +October. AGPL keeps that door open in a way BUSL didn't. + +## What I'd tell other solo devs + +If you're picking a license for a solo project that has any chance of +becoming a commercial product: + +1. **Don't copy BUSL because Sentry uses BUSL.** They have different + threat models. A 100-person SaaS with $50M ARR has hyperscaler + competition risk. You don't. +2. **Default to AGPL-3.0** unless you have a specific reason not to. + It's the modern indie-SaaS license: real open source, strong + copyleft, dual-licensable. +3. **MIT/Apache** are great for libraries and dev tools. They're poor + for products you might want to monetize, because they don't + protect against the "AWS forks and hosts" scenario at all. +4. **The license matters less than the trust you build around your + project.** Don't agonize over edge cases. Pick a real OSI license, + ship, and build. 
+ +## What's next for TextStack + +- First v0.1.0 release tagged today, [available on + GitHub](https://github.com/mrviduus/textstack/releases/tag/v0.1.0) +- Awesome-selfhosted submission planned for September (their + 4-month-since-first-release rule) +- iOS App Store launch +- Curated AI-engineering corpus: DDIA, ML papers, type theory, + distributed systems classics + +If TextStack might solve a problem for you — opening a textbook, +hitting a wall of unfamiliar terms, putting it down again — give it a +try at [textstack.app](https://textstack.app). No signup needed for +sample chapters. + +If you're a fellow solo dev wrestling with the licensing question, my +[email](mailto:mrviduus@gmail.com) is open. + +— Vasyl + +--- + +*Find me on [GitHub](https://github.com/mrviduus) / +[Twitter](https://twitter.com/Rexetdeus) / [Dev.to](https://dev.to/mrviduus). +TextStack is at [textstack.app](https://textstack.app).* diff --git a/celpip_vocab_band9.tsv b/celpip_vocab_band9.tsv new file mode 100644 index 00000000..ab25b8fa --- /dev/null +++ b/celpip_vocab_band9.tsv @@ -0,0 +1,155 @@ +english russian example category +From my perspective С моей точки зрения From my perspective, remote work has reshaped how teams collaborate. W2-intro +What strikes me most is Что меня больше всего поражает What strikes me most is how quickly attitudes shift once people try the alternative. W2-intro +It is widely held that Принято считать, что It is widely held that early language exposure improves cognitive flexibility. W2-intro +There is a compelling case for Есть веские основания в пользу There is a compelling case for prioritizing public transit over highway expansion. W2-intro +Few would dispute that Мало кто станет спорить, что Few would dispute that screen time affects sleep quality. W2-intro +A case in point is Показательный пример — A case in point is Toronto's bike-lane expansion, which cut commute times. 
W2-body +To illustrate Для наглядности To illustrate, consider a parent juggling two part-time jobs. W2-body +This stems from Это происходит из-за This stems from a fundamental shift in how we measure productivity. W2-body +The underlying issue is Корневая проблема в том, что The underlying issue is not the cost but the access. W2-body +What is often overlooked is Что часто упускается из виду What is often overlooked is the long-term maintenance burden. W2-body +Conversely Напротив Conversely, smaller firms benefit from this flexibility the most. W2-contrast +That said При этом / тем не менее That said, the policy has clear blind spots. W2-contrast +While it is true that ... it does not follow that Хотя верно, что..., отсюда не следует, что While it is true that automation displaces jobs, it does not follow that the net effect is negative. W2-contrast +On the other hand С другой стороны On the other hand, mandatory training adds friction for senior staff. W2-contrast +Critics might counter that Оппоненты могут возразить Critics might counter that this approach favours larger institutions. W2-contrast +All things considered С учётом всего сказанного All things considered, the benefits outweigh the trade-offs. W2-conclusion +On balance В итоге, по совокупности On balance, hybrid schedules offer the most pragmatic compromise. W2-conclusion +The takeaway is Главный вывод The takeaway is that policy must follow practice, not the other way around. W2-conclusion +Ultimately В конечном счёте Ultimately, this is a question of priorities, not resources. W2-conclusion +It comes down to Всё сводится к It comes down to whether we trust people to manage their own time. W2-conclusion +To a large extent В значительной мере To a large extent, this is a generational divide. W2-emphasis +It is worth noting that Стоит отметить, что It is worth noting that the data covers only urban centres. 
W2-emphasis +Far from being X, it is Y Это вовсе не X, а скорее Y Far from being a luxury, mental-health support is a basic workplace need. W2-emphasis +By no means Отнюдь не / никоим образом This is by no means a settled debate. W2-emphasis +More often than not Чаще всего More often than not, the simpler tool wins. W2-emphasis +I am writing to enquire about Я пишу с вопросом о I am writing to enquire about the warranty terms on order #4521. W1-formal +I would like to bring to your attention Хотел бы обратить ваше внимание I would like to bring to your attention a recurring issue with the elevator. W1-formal +With reference to your email of В ответ на ваше письмо от With reference to your email of April 18, please find my responses below. W1-formal +I hope this finds you well Надеюсь, у вас всё хорошо Hi Marta, I hope this finds you well. W1-semiformal +Just a quick note to Коротко по теме Just a quick note to confirm Friday's meeting time. W1-semiformal +I would appreciate it if you could Был бы признателен, если бы вы I would appreciate it if you could resend the invoice. W1-request +Could you possibly Не могли бы вы Could you possibly extend the deadline by two days? W1-request +Would it be possible to Возможно ли Would it be possible to schedule a follow-up next week? W1-request +I sincerely apologize for Искренне приношу извинения за I sincerely apologize for the delay in responding. W1-apology +Thank you for bringing this to my attention Спасибо, что обратили моё внимание Thank you for bringing this to my attention; I will look into it today. W1-apology +I look forward to your reply Жду вашего ответа I look forward to your reply at your earliest convenience. W1-closer +Please do not hesitate to reach out Не стесняйтесь обращаться Please do not hesitate to reach out if you need clarification. W1-closer +Should you have any questions Если возникнут вопросы Should you have any questions, I am happy to elaborate. 
W1-closer +I trust this clarifies the matter Надеюсь, это проясняет вопрос I trust this clarifies the matter; please confirm receipt. W1-closer +Looking forward to hearing back Жду обратной связи Looking forward to hearing back from you. W1-closer +If I were in your shoes На вашем месте If I were in your shoes, I would tackle the smaller tasks first. S-advice +The way I see it Как я это вижу The way I see it, the second option fits your timeline better. S-advice +What I would suggest is Я бы предложил What I would suggest is splitting the deposit across two accounts. S-advice +Compared with the alternative В сравнении с альтернативой Compared with the alternative, this design is far more accessible. S-compare +The key difference is Главное отличие The key difference is who bears the upfront cost. S-compare +By far the more practical option Несомненно более практичный вариант By far the more practical option is the second venue. S-compare +In the foreground На переднем плане In the foreground, a man is unloading boxes from a van. S-describe +To the right of the frame Справа в кадре To the right of the frame, two children are playing on a swing. S-describe +What stands out is Что обращает на себя внимание What stands out is the empty seating area, which suggests the event hasn't started. S-describe +Hypothetically speaking Гипотетически Hypothetically speaking, if the budget doubled, I would invest in training. S-hypo +Should that be the case Если так Should that be the case, we would need to revisit the schedule. S-hypo +There is no easy answer, but Простого ответа нет, но There is no easy answer, but I lean toward the cautious approach. S-hypo +I would lean toward Я склоняюсь к I would lean toward the in-person option for the first session. S-hedge +It really depends on Всё зависит от It really depends on whether the deadline is firm. S-hedge +Broadly speaking В общем и целом Broadly speaking, both approaches achieve the same goal. 
S-hedge +exceptionally исключительно The team performed exceptionally well under pressure. LEX-very +remarkably на удивление The data is remarkably consistent across regions. LEX-very +profoundly глубоко This decision has profoundly shaped my career. LEX-very +substantially существенно Costs have risen substantially since last quarter. LEX-very +markedly заметно Attendance has improved markedly since the schedule change. LEX-very +compelling убедительный She made a compelling case for the new approach. LEX-good +exemplary образцовый His conduct on the project was exemplary. LEX-good +invaluable бесценный Her feedback proved invaluable during the redesign. LEX-good +sound дельный, обоснованный The proposal rests on sound financial assumptions. LEX-good +solid надёжный We have a solid foundation to build on. LEX-good +detrimental пагубный Skipping reviews can be detrimental to code quality. LEX-bad +flawed с изъяном, ошибочный The methodology was flawed from the start. LEX-bad +counterproductive контрпродуктивный Pressuring staff is often counterproductive. LEX-bad +unsustainable нежизнеспособный The current pace is unsustainable. LEX-bad +problematic проблематичный The wording of the policy is problematic. LEX-bad +pivotal ключевой, поворотный Mentorship played a pivotal role in my early career. LEX-important +crucial решающий Timing is crucial when launching a product. LEX-important +paramount первостепенный Safety is paramount on this site. LEX-important +instrumental сыгравший важную роль She was instrumental in turning the project around. LEX-important +contend утверждать I would contend that the data supports this conclusion. LEX-think +maintain настаивать He maintains that the original schedule is realistic. LEX-think +argue доказывать, утверждать I would argue this is the wrong question to ask. LEX-think +hold the view that придерживаться мнения, что I hold the view that consistency beats intensity. 
LEX-think +address затрагивать (тему) The report fails to address the root cause. LEX-say +emphasize подчёркивать He emphasized the need for transparency. LEX-say +acknowledge признавать I acknowledge that the trade-off is real. LEX-say +point out обращать внимание She pointed out a flaw in the assumption. LEX-say +carry out проводить (работу) We carry out audits twice a year. LEX-make +implement внедрять The team implemented the changes within a week. LEX-make +undertake предпринимать The city undertook a major review of zoning rules. LEX-make +facilitate содействовать The new tool facilitates faster onboarding. LEX-help +enable давать возможность This setup enables remote teams to collaborate seamlessly. LEX-help +support поддерживать The data supports the original hypothesis. LEX-help +demonstrate демонстрировать The study demonstrates a clear correlation. LEX-show +highlight подчёркивать The incident highlights deeper systemic issues. LEX-show +reveal выявлять The audit revealed several gaps in compliance. LEX-show +leverage использовать (с выгодой) We leverage existing data to predict demand. LEX-use +utilize применять The platform utilizes machine learning to rank results. LEX-use +draw on опираться на The proposal draws on three years of research. LEX-use +moreover более того Moreover, the cost has dropped significantly. LEX-link +furthermore кроме того Furthermore, the policy has bipartisan support. LEX-link +on top of that вдобавок On top of that, they offer free shipping. LEX-link +consequently следовательно Consequently, hiring has slowed. LEX-link +in light of в свете, учитывая In light of recent findings, the plan was revised. LEX-link +given that учитывая, что Given that resources are limited, prioritization is essential. LEX-link +work-life balance баланс работы и жизни Hybrid schedules have improved work-life balance for most staff. T-work +burnout выгорание Chronic overtime is a clear path to burnout. 
T-work +upskill повышать квалификацию Employees are encouraged to upskill quarterly. T-work +workplace flexibility гибкость на работе Workplace flexibility has become a core expectation. T-work +staff retention удержание персонала Strong onboarding boosts staff retention. T-work +compensation package компенсационный пакет The compensation package includes stock options. T-work +take on responsibility брать на себя ответственность She is ready to take on more responsibility. T-work +set boundaries устанавливать границы It is healthy to set boundaries with clients. T-work +meet a deadline укладываться в срок The team consistently meets tight deadlines. T-work +career trajectory карьерная траектория This role offers a strong career trajectory. T-work +lifelong learning обучение в течение всей жизни Lifelong learning is essential in a fast-changing economy. T-edu +critical thinking критическое мышление Schools should prioritize critical thinking over rote memorization. T-edu +tuition fees плата за обучение Tuition fees have outpaced inflation for two decades. T-edu +hands-on experience практический опыт Internships provide hands-on experience that classrooms cannot. T-edu +academic curriculum учебная программа The curriculum was overhauled to include digital literacy. T-edu +broaden one's horizons расширять кругозор Studying abroad broadens one's horizons. T-edu +retain information удерживать информацию Active recall helps retain information long-term. T-edu +peer-to-peer learning взаимное обучение Peer-to-peer learning fosters deeper engagement. T-edu +drop-out rate отсев The drop-out rate fell after support services were expanded. T-edu +extracurricular activity внеклассная активность Extracurricular activities build soft skills. T-edu +screen time время за экраном Excessive screen time affects sleep quality. T-tech +digital divide цифровое неравенство The digital divide persists in rural areas. 
T-tech +data privacy конфиденциальность данных Data privacy concerns are reshaping consumer behaviour. T-tech +cutting-edge передовой The lab uses cutting-edge imaging technology. T-tech +user-friendly удобный для пользователя The new interface is far more user-friendly. T-tech +obsolete устаревший That standard became obsolete years ago. T-tech +streamline оптимизировать The tool streamlines invoice processing. T-tech +end-user конечный пользователь End-user feedback drives the roadmap. T-tech +cybersecurity threat угроза кибербезопасности Phishing remains the top cybersecurity threat. T-tech +seamless integration бесшовная интеграция The plugin offers seamless integration with Outlook. T-tech +carbon footprint углеродный след Cutting flights significantly reduces our carbon footprint. T-env +renewable energy возобновляемая энергия Renewable energy now powers most public buildings. T-env +sustainable practices устойчивые практики The company adopted sustainable packaging practices. T-env +greenhouse gas emissions выбросы парниковых газов The plan aims to halve greenhouse gas emissions by 2030. T-env +single-use plastics одноразовый пластик Single-use plastics have been banned in cafeterias. T-env +waste reduction сокращение отходов Composting is a low-effort waste reduction step. T-env +environmental impact экологическое воздействие The project's environmental impact was assessed thoroughly. T-env +eco-conscious экологически сознательный Eco-conscious consumers drive demand for transparent supply chains. T-env +mental health психическое здоровье Workplaces are taking mental health more seriously. T-health +preventive care профилактика Preventive care saves the system money long-term. T-health +balanced diet сбалансированное питание A balanced diet beats any single supplement. T-health +physical activity физическая активность Even moderate physical activity reduces long-term risks. 
T-health +stress management управление стрессом Stress management techniques should be taught early. T-health +accessible healthcare доступное здравоохранение Accessible healthcare is a defining Canadian value. T-health +screen-related fatigue усталость от экранов Screen-related fatigue is a real workplace concern. T-health +holistic approach комплексный подход The clinic takes a holistic approach to recovery. T-health +sense of belonging чувство принадлежности Local events foster a sense of belonging. T-comm +civic engagement гражданская активность Civic engagement has risen among younger voters. T-comm +volunteer work волонтёрство Volunteer work strengthens community ties. T-comm +public transit общественный транспорт Reliable public transit reduces dependence on cars. T-comm +affordable housing доступное жильё Affordable housing remains the most pressing local issue. T-comm +social cohesion социальная сплочённость Cultural events contribute to social cohesion. T-comm +neighbourhood watch соседский дозор Neighbourhood watch programs deter petty crime. T-comm +inclusive community инклюзивное сообщество The library aims to be an inclusive community space. T-comm diff --git a/claude-code-prod-stats-prompt.md b/claude-code-prod-stats-prompt.md new file mode 100644 index 00000000..d1173b00 --- /dev/null +++ b/claude-code-prod-stats-prompt.md @@ -0,0 +1,110 @@ +# Claude Code prompt — collect prod stats for Gemma 4 article + +Paste the block below into Claude Code (CLI) inside any local repo. Claude Code will SSH into your prod server, run a small read-only data-collection pass, and report back the numbers we need to drop into the dev.to article before publishing. + +**Before pasting:** replace `YOUR_SSH_TARGET` with how you normally SSH into your prod box (e.g. `vasyl@textstack-prod.example.com` or whatever's in `~/.ssh/config`). Replace `/path/to/textstack/on/prod` with the absolute path to the textstack repo on the server (where `docker compose` is run from). 
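One way to avoid hand-editing the two placeholders is to keep the prompt in a file and substitute both values with `sed`. This is a minimal sketch: the function name `fill_prompt`, the file name `prompt.txt`, and the sample host and path are hypothetical, and it assumes neither value contains `|`, `&`, or a backslash, which `sed` would treat specially.

```shell
#!/bin/sh
# fill_prompt SSH_TARGET REPO_PATH
# Reads prompt text on stdin and writes it to stdout with the two
# placeholders replaced. '|' is used as the sed delimiter so that the
# slashes in REPO_PATH need no escaping.
fill_prompt() {
  sed -e "s|YOUR_SSH_TARGET|$1|g" \
      -e "s|/path/to/textstack/on/prod|$2|g"
}

# Hypothetical usage (file name, host, and path are placeholders):
#   fill_prompt deploy@prod-box /srv/textstack < prompt.txt
```

Piping the result to a clipboard tool or a temporary file keeps the original template untouched for the next run.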
+ +--- + +``` +I'm preparing a Dev.to submission for the Gemma 4 Challenge that needs production stats from my self-hosted TextStack deployment. SSH into my prod server and collect read-only metrics. Do not run any write or destructive commands. + +SSH target: YOUR_SSH_TARGET +Working directory on server: /path/to/textstack/on/prod +Database container: textstack_db_prod +Postgres user: read from .env on the server (variable POSTGRES_USER) +Postgres db: read from .env on the server (variable POSTGRES_DB) +Ollama container service name in docker compose: ollama + +Run the following, in this order, and report the full output of each step verbatim. If anything errors, paste the error and continue to the next step. All commands are read-only. + +STEP 1 — Confirm Ollama is healthy and Gemma 4 is loaded: + docker compose exec ollama ollama list + docker compose exec ollama ollama ps + docker stats --no-stream ollama 2>/dev/null || true + +STEP 2 — Container uptime + memory snapshot of the host: + docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}' | head -20 + free -h + uptime + +STEP 3 — Vocabulary table totals (Gemma generates distractors/hint/explanation; null means it didn't run for that word): + source .env + docker exec textstack_db_prod psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c " + SELECT + COUNT(*) AS total_words_all_time, + COUNT(distractors) AS with_distractors_all_time, + COUNT(hint) AS with_hint_all_time, + COUNT(explanation) AS with_explanation_all_time + FROM vocabulary_words; + " + +STEP 4 — Words created since the Gemma swap (PR #232 merged on or around 2026-05-07): + docker exec textstack_db_prod psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c " + SELECT + COUNT(*) AS total_words_since_swap, + COUNT(distractors) AS with_distractors_since_swap, + COUNT(hint) AS with_hint_since_swap, + COUNT(explanation) AS with_explanation_since_swap, + MIN(created_at) AS earliest_post_swap_word, + MAX(created_at) AS latest_post_swap_word, + 
ROUND(EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at)))/3600.0, 1) AS hours_window_post_swap + FROM vocabulary_words + WHERE created_at >= '2026-05-07'; + " + +STEP 5 — Average distractor count per Gemma-touched word (each distractors value is a JSON array): + docker exec textstack_db_prod psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c " + SELECT + ROUND(AVG(jsonb_array_length(distractors::jsonb)), 2) AS avg_distractors_per_word, + MIN(jsonb_array_length(distractors::jsonb)) AS min_distractors, + MAX(jsonb_array_length(distractors::jsonb)) AS max_distractors + FROM vocabulary_words + WHERE distractors IS NOT NULL + AND created_at >= '2026-05-07'; + " + +STEP 6 — Sample 3 real (term, distractors) pairs from after the swap (so I can verify quality and optionally quote one in the article): + docker exec textstack_db_prod psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c " + SELECT + word, + LEFT(distractors, 200) AS distractors_preview + FROM vocabulary_words + WHERE distractors IS NOT NULL + AND created_at >= '2026-05-07' + ORDER BY random() + LIMIT 3; + " + +STEP 7 — Quick disk + container info for Ollama specifically: + docker compose exec ollama du -sh /root/.ollama 2>/dev/null || docker compose exec ollama du -sh ~/.ollama 2>/dev/null || echo 'ollama dir size: not accessible via this path' + docker inspect ollama --format '{{ .HostConfig.Memory }}' 2>/dev/null + +When done, summarize at the bottom in this exact format so I can drop it straight into the article: + + REPORT FOR ARTICLE: + - Words saved since Gemma 4 e4b swap (2026-05-07 → now): N + - Of those, Gemma-generated distractors: N (X% success rate) + - Of those, Gemma-generated hints: N + - Of those, Gemma-generated explanations: N + - Average distractors per generated word: N.NN + - Time window since swap: N hours + - Ollama container uptime: STRING + - Gemma 4 e4b model resident: YES/NO (from `ollama ps`) + - One example (term, distractors) pair worth quoting: <quoted from STEP 6 output> + +Do not 
commit, push, or modify any file on the server. Read-only. If you need a sudo password for any command, stop and ask me; none of these commands should require sudo. +``` + +--- + +## What to do with the output + +When Claude Code reports back, look for two things: + +1. **`with_distractors_since_swap`**: the number of post-swap words with a non-null `distractors` value, i.e. Gemma calls that returned distractors (STEP 5 shows whether each word got the expected 5). This replaces the placeholder *"~3 hours of real distractor calls"* in the article body. +2. **The example (term, distractors) pair** from STEP 6: pick one with a clearly technical term and 5 sensible distractors, and add it to the article as a real-data block right under the parser-bug section. It will land harder than the synthetic `linearizability` example (which we keep, because it pre-dates having real production data). + +If `with_distractors_since_swap` is < 10, the article framing stays as-is ("dataset starts fresh from yesterday"). If it's 50+, change the "What's next" closing paragraph to mention the actual count instead of "~1000 needed for fine-tuning, dataset starts fresh". + +If Ollama container uptime is < 24h or `ollama ps` shows nothing resident, prod has a live problem: the silent fallback might be back. Investigate before publishing the post (we don't want a reader to land, click into textstack.app, and find vocab features broken). diff --git a/claude-code-prompt-gemma4-swap.md b/claude-code-prompt-gemma4-swap.md new file mode 100644 index 00000000..3bb15252 --- /dev/null +++ b/claude-code-prompt-gemma4-swap.md @@ -0,0 +1,293 @@ +# Claude Code prompt — swap qwen3:8b → gemma4:e4b + +Copy the prompt below and paste it into Claude Code (`claude` CLI) running in the textstack repo root.
+ +## Why this swap + +- Current local LLM: `qwen3:8b` (~5-6 GB Q4) — used for distractor generation, hint generation, book metadata, tag suggestions +- New target: `gemma4:e4b` (~3-4 GB Q4) — Google's 2026 Gemma 4 family, "effective 4B" architecture +- Server has 31 GB RAM, Ollama Docker container limited to 4G. Gemma 4 e4b fits comfortably; qwen3:8b was tight against the limit. +- Gemma 4 is on Ollama, confirmed in the host's Ollama app + +## Before pasting prompt — quick prerequisites + +1. Verify Ollama on the host has `gemma4:e4b`: + ```bash + ollama list | grep gemma4 + ``` + If not present: + ```bash + ollama pull gemma4:e4b + ``` + +2. Make sure repo is clean (`git status` shows nothing to commit) before starting. + +--- + +## The prompt + +```` +You are working in the textstack repository at /Users/vasylvdovychenko/projects/textstack/textstack. + +TASK: Swap the local LLM model used by Ollama from `qwen3:8b` to `gemma4:e4b` across the entire codebase. + +CONTEXT: +- TextStack uses Ollama as its local LLM provider. The model is referenced as `Ollama:Model` config value across appsettings.json files, source-code defaults, docker-compose env, and documentation. +- The new model is `gemma4:e4b` (Google's Gemma 4 effective-4B variant, ~3-4 GB Q4 RAM, on Ollama). +- Architecture: `ILlmService` interface with two implementations — `OllamaLlmService` and `OpenAiLlmService`. We are NOT changing the architecture or adding a new provider. We are only changing the default Ollama model name from qwen3:8b to gemma4:e4b. +- ILlmService consumers expect plain text completions (sometimes JSON-formatted output that is then parsed). Gemma 4 should handle these workloads identically to qwen3:8b, but quality may differ. + +PRE-FLIGHT CHECKS — halt if any fails: + +1. Working tree clean: + git status --porcelain + Should output nothing. + +2. 
On main (or a feature branch you intend to use): + git rev-parse --abbrev-ref HEAD + If on main, branch off: + git checkout -b chore/swap-ollama-model-to-gemma4 + If already on a feature branch, fine. + +EXECUTE THESE EXACT FILE EDITS (replace `qwen3:8b` with `gemma4:e4b`): + +### Source code defaults (3 files) + +1. `backend/src/Api/Program.cs` — line where you see: + options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "qwen3:8b"; + Change to: + options.OllamaModel = builder.Configuration["Ollama:Model"] ?? "gemma4:e4b"; + +2. `backend/src/Application/LLM/OllamaLlmService.cs` — line where you see: + _model = config["Ollama:Model"] + ?? Environment.GetEnvironmentVariable("OLLAMA_MODEL") + ?? "qwen3:8b"; + Change the fallback string to "gemma4:e4b". + +3. `backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs` — line: + public string OllamaModel { get; set; } = "qwen3:8b"; + Change to: + public string OllamaModel { get; set; } = "gemma4:e4b"; + +### Configuration files (NOT in bin/ — those regenerate) + +4. `backend/src/Api/appsettings.json` — find `"Model": "qwen3:8b"` under the `"Ollama"` key. Change to `"gemma4:e4b"`. + +5. `backend/src/Worker/appsettings.json` — same. + +DO NOT EDIT files under `bin/` directories — those are build artifacts and will be regenerated on next `dotnet build`. + +### Docker Compose + +6. `docker-compose.yml` — find: + Ollama__Model: qwen3:8b + Change to: + Ollama__Model: gemma4:e4b + + Verify the Ollama service `memory: 4G` limit is sufficient for gemma4:e4b. It is (~3-4 GB Q4 fits with headroom). No change needed. + +### Documentation (current-state references) + +7. `README.md` — find references to `qwen3:8b` (around lines 97 and 124). Update to `gemma4:e4b`. Keep the surrounding text contextually accurate (e.g., "local Ollama `gemma4:e4b` (distractors, local)"). + +8. `CLAUDE.md` — line 202 mentions "Ollama LLM (`qwen3:8b`) generates 5 distractors...". Update to `gemma4:e4b`. + +9. 
`docs/04-dev/llm-provider-swap.md` — replace `qwen3:8b` with `gemma4:e4b` in the docs section (it's used as an example default). + +10. `docs/ux-roadmap/17-ai-auto-tags.md` — replace `qwen3:8b` with `gemma4:e4b`. + +11. `PLAN-elevenreader-parity.md` (line 28) — update "Ollama qwen3" to "Ollama gemma4:e4b". + +12. `TODO.md` — update or remove the "llama3, mistral, qwen, phi" speculation entry since we've now picked gemma4. + +### DO NOT TOUCH + +- `CHANGELOG.md` — historical entries (the existing "switched from gemma3:4b to qwen3:8b" entry is historical fact). DO NOT modify it. +- `release-notes-v0.1.0.md` — already used for the v0.1.0 GitHub Release, frozen. +- `hackernews-launch-post.md` — marketing draft, freeze. +- `bin/` directories everywhere — build artifacts. +- Test mocks in `tests/TextStack.UnitTests/` and `tests/TextStack.IntegrationTests/` source files (the test code uses `Mock<ILlmService>`, doesn't reference the model name). + +### Add a CHANGELOG entry + +13. Open `CHANGELOG.md`. Find the `## [Unreleased]` section near the top. Add a new bullet under `### Changed` (create the subheading if it doesn't exist): + + ``` + ### Changed + - **Local LLM model**: switched from `qwen3:8b` to `gemma4:e4b`. Smaller footprint + (~3-4 GB vs 5-6 GB Q4), fits comfortably under the 4 GB Ollama container + memory limit, and uses Google's Gemma 4 effective-4B architecture (released + May 2026). Same `ILlmService` interface, no API changes. To roll back: + set `Ollama__Model=qwen3:8b` env var or update `appsettings.json`. + ``` + +### Commit + +14. Stage and commit with this message: + + ``` + chore(llm): swap Ollama model from qwen3:8b to gemma4:e4b + + Updates the default local LLM across appsettings, source defaults, docker- + compose env, and documentation. Same ILlmService interface, no API changes. + + Why: + - gemma4:e4b is ~3-4 GB Q4 vs qwen3:8b at ~5-6 GB Q4 — fits comfortably + under the 4 GB Ollama container limit without OOM-kill risk. 
+ - Released May 2026 by Google; "effective 4B" architecture should be + competitive with qwen3:8b on the distractor/hint/metadata generation + tasks we use it for. + - Eligible for Dev.to's Gemma 4 Challenge submission. + + Rollback: set Ollama__Model=qwen3:8b in env or revert this commit. + + Files changed: + - backend/src/Api/Program.cs (default fallback) + - backend/src/Application/LLM/OllamaLlmService.cs (default fallback) + - backend/src/Vocabulary/TextStack.Vocabulary/VocabularyOptions.cs (default) + - backend/src/Api/appsettings.json (config) + - backend/src/Worker/appsettings.json (config) + - docker-compose.yml (env) + - README.md, CLAUDE.md, docs/04-dev/llm-provider-swap.md, + docs/ux-roadmap/17-ai-auto-tags.md, PLAN-elevenreader-parity.md, + TODO.md (documentation) + - CHANGELOG.md (Unreleased entry) + ``` + +15. Do NOT push. The user wants to test locally first before pushing. + +VERIFICATION: + +After all edits, run these checks: + +1. No remaining functional references to `qwen3:8b` in source/config (excluding bin/, CHANGELOG.md historical entries, frozen marketing files, and `release-notes-v0.1.0.md`): + ``` + grep -rn "qwen3:8b" \ + --include="*.cs" --include="*.json" --include="*.yml" --include="*.yaml" \ + --include="*.md" --include="*.sh" --include="Makefile" \ + --exclude-dir=bin --exclude-dir=obj --exclude-dir=node_modules \ + . | grep -v -E '(CHANGELOG\.md|release-notes-v0\.1\.0\.md|hackernews-launch-post\.md):' + ``` + Expected: empty (no output). + +2. Build succeeds: + ``` + dotnet build textstack.sln + ``` + +3. Tests pass (unit tests use mocks, won't actually hit Ollama): + ``` + dotnet test tests/TextStack.UnitTests + ``` + +4. 
Output a final summary listing: + - Branch name + - Commit hash + - Files changed (count + paths) + - Confirmation that grep verification returned empty + - Confirmation that build + tests succeeded + - Suggested next manual step for the user (see Post-prompt steps below) + +If any step fails, do not finalize the commit. Report the issue and propose a fix. +```` + +--- + +## Post-prompt manual steps (you do these after Claude Code finishes) + +### 1. Pull the model on the production server + +SSH to the VPS and pull the new model into the Ollama Docker volume: + +```bash +docker compose exec ollama ollama pull gemma4:e4b +``` + +This downloads the model to `./data/ollama/` so the next deploy can use it without redownloading. + +### 2. Test locally before deploying + +Run a quick smoke test against your local Ollama (or via docker compose): + +```bash +# Start Ollama locally +docker compose up -d ollama + +# Pull model +docker exec textstack_ollama ollama pull gemma4:e4b + +# Direct test — does gemma4:e4b respond and produce JSON-parseable output? +docker exec textstack_ollama ollama run gemma4:e4b 'Generate 5 multiple-choice distractors for the technical term "eventual consistency" in the context of distributed databases. Output ONLY a JSON array of strings, nothing else.' +``` + +Expected: a JSON array like `["strong consistency", "linearizability", "ACID", "two-phase commit", "leader election"]` (the exact words don't matter, the format does). + +If gemma4:e4b refuses to output strict JSON or wraps it in markdown fences, you'll need to tweak the prompts in: +- `backend/src/Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs` +- `backend/src/Worker/Services/BookMetadataGenerator.cs` +- `backend/src/Worker/Services/TagSuggestionGenerator.cs` + +(Look for the "Output ONLY..." or "Respond with JSON" instructions and make them stricter — Gemma sometimes adds preamble.) + +### 3. 
End-to-end test + +Save a new vocabulary word in TextStack and verify: +- Distractors are generated (5 of them) +- Hint is generated (1 short hint) +- Explanation is generated (2-3 sentences) + +If all three appear in the database within ~10-30 seconds, the swap works. + +### 4. Push and deploy + +```bash +git push origin chore/swap-ollama-model-to-gemma4 +# Open PR, merge to main when ready +# CI will build and deploy +``` + +After deploy: +- `make logs` to watch for Ollama-related errors +- Monitor a couple of vocabulary saves to confirm distractors generate + +### 5. Rollback path + +If something breaks in production: + +```bash +# Quick config override (no code change); `up -d` recreates the containers, plain `restart` keeps the old env +ssh server +cd /path/to/textstack +sed -i 's/Ollama__Model: gemma4:e4b/Ollama__Model: qwen3:8b/' docker-compose.yml +docker compose up -d api worker +``` + +Or revert the commit and redeploy. + +--- + +## Summary of files affected + +**Functional (must change):** +- 3 source files (defaults) +- 2 config files (appsettings.json — Api + Worker) +- 1 docker-compose.yml + +**Documentation (must change for accuracy):** +- README.md, CLAUDE.md +- docs/04-dev/llm-provider-swap.md +- docs/ux-roadmap/17-ai-auto-tags.md +- PLAN-elevenreader-parity.md +- TODO.md + +**Tracking (add Unreleased entry):** +- CHANGELOG.md + +**Frozen (do NOT change):** +- CHANGELOG.md historical entries +- release-notes-v0.1.0.md +- hackernews-launch-post.md +- bin/ directories +- test mock code + +Total: 13 files modified, 1 commit. diff --git a/claude-code-prompt-release.md b/claude-code-prompt-release.md new file mode 100644 index 00000000..9ab8bbad --- /dev/null +++ b/claude-code-prompt-release.md @@ -0,0 +1,162 @@ +# Claude Code prompt — cut v0.1.0 release + +Copy the prompt below and paste it into Claude Code (`claude` CLI) running in the textstack repo root. It will execute end-to-end: pre-flight checks → CHANGELOG update → tag creation → push → GitHub Release. 
+ +--- + +## Prerequisites + +Before running the prompt, verify in your terminal: + +1. `gh` CLI installed and authenticated: + ```bash + gh auth status + ``` + Should show `Logged in to github.com as mrviduus`. If not, run `gh auth login`. + +2. You have push access to the repo (you do — you're the owner). + +3. `release-notes-v0.1.0.md` exists in the repo root (it does — I just created it). + +--- + +## The prompt + +```` +You are working in the textstack repository at /Users/vasylvdovychenko/projects/textstack/textstack. The owner is Vasyl Vdovychenko. + +TASK: Cut and publish the first tagged release v0.1.0 on GitHub. + +CONTEXT: +- TextStack was just relicensed from BUSL-1.1 to AGPL-3.0 (PR #201, already merged to main). +- This is the first ever tagged release. +- Goal 1: start the 4-month seasoning timer required by awesome-selfhosted. +- Goal 2: publish a discoverable release (RSS feed, GitHub Releases page). +- A pre-written release notes file exists at release-notes-v0.1.0.md in the repo root — use it as-is for the GitHub Release body. + +PRE-FLIGHT CHECKS — halt and report if any fails: + +1. Working tree state: + git status + If there are uncommitted changes on the current branch, halt. Tell the user to commit or stash before proceeding. Do not stash automatically. + +2. Current branch is recorded — we'll need to switch back at the end: + git rev-parse --abbrev-ref HEAD + Save this value as ORIGINAL_BRANCH. + +3. Switch to main and pull latest: + git checkout main + git fetch origin + git pull origin main --ff-only + If the pull fails (non-fast-forward), halt and tell the user to resolve manually. + +4. Verify the relicense commit is in main: + git log --oneline -50 | grep -i "relicense from BUSL" + Should find the AGPL relicense commit. If not found, halt. + +5. Verify no v0.1.0 tag already exists: + git tag -l "v0.1.0" + Should output nothing. If the tag exists, halt. + +6. 
Verify release-notes-v0.1.0.md exists and is non-empty: + wc -l release-notes-v0.1.0.md + Should report at least 50 lines. + +7. Verify gh CLI is authenticated: + gh auth status + Should show authenticated as mrviduus. + +EXECUTE: + +1. Update CHANGELOG.md to mark the v0.1.0 release. + + Read the current CHANGELOG.md. Find the line `## [Unreleased]`. Replace that single line with two lines: + + ## [Unreleased] + + ## [v0.1.0] — TODAY_DATE + + Where TODAY_DATE is today's date in YYYY-MM-DD format (use `date +%Y-%m-%d`). + + This convention preserves all existing detailed Unreleased notes under v0.1.0 (so they become the v0.1.0 changelog), and leaves a clean empty Unreleased header at the top for future work. + + Also append a short headline summary near the top of the new v0.1.0 section, immediately after the date header: + + ### Headline + + First tagged release of TextStack under **GNU Affero General Public License v3.0**. Earlier development was BUSL-1.1; v0.1.0 onwards is AGPL-3.0 (PR #201). See `release-notes-v0.1.0.md` for the user-facing announcement. + +2. Stage and commit the CHANGELOG update: + git add CHANGELOG.md + git commit -m "docs(changelog): mark v0.1.0 release" + +3. Push the commit: + git push origin main + +4. Create the annotated tag at HEAD of main: + git tag -a v0.1.0 -m "v0.1.0 — first AGPL-3.0 release + + First public tagged release of TextStack. Project relicensed from + BUSL-1.1 to AGPL-3.0 (PR #201). See release-notes-v0.1.0.md and + CHANGELOG.md for details." + +5. Push the tag: + git push origin v0.1.0 + +6. Create the GitHub Release using the existing release notes file: + gh release create v0.1.0 \ + --title "v0.1.0 — First AGPL-3.0 release" \ + --notes-file release-notes-v0.1.0.md \ + --latest + + If the command fails because gh is not installed or not authenticated, report the error and stop. + +7. 
Switch back to ORIGINAL_BRANCH (the branch you were on at the start): + git checkout "$ORIGINAL_BRANCH" + +VERIFICATION: + +After everything completes: + +- gh release view v0.1.0 — should print the new release info with the URL +- git tag -l "v0.1.0" — should show the tag +- git log v0.1.0 --oneline -1 — should show the commit the tag points to + +OUTPUT a final summary to the user containing: +- Original branch you returned them to +- Tag SHA (full and short) +- CHANGELOG commit SHA +- GitHub Release URL (gh release view v0.1.0 --json url -q .url) +- Confirmation that the 4-month timer for awesome-selfhosted starts today +- Suggested follow-up: edit the GitHub Release on the web to verify formatting, then announce on Twitter/blog/HN + +If ANY step fails, do not proceed. Report the failure clearly and propose a recovery action. Never force-push, never delete tags, never rewrite history. +```` + +--- + +## After Claude Code finishes + +1. **Verify on the web**: + - Go to https://github.com/mrviduus/textstack/releases + - Confirm v0.1.0 is published as "Latest" with formatted release notes + - Confirm the README badge in the release matches AGPL-3.0 + +2. **Optional polish**: + - On the GitHub Release page, click "Edit" → upload a screenshot of the reader (drag into the description) for a more visual release page + - Add a banner image if you have one (`docs/assets/hero.png` would work) + +3. **Announce**: + - Tweet from @Rexetdeus: link to release + 1-line "First AGPL-3.0 release of TextStack — a reader for technical books" + - Short blog post on vasyl.blog "TextStack v0.1.0 is out, and it's now AGPL-3.0" (this is also the foundation for a HN submission later) + +4. **Calendar reminder**: 2026-09-04 — eligible to submit awesome-selfhosted PR + +--- + +## If something goes wrong + +- **`gh` not installed**: `brew install gh` then `gh auth login` +- **Push rejected because of branch protection**: you might have branch protection on main requiring PRs. 
If so, the commit + tag flow needs adjustment — let me know and I'll rewrite the prompt to use a PR instead of direct push +- **Tag already exists**: someone else may have tagged. Check with `git tag -l` and decide whether to delete (`git tag -d v0.1.0 && git push origin :refs/tags/v0.1.0`) or use a different version +- **Claude Code halts on a check**: that's working as intended — read the error, fix the underlying issue, re-run diff --git a/comment-response-templates.md b/comment-response-templates.md new file mode 100644 index 00000000..5628f616 --- /dev/null +++ b/comment-response-templates.md @@ -0,0 +1,157 @@ +# Comment response templates + +Pre-baked answers to questions you'll see within the first few hours on Dev.to, Reddit, and HN. Adjust each to the platform's voice (HN drier than Reddit, Reddit drier than Dev.to). Don't paste verbatim — make it sound like you in the moment. + +**Rule of thumb:** any reply under 2 sentences should be either dropped or expanded. "Thanks!" hurts your engagement signal more than no reply at all. + +--- + +## "Why not just use OpenAI/Claude/Gemini for this too?" + +``` +For four jobs (distractors, hints, native-language explanations, book metadata enrichment), the per-call cost was killing the self-host story. ~50 words saved per active reader per book × 5¢ per OpenAI distractor call = $2.50/book/user. Fine if I run the only instance, but the project's AGPL and the whole point is that anyone with a $20 VPS can run it. The moment someone else's hosting bill becomes their problem, cloud LLM costs make TextStack unrunnable for them. + +Translation stayed on OpenAI because multilingual quality, especially Ukrainian, isn't there yet on local 4B-parameter models. Different tasks, different trade-offs. +``` + +--- + +## "Why E4B and not 31B / 26B MoE?" + +``` +Short answer: 31B and 26B MoE need either a GPU or a much bigger box. 
E4B fits the constraint that matters — TextStack has to be deployable by anyone with a $20/month consumer VPS. Article goes into the trade-off matrix in the "How I Used Gemma 4" section, but the TLDR is: E2B was too weak for technical-domain distractor quality, E4B is the smallest model that produces plausible siblings for terms like "linearizability", and the bigger models would force a hardware upgrade I'm not asking my users to make. +``` + +--- + +## "Why not run on GPU?" + +``` +RTX 5090 + vLLM + 31B + MTP is a different conversation — the speed and reasoning quality are on another level. There's a great post about that exact stack from @ertugrul_demir on Dev.to right now. I deliberately picked the opposite direction: keep it consumer-VPS-only so the self-host story is real, not aspirational. Different audience, different trade-off space. +``` + +--- + +## "Can I run this on a Raspberry Pi?" + +``` +Ollama needs ~13 GiB resident for E4B with KEEP_ALIVE=-1. A Pi 5 maxes at 8 GB RAM. So no, not on a Pi 5 with E4B. If you swap to E2B (2B effective), you can — but in my testing E2B distractor quality on technical vocabulary wasn't there. If you want vision or simpler classification on a Pi, the Tahosin post from this challenge (s/he ran Gemma 4 vision on a Pi 5 for object detection) is a better reference. +``` + +--- + +## "Why .NET? Why not Python?" + +``` +I've shipped .NET production for a decade and React on the side. Python would mean introducing a third language to the stack just for the LLM glue, when ASP.NET Core 10 talks to Ollama's HTTP API perfectly fine. The fire-and-forget pattern via IServiceScopeFactory is also nicer in C# than Python equivalents I've used. Mostly though: I know .NET, and the integration is one HTTP call. +``` + +--- + +## "Is AGPL really enforceable for SaaS?" + +``` +Honest answer: it's enforceable in principle, hard in practice unless you have legal resources. 
I picked AGPL not because I'm planning to sue anyone, but because it sets the expectation: if you build a SaaS off this, your modifications are public. That filters out the "fork it, slap a paywall on it, stay quiet" path which I don't want. Anyone running TextStack as-is for personal use has zero obligations. +``` + +--- + +## "What does Kindle Word Wise actually do?" + +``` +It's a Kindle feature (frozen at 2014-era rules) that shows brief explanations for "harder" words above the line as you read. Built for native English speakers who hit unfamiliar everyday words. Doesn't translate, doesn't know the book's domain, doesn't have an SRS layer, no LLM. TextStack is essentially: what would Word Wise look like if it were built today, knew the difference between "attention" the everyday word and "attention" the ML term, and could surface terms into a spaced-repetition queue. +``` + +--- + +## "Why didn't you alert on the silent fallback?" + +``` +Fair hit. I had no observability on llm.success vs llm.fallback split — both code paths returned a list of strings to the caller, both succeeded from the API's perspective. My todo from this incident is: emit two distinct counters, alert if the fallback ratio drifts above 5%. Pre-Gemma-swap I'd convinced myself "if it worked locally, it works in prod" and the fact that the fallback was silent let me skip the obvious instrumentation. Lesson worth its own short post. +``` + +--- + +## "What's the cold-start latency post-restart?" + +``` +50–60 seconds for the first inference call after the container boots, then warm forever (KEEP_ALIVE=-1). Article has the numbers. Practically that means every deploy burns 60s of latency on the first user who triggers a Gemma call — not great, not terrible. Workaround if you cared more than I do: have the container hit Ollama with a warmup prompt as part of the docker compose up sequence. +``` + +--- + +## "Does this work with [other Ollama model]?" 
+ +``` +Yeah, all the local stuff is just an Ollama HTTP call — the model name is one config line. Distractor parser is the surface that's most model-sensitive (qwen3 outputs differently than Gemma 4 differently than Llama). The one in the post is tuned for "single-word output". If you swap models, expect to re-tune that prompt. +``` + +--- + +## "Sample chapters / can I try it without signup?" + +``` +Yes — go to https://textstack.app, pick any book, hit "Read". Sample chapters are unauthenticated. Vocab review needs a free account because progress and SRS state are per-user, but there's no email verification gate. Use any throwaway email. +``` + +--- + +## "What's the SRS algorithm? Anki / SuperMemo?" + +``` +Custom 5-stage state machine — New → Recognition → Recall → Context → Mastered. Each stage uses a different review mode (multiple choice in early stages, classic flashcard later). It's not Anki — Anki is open-ended, TextStack's queue is intentionally capped (no infinite backlog). Code is at backend/src/Application/Vocabulary/SrsEngine.cs if you want to read it. +``` + +--- + +## "Where do the books come from?" + +``` +Two sources: a curated public library (mostly Project Gutenberg + Standard Ebooks for the technical/classics overlap), and user uploads (you can upload your own EPUB/PDF/FB2 and the worker extracts it). The interesting books for this audience — DDIA, Crafting Interpreters, SICP, the Pragmatic Programmer — are user uploads. +``` + +--- + +## "Isn't this basically a wrapper around Gemma 4?" + +``` +The Gemma part is one hop in a longer pipeline: parse EPUB/PDF/FB2 → extract chapters → search-vector index → reader UI → tap-word context detection → translation routing (cloud or local) → SRS scheduling → distractor generation → UX layer. Gemma's role is one chunk of the inference work — distractors, hints, explanations, metadata. Calling the whole product a wrapper would be like calling Stripe a wrapper around card networks. 
The model does its job; everything else is what makes it useful. +``` + +--- + +## "How do I contribute / what kind of PRs do you accept?" + +``` +Check CONTRIBUTING.md in the repo. Right now the highest-leverage PRs are: language-specific translation polish (Ukrainian/Russian especially — I can read those, but PR feedback from native speakers helps), bug reports with reproduction steps, and integration adapters for additional book sources beyond Gutenberg. The codebase is .NET 10 backend + React 19 frontend + React Native mobile. +``` + +--- + +## "How does this compare to LingQ / Kindle / Beeline / Readlang?" + +``` +Different audiences. LingQ is built for casual language learning at scale — broad vocabulary, conversational. TextStack is built for technical books specifically: domain-aware translation knowing whether "transaction" means a database thing or a financial one, integrated with an SRS that caps the queue weekly so you don't drown. None of those tools have local-LLM as the engine — they're all cloud-paid or freemium models with usage limits. The trade-off is real: TextStack's translation isn't as polished as paid services on conversational language, but it nails technical terms in a way none of them do. +``` + +--- + +## Crisis-mode replies + +If someone calls you out for something you actually got wrong: + +``` +You're right. I had this wrong in the post — [acknowledge specific thing]. Going to update the article with a correction note. Appreciate the catch. +``` + +If someone says "this is just an ad": + +``` +The post is a writeup of a real bug I had to fix in production. Repo is open source, all the code I describe is linkable, the PRs are public. If there's a part you'd find more useful as a standalone technical reference without the project context, happy to pull it out. +``` + +If you get a flame about the AGPL: + +``` +AGPL was a deliberate choice, not an accident. 
Whether it's the right license for your use case is a different question — if you'd want to use this commercially under different terms, open an issue and let's talk. I'm not opposed to a dual-license discussion. +``` diff --git a/devto-gemma4-article.md b/devto-gemma4-article.md new file mode 100644 index 00000000..e6eba5f6 --- /dev/null +++ b/devto-gemma4-article.md @@ -0,0 +1,435 @@ +# Dev.to article — Gemma 4 Challenge submission (Build category) + +**Strategy:** submit to **"Build with Gemma 4"** ($500 × 5 winners, vs Write at $100 × 5). Required structure: `## What I Built` / `## Demo` / `## Code` / `## How I Used Gemma 4`. Lead with the strongest hook this draft has — the silent-fallback discovery — then deliver against each judging criterion in order. + +**Judging criteria, mapped to sections:** + +| Criterion | Section that proves it | +|---|---| +| Intentional and effective use of Gemma 4 | `## How I Used Gemma 4` — full trade-off table + the second-swap narrative (e4b → e2b once prod data was in) | +| Technical implementation and code quality | `## Code` (AGPL repo + 3 PRs + 4 follow-up commits with diffs) + the 6-lesson walkthrough | +| Creativity and originality | What I built (technical-vocab SRS for non-native devs reading dense English books) + two model swaps in one project + 63k load test on consumer hardware | +| Usability and user experience | Demo of live `textstack.app` + sample chapters without signup + the 100% success rate at 500 RPS under load | + +--- + +## Title (recommended) + +``` +I shipped local LLM features two months ago. Production never ran them once. +``` + +Hook test against the field: top current entry is *"Gemma 4: Why Local AI is Finally Becoming Personal"* — generic, no production data. Ours is specific, second-person-implicating, promises a story, and reveals it was on Gemma 4 by paragraph two. Keep. 
+ +Backup if Build judges prefer category-clean titles: *"How I rebuilt Kindle Word Wise on Gemma 4 e4b — and discovered Ollama had been silently empty for two months"*. + +--- + +## Tags (required by challenge) + +`devchallenge`, `gemmachallenge`, `gemma` + +The challenge announcement and submission template both list exactly these three. dev.to caps tags at 4. **Do not add `ai`, `opensource`, or `selfhosted`** in their place — judges filter by `gemmachallenge` and the official three keep us in the eligible pool. If we want a 4th, `webdev` and `ollama` are safer than thematic ones. + +--- + +## Cover image prompt for Dev.to AI generator + +``` +Flat minimalist illustration: a server rack labeled "ollama" in the foreground, +its model slot drawn as an empty glass cylinder. On the right, two stacked +model containers labeled "gemma4:e4b" (fading out) and "gemma4:e2b" (sliding +in, glowing). Faint code-trace lines underneath in soft teal and purple. Wide +banner aspect ratio (1000×420), no people, no faces, dev.to-friendly clean +style. Visual hint of "two swaps, one challenge". +``` + +--- + +## Article body (paste into Dev.to editor) + +```markdown +*This is a submission for the [Gemma 4 Challenge: Build with Gemma 4](https://dev.to/challenges/google-gemma-2026-05-06)* + +Two months ago I shipped local-LLM features in [TextStack](https://textstack.app) — an open-source reader for developers who want to finish dense English technical books in their native language. Yesterday I noticed something strange about the production server's RAM. 3 GB used out of 30. The model that runs all those features should be ~13 GB resident. + +I SSH'd in. + +```bash +$ docker compose exec ollama ollama list +NAME ID SIZE MODIFIED +$ +``` + +Nothing. The Ollama container had been running for 60+ days without a single model pulled. Every distractor call had fired, hit the fallback path, and returned random vocabulary words. 
I never noticed because the failure mode is silent — the user sees distractors, just not LLM-generated ones. + +This is the post-mortem of that, plus the **two model swaps** that finally got the features working: `qwen3:8b → gemma4:e4b` on day one to bring local inference up at all, then `e4b → e2b` once production load showed e4b couldn't keep up on CPU. **Six production bugs surfaced along the way.** The article ends with a real 63,000-request load test on the e2b deploy: 100% success, p95 = 20.5 ms, total OpenAI cost = $0.002. + +## What I Built + +[**TextStack**](https://textstack.app) is an open-source ([AGPL-3.0](https://github.com/mrviduus/textstack/blob/main/LICENSE)) reader for developers who keep abandoning English technical books like *Designing Data-Intensive Applications*. Tap any term → context-aware translation that knows the book's domain ("attention" in an ML chapter gets *увага (механізм у нейромережах)*, not the everyday meaning). Words you save feed a capped weekly SRS queue. + +Local **Gemma 4 e2b** generates the multiple-choice distractors, hints, native-language explanations, and book metadata enrichment — four jobs that previously needed paid OpenAI calls per user. OpenAI `gpt-5-mini` stays for translation (multilingual quality matters) and for in-reader live explanations (latency-sensitive). Everything else runs on a single-CPU 30 GB-RAM VPS, no GPU. + +## Demo + +🌐 **Live:** [textstack.app](https://textstack.app) — sample chapters open without signup. Tap any word in *Designing Data-Intensive Applications*, then check the vocabulary review. 
+ +🎬 **37-second walkthrough — read → save word → MCQ with Gemma-generated distractors → answer feedback:** + +![TextStack vocabulary review demo: tap-translations in DDIA, save word to vocabulary, MCQ card with 4 Gemma-generated distractors, red/green answer feedback](https://raw.githubusercontent.com/mrviduus/textstack/main/docs/marketing/srs-mcq-demo.gif) + +📸 **Single MCQ card — "___ the data from these external systems..." with 4 Gemma-generated distractors (battle / bringing / storm / courage):** + +![Vocabulary multiple-choice card with cloze sentence from DDIA and 4 Gemma-generated distractor options](https://raw.githubusercontent.com/mrviduus/textstack/main/docs/marketing/srs-mcq-card.png) + +> **Note for judges:** Sample chapters are unauthenticated; the vocabulary review needs a free account because progress and SRS state are per-user. Use any throwaway email — there's no email verification gate on read. + +## Code + +📦 **Repository:** [github.com/mrviduus/textstack](https://github.com/mrviduus/textstack) — AGPL-3.0, 200+ merged PRs, deployed at [textstack.app](https://textstack.app) + +[![Star on GitHub](https://img.shields.io/github/stars/mrviduus/textstack?style=social&label=Star)](https://github.com/mrviduus/textstack) — every star tells me one more developer wants to finish DDIA without giving up + +📐 **Stack:** +- Backend: ASP.NET Core 10 (clean architecture: Domain / Application / Infrastructure / Api / Worker) +- Database: PostgreSQL 16 with FTS for in-book search +- Frontend: React 19 + Vite, React Native 0.83 (Expo) for mobile +- LLM: Ollama running `gemma4:e2b` for local jobs, OpenAI `gpt-5-mini` for translation +- Deployment: docker-compose, Cloudflare Tunnel, single VPS + +🔧 **Key commits behind the story:** + +- [PR #232](https://github.com/mrviduus/textstack/pull/232) — original swap `qwen3:8b` → `gemma4:e4b`, image pin, memory bump +- [`3999944`](https://github.com/mrviduus/textstack/commit/3999944) — worker `Connection refused` fix + the 
real timeout bump (30s → 90s after measurement) +- [`966b398`](https://github.com/mrviduus/textstack/commit/966b398) — the second model swap, `e4b` → `e2b` +- [`c6db540`](https://github.com/mrviduus/textstack/commit/c6db540) — 63,000-request load test + full LoadSurge report + +Full PR/commit history for the swap arc lives in [`CHANGELOG.md` under `[Unreleased]`](https://github.com/mrviduus/textstack/blob/main/CHANGELOG.md). The Gemma-using code lives in: + +- [`backend/src/Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs`](https://github.com/mrviduus/textstack/blob/main/backend/src/Vocabulary/TextStack.Vocabulary/DistractorGenerator.cs) — prompt template, parser, fallback cascade +- [`backend/src/Worker/Services/BookMetadataGenerator.cs`](https://github.com/mrviduus/textstack/blob/main/backend/src/Worker/Services/BookMetadataGenerator.cs) — fire-and-forget metadata enrichment + +## How I Used Gemma 4 + +**The model selection went through two rounds.** Gemma 4 ships in four sizes. The first time I built a trade-off table, I picked the wrong one — for understandable reasons. The second time I had production data and picked correctly. Both decisions live in the same article. + +Here's the matrix at the time of the first pick (E4B, day-one swap): + +| Model | Disk | RAM resident | Fits on my VPS? 
| First-pick reasoning | +|---|---|---|---|---| +| **E2B** (2B effective) | 7.2 GB | ~5 GiB | ✅ trivially | "Too small for nuanced technical-vocab distractors" — *I'd find out this was wrong* | +| **E4B** (4B effective) | 9.6 GB | 13 GiB | ✅ with cgroup bump 4G → 12G | "Sweet spot — strong enough on quality, fits the VPS" — *picked first* | +| **31B Dense** | ~18 GB | ~24 GiB | ⚠️ tight, no headroom for Postgres + .NET | "Overkill, no room for the rest of the stack" | +| **26B MoE** | ~15 GB | ~20 GiB | ⚠️ same constraint | "MoE doesn't help short prompts here" | + +The 31B and 26B MoE models would need either a GPU box or a much bigger VPS, neither of which fits an open-source project that has to remain deployable on a $20/month consumer host. So the real choice was between E2B and E4B. I went with E4B. I was wrong. + +**What Gemma 4 unlocked vs the cloud alternative.** Pre-swap, every distractor generation was a ~5¢ OpenAI call per word saved per user. With ~50 saved words per active reader per book, that's $2.50/book/user — fine for me running the only instance, fatal the moment someone else self-hosts it. Local Gemma 4 makes the marginal cost per distractor ~0 (just CPU on a box already running). Same for hints, explanations, and book metadata enrichment. + +**Local inference changed the economics of the feature completely.** That's the real reason the swap mattered — not the model quality, the cost shape. + +## What surfaced when I actually flipped it on + +The bug story isn't decoration — it's how I learned what each Gemma 4 quirk does in production. **Six lessons.** The first four came from getting e4b to run at all. The last two came from staring at the production stats after it was "running". + +### Lesson 1: floating image tags lie + +Original `docker-compose.yml` had: + +```yaml +ollama: + image: ollama/ollama # no version +``` + +Docker pulled `latest` two months ago and cached it. `latest` at that moment was 0.22.x. 
Gemma 4 wasn't released yet, so the binary doesn't recognize the model family. From the host's perspective, the "local Ollama" IS the latest version — `docker image ls` shows the cached SHA, not whether upstream has moved. + +```diff +- image: ollama/ollama ++ image: ollama/ollama:0.23.1 +``` + +Pull succeeded after pinning. 9.6 GB on disk for e4b. + +### Lesson 2: cgroup limits were a guess from the qwen3 era + +The container memory cap (4 GB) had been sized for `qwen3:8b` and never re-evaluated. Gemma 4 e4b weights need 9.8 GiB. Inference returned `model requires more system memory (9.8 GiB) than is available` until I bumped the limit: + +```diff + deploy: + resources: + limits: +- memory: 4G ++ memory: 12G +``` + +The lesson: every model swap should also re-evaluate the container resource block. Picked-once-and-forgotten limits are a category of silent drift. + +### Lesson 3: cold load and warm latency both blew past my API timeout + +First inference call hung ~60s before the first token. Default Ollama `keep_alive` is 5 minutes — after that the model unloads and the next cold call burns 60s again. Fix: `OLLAMA_KEEP_ALIVE=-1`, plus bump the API timeout from 10s → 30s. + +I shipped it. Then watched production: **2 distractor generations out of 13 saved words succeeded.** The model was resident the entire time. Every miss was a wall-clock timeout. E4B on CPU just takes more than 30 seconds for many prompts. + +So 30s wasn't enough either: + +```diff +- "TimeoutSeconds": 30 ++ "TimeoutSeconds": 90 +``` + +Success rate climbed to ~100%. **For CPU-only Gemma 4 on a 6-core consumer VPS, your timeout has to absorb 60–90 s tail latency, not 10 s.** That gap between toy-benchmark numbers and production reality is where most local-LLM ship-and-forget bugs live. + +### Lesson 4: the parser silently dropped half my output + +`DistractorGenerator`'s prompt asks for 5 wrong-answer words. 
Smoke test for `linearizability`: + +``` +consistency, atomicity, serialization, concurrency, visibility +``` + +Five single-word distractors. Clean. Then I tried `eventual consistency`: + +``` +strong consistency, read-after-write, data loss, causality, serialization +``` + +Now look at the parser: + +```csharp +.Where(w => w.Length > 1 + && w.Length < 50 + && w.Any(char.IsLetter) + && !w.Equals(originalWord, StringComparison.OrdinalIgnoreCase) + && !w.Contains(' ')) // ← drops "strong consistency", "data loss" +``` + +The filter rejects multi-word entries. Three of the five gone. With the `distractors.Count >= 3` requirement, the call returned `null` and the fire-and-forget path fell back to the hardcoded random-word picker. + +The filter was there since the original implementation. **qwen3 outputs single tokens by default, so the constraint was hidden. Gemma 4 prefers phrasal answers** — it's the most cross-model-family-sensitive parsing surface you'll hit when swapping. The fix was a single line in the prompt: + +``` +- SINGLE WORD ONLY — no spaces, no multi-word phrases + (use "linearizability" not "strong consistency"). Hyphens are fine. +``` + +After all four fixes, a real production save of `warehouse` returned: + +```json +["storeroom", "depot", "facility", "silo", "loft"] +``` + +Five domain-adjacent single-word distractors, exactly the shape the prompt asks for. That's the moment local Gemma 4 was finally doing real work. + +### Lesson 5: the worker had been silently failing for two months + +While collecting production stats for *this article*, I grepped the worker logs: + +```bash +$ docker compose logs worker | grep "Connection refused" +... lots of lines ... +``` + +`docker-compose.yml` had set `Ollama__BaseUrl` on the `api` service but **not on the `worker` service**. The worker fell back to the default (`localhost:11434` inside the worker container — there is nothing there) and every `BookMetadataGenerator` call hit `Connection refused` silently. 
Every user-uploaded book ended up with `genre = NULL`, which in turn meant the domain-aware translation prompt had nothing to bias against. + +This was a *second* silent fallback, completely orthogonal to the original one. Same shape, different surface. Fix: + +```diff + worker: + environment: ++ Ollama__BaseUrl: http://ollama:11434 ++ Ollama__Model: gemma4:e2b +``` + +Plus a one-shot `MetadataBackfillWorker` (a small `BackgroundService` that runs on worker startup) to heal the ~10 user-uploaded books with `genre = NULL`, idempotently. + +**The pattern is the lesson.** Anywhere you distribute environment via a compose file, ask: which services *actually need this variable* and is the variable set on each of them? "Inherits from .env" is not a thing in docker-compose service blocks. + +### Lesson 6: turn off thinking mode for structured outputs + +Modern Ollama models (including Gemma 4) default to a chain-of-thought "thinking" pass before the final answer. For freeform reasoning that's a quality win. For my use case — output a 5-element list of single words — the thinking pass is pure overhead. Every request was generating 50–200 tokens of internal reasoning the parser then threw away. + +In the Ollama call options: + +```diff +- options: { "temperature": 0.7 } ++ options: { "temperature": 0.7, "think": false } +``` + +Roughly halved the per-request token output. Roughly halved end-to-end latency. The quality of the distractors did not drop in my testing — for "give me 5 plausible wrong-answer words for `warehouse`", chain-of-thought wasn't doing anything load-bearing. + +If you're using Ollama for structured outputs, this is the single biggest perf knob most people don't know about. + +## The second swap: e4b → e2b + +After all six lessons above, distractor calls were succeeding at ~100%. But end-to-end save latency was still tail-heavy. 
Looking at the numbers honestly: most calls landed in the 30–60 s range, and the 90 s timeout was absorbing what should have been a comfortable fit. + +Two things were happening at once: + +1. **E4B's 13 GiB resident was contesting RAM with Postgres + .NET** on a 30 GB box. Not OOM-level, but the working set wasn't always in cache. +2. **Even with `think=false`, e4b is genuinely slow on a 6-core CPU.** I'd been benchmarking on a warm cache and short prompts; longer prompts (explanations, multi-sentence hints) routinely hit 60 s+. + +I swapped to **e2b**: + +| Metric | e4b (after all fixes) | e2b (current prod) | +|---|---|---| +| Disk | 9.6 GB | **7.2 GB** | +| RAM resident with `KEEP_ALIVE=-1` | 13 GiB | **7.7 GB** | +| Inference speed on same CPU | baseline | **~2–3× faster** | +| Quality on single-word distractor task | reference | **comparable** for short structured outputs | + +The first-pick reasoning ("E2B's quality is too weak for technical vocabulary") had been based on a *quality* benchmark. The real production constraint turned out to be *latency*. **For short structured outputs — distractor lists, single-line hints — e2b is fast enough that quality differences disappear into the prompt template**. The prompt was doing more work than I'd given it credit for. + +For longer freeform outputs (the 2–3 sentence native-language explanation), e2b is measurably less polished. Acceptable for the use case (it's a study aid, not a translation). If a future task demands better explanation quality, the path is a fine-tune of e2b on TextStack's domain corpus, not jumping back to e4b. Same hardware envelope, better domain fit. + +## Numbers (real, post-e2b) + +The numbers below are measured on the production server: AMD Ryzen 5 4600H, 6 cores / 12 threads, 30 GiB RAM, no GPU. Same box that serves traffic to [textstack.app](https://textstack.app). 
+ +| Metric | Value | +|---|---| +| **Disk (`gemma4:e2b`)** | 7.2 GB | +| **RAM resident** with `KEEP_ALIVE=-1` | 7.7 GB | +| **Cold load** (container restart) | ~10 s | +| **Distractor cost per word** | ~0¢ (CPU on existing box) | +| **Equivalent OpenAI cost** | ~5¢ per word at gpt-5-mini rates | + +### Load test: 63,000 requests, 100% success, $0.002 + +After the e2b swap I stress-tested the production deploy with [LoadSurge](https://github.com/mrviduus/textstack/tree/main/tests/TextStack.LoadTests). Three scenarios — `GET /health`, `POST /translate`, `POST /explain` — at 30–50 virtual users for 30–60 seconds each. Headlines: + +| | | +|---|---| +| Total requests | **63,000** | +| Success rate | **100%** (0 failures) | +| Worst-case p95 latency | **20.5 ms** (smoke; translate and explain were lower) | +| Sustained RPS at 50 VU | **500** | +| OpenAI cost during the run | **$0.002** (10 cache-prewarm calls; zero during the stress phase) | +| Peak temperature on the host | **42 °C** (throttle threshold 95 °C) | + +The interesting part isn't the throughput — 500 RPS on a $20 box is real but not surprising for cached HTTP. The interesting part is that the expensive path disappeared entirely behind the cache. Translate and Explain are keyed by `(input, target_language, genre, sentence)`; on a hot cache the LLM never enters the request lifecycle. + +The auth-gated `POST /me/vocabulary/words` path that triggers actual Gemma 4 distractor generation wasn't covered by this run — that's the next test, with test-auth tokens and a bounded-concurrency queue in front of Ollama. The full per-scenario breakdown is in [`docs/loadtest/run-20260511-103451/REPORT.md`](https://github.com/mrviduus/textstack/blob/main/docs/loadtest/run-20260511-103451/REPORT.md). 
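The cache behaviour described above (Translate and Explain keyed by `(input, target_language, genre, sentence)`, with the LLM skipped entirely on a hit) can be sketched roughly as follows. This is a minimal illustration, not the TextStack implementation: `CacheKeyParts`, `cacheKey`, and `ResponseCache` are hypothetical names, and the backend call is modelled as a synchronous function to keep the sketch short, where a real provider call would be asynchronous.

```typescript
// Hypothetical sketch: these names do not come from the TextStack codebase.
type CacheKeyParts = {
  input: string;
  targetLanguage: string;
  genre: string | null; // null when metadata enrichment has not run yet
  sentence: string;
};

// JSON-encode the tuple so that a separator character inside one field
// cannot make two different requests collide on the same key.
function cacheKey(parts: CacheKeyParts): string {
  return JSON.stringify([parts.input, parts.targetLanguage, parts.genre, parts.sentence]);
}

class ResponseCache {
  private readonly entries = new Map<string, string>();
  private readonly compute: (parts: CacheKeyParts) => string;

  // `compute` stands in for the expensive LLM call (an assumption of this sketch).
  constructor(compute: (parts: CacheKeyParts) => string) {
    this.compute = compute;
  }

  get(parts: CacheKeyParts): string {
    const key = cacheKey(parts);
    const hit = this.entries.get(key);
    if (hit !== undefined) {
      return hit; // hot path: the expensive call is never made
    }
    const fresh = this.compute(parts);
    this.entries.set(key, fresh);
    return fresh;
  }
}
```

Under this shape, repeating an identical request costs one map lookup, which is consistent with a stress phase that makes zero provider calls once the cache is prewarmed. Changing any one of the four fields, `genre` included, produces a different key and a fresh call.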
+ +## Where OpenAI stays + +The split after both swaps: + +| Task | Provider | Why | +|---|---|---| +| Vocabulary distractors | **Local Gemma 4 e2b** | Tolerable quality, fire-and-forget, no per-user cost | +| Word hints | **Local Gemma 4 e2b** | Same | +| Native-language explanations | **Local Gemma 4 e2b** | Same; acceptable on long-form quality given the use case | +| Book metadata enrichment | **Local Gemma 4 e2b** | Same | +| Translation (18+ langs, incl. Ukrainian) | OpenAI gpt-5-mini | Small-model multilingual translation is still a weak spot | +| In-reader term explanation (live) | OpenAI gpt-5-mini | <1 s latency requirement during reading | + +Local LLMs aren't a wholesale cloud replacement. **They're a tool for tasks where quality is tolerant, latency is amortizable, privacy matters, or per-user cost matters.** When any of those breaks down — multilingual translation, latency-sensitive UI — cloud still wins. + +## Lessons (for anyone shipping local LLMs) + +**Silent fallback is the worst kind of bug.** Distractor generation had been failing in production for 60+ days and I had no signal — the fallback was a hardcoded random-word picker, indistinguishable to the user. **And it happened twice in the same system, on two different surfaces** (Ollama-not-installed, then Worker-can't-reach-Ollama). Next time: emit `llm.success` and `llm.fallback` counters per service, alert if the ratio drifts above 5%, and never make fallbacks bit-for-bit indistinguishable from the primary path. + +**Floating image tags lie.** Pin Ollama, pin Postgres, pin everything. `latest` freezes the day Docker pulls it; two months later it's lagging upstream and you have no signal until a new model breaks it. + +**Defend at parse, always — even if your model behaved on first try.** Same prompt — qwen3 returns single tokens, Gemma 4 returns phrases. The parser's pre-existing `!w.Contains(' ')` filter was correct in spirit but hidden from the model. 
Moved into the prompt, it became explicit and Gemma satisfied it. + +**Bench with real prompts on real hardware.** I tested e4b's quality on warm-cache short prompts and concluded it was the right pick. Real production tail latency on longer prompts was 3× what the smoke test suggested, and that's what forced the e2b downgrade. Toy benchmarks hide both model-family quirks (parsing) and hardware-bound failure modes (CPU latency). + +**Turn off thinking mode for structured outputs.** `think: false` is the single biggest perf knob on Ollama for short structured tasks. Most documentation doesn't surface it. + +**Distribute env vars deliberately across services.** Docker-compose service blocks don't inherit from each other. Whichever service actually needs a variable — list it explicitly in *that service's* env block. The day you add a new service, audit every variable. + +--- + +> The interesting part wasn't that the model failed. It was how long the system kept pretending it hadn't. + +## What's next + +**Fine-tune Gemma 4 e2b on TextStack's distractor task.** I now have a real production corpus building (a few hundred (term, distractor-list) pairs per week post-fix). The corpus that existed before the fix is gone — every distractor it produced came from the hardcoded fallback, not the model. The dataset starts fresh. + +**Add a bounded-concurrency queue in front of Ollama for the write path.** From the load test recommendations: a `Channels`-based worker with `MaxConcurrency = 2` plus a per-`(word, language)` shared cache. Mirrors the translate/explain caches that just held 500 RPS with zero LLM cost. + +**Run a second load test against the auth-gated write path.** The 63k-request test only measured cached reads. Distractor generation is the actual bottleneck, and it sits behind authentication. Need test-auth tokens and 10–20 VU to bound it. + +The full TextStack codebase is AGPL-3.0 at [github.com/mrviduus/textstack](https://github.com/mrviduus/textstack). 
If you've shipped local-LLM features in production, **run `ollama list` on your server, then `docker compose logs worker | grep -i refused`**. One of those might surprise you. Mine surprised me twice in the same codebase — same shape, different surface, two months apart. That's the part of operating local LLMs that nobody writes about, and the part that takes the longest to learn. + +--- + +*If you found this useful, the strongest signal is a star on the [repo](https://github.com/mrviduus/textstack). Every star tells me the next person abandoning DDIA mid-way might find this tool — and that's the whole point.* +``` + +--- + +## Posting checklist + +- [ ] Open https://dev.to/new — or use the prefilled Build template URL: https://dev.to/new?prefill=---%0Atitle%3A%20%0Apublished%3A%20%0Atags%3A%20devchallenge%2C%20gemmachallenge%2C%20gemma%0A--- +- [ ] Title: **I shipped local LLM features two months ago. Production never ran them once.** +- [ ] Tags: `devchallenge`, `gemmachallenge`, `gemma` (and optionally `ollama` as the 4th) +- [ ] Verify the first line is exactly `*This is a submission for the [Gemma 4 Challenge: Build with Gemma 4](https://dev.to/challenges/google-gemma-2026-05-06)*` — without it the post is not a valid submission +- [ ] Cover image: generate via Dev.to "Generate Image" with the prompt above +- [ ] Paste the body markdown — both media URLs are already inlined and point at `raw.githubusercontent.com/mrviduus/textstack/main/docs/marketing/srs-mcq-card.png` and `.../srs-mcq-demo.gif`. No drag-and-drop needed in the editor. 
+- [ ] Confirm media commits are pushed to `main` (`git log docs/marketing/srs-mcq-*.{png,gif}`); without a push the raw URLs will 404 when Dev.to fetches them +- [ ] Verify all GitHub PR links resolve (#232, #233, #234) and the four follow-up commit links resolve (`3999944`, `a5a76d8`, `966b398`, `c6db540`) +- [ ] Verify the `DistractorGenerator.cs` and `BookMetadataGenerator.cs` paths in the Code section resolve to actual file URLs +- [ ] Advanced Options → Canonical URL: leave empty (Dev.to is the original) +- [ ] Preview — check tables, code blocks, the inline diff blocks render +- [ ] Schedule for 2026-05-12 12:30 UTC (= Tuesday 8:30 ET, peak Dev.to traffic window) + +## Post-publish (within 60 min on Tuesday morning) + +- [ ] Tweet from `@Rexetdeus` linking the Dev.to URL — quote the strongest line ("Production never ran them once") not the whole title +- [ ] Post in `/r/selfhosted`, `/r/LocalLLaMA`, and `/r/dotnet` with the article link and a 2-sentence Reddit-native intro (do not crosspost the same blurb to all three — each subreddit has its own tone; ready-to-paste texts in `social-media-pack.md`) +- [ ] Post on Hacker News — it'll either land or not, but the bar to try is low +- [ ] Reply to the first 3-5 comments on dev.to within 2 hours — judges weight engagement, and reactions are the official tie-breaker +- [ ] DM Jess Lee (organizer) to thank her for running the challenge — she's actively reading comments on her launch post and a polite tag helps visibility + +## Why this version is structured to win + +- **Submission header line is in.** The previous draft missed this; without it dev.to wouldn't count the post in the eligible pool. +- **Tags fixed** to the official three from the announcement. +- **Build template structure** (`What I Built` / `Demo` / `Code` / `How I Used Gemma 4`) is satisfied as section headers, not just implicitly. 
+- **"Intentional model selection"** — the first Build judging criterion — is shown twice over: first the E2B/E4B/31B/26B MoE trade-off table, then the *actual production downgrade* from e4b to e2b after measurement. That's a stronger demonstration than any other current Build entry's single-pick reasoning. +- **Demo media is real, in the repo, and inlined via raw.githubusercontent URLs** — no upload friction, no broken placeholders. +- **Six bug-chain lessons + a 63k-request load test** establish "technical implementation and code quality" against the second Build criterion. +- **All numbers are measured on the production server**, not estimated. The 100% success at 500 RPS, the p95 = 20.5 ms, the $0.002 cost — these are reproducible from the linked LoadSurge harness. +- **Hook is preserved** ("Production never ran them once") — strongest line in the field of current submissions. + +## Field analysis (snapshot Monday May 11, ~11:00 AM ET) + +`#gemmachallenge` field grew from 12 entries on May 8 to 85 on May 11. The top of the field by reactions: + +| Reactions | Title | Category | Notes | +|---|---|---|---| +| 85 | Gemma 4: Why Local AI is Finally Becoming Personal | Write | Generic essay; boost window closed (+11 over 4 days) | +| 12 | Your SOS App Can't Help... Local AI Safety Layer | Write | Concept-only, no built thing | +| 11 | The Local Model That Doesn't Sleep: MTP Marathon Engine | Write | Server-GPU technical deep-dive — different audience | +| 10 | Gemma 4 Has Four Models. Here's Which One You Need | Write | Comparison piece | +| 9 | The End of Renting Intelligence? | Write | Op-ed | +| 8 | I Tested Every Gemma 4 Model on a GTX 1650 | Write | Consumer-GPU benchmark; new on May 11 | +| ≤7 | All Build entries combined | Build | **Top Build entry has 7 reactions** | + +**Tahosin's 57-reaction "$500 GPU → $75 Raspberry Pi" piece has vanished from top/week.** Possibly DQ'd over missing `#gemma` tag; possibly withdrawn. 
One serious adjacent threat removed itself from the field. + +**No current Build entry has a deployed product with real production data, multiple model swaps, AND a 63k-request load test.** That's our wedge. + +## What changed from the previous draft + +Cuts: +- "AI / opensource / selfhosted" tags (replaced with official three) +- The whole "stuck on e4b" narrative — replaced with two-swap arc +- Estimated/wishful "warm 2.8s" inference number — replaced with measured prod data +- 10s → 30s timeout fix — corrected to the real story (30s → 90s after measurement) +- `gpt-4.1-nano` references everywhere — corrected to current prod model `gpt-5-mini` + +Adds (vs. yesterday's docx): +- **E4B → E2B downgrade story** with reasoning grounded in measured prod data — directly serves the "intentional and effective use of the chosen Gemma 4 model" judging criterion +- **Lesson 5 — Worker `Connection refused`** silent fallback (second silent-fallback surface in the same system) +- **Lesson 6 — `think=false` Ollama optimization** that roughly halved latency +- **Real warehouse example** — `warehouse → ["storeroom", "depot", "facility", "silo", "loft"]` as the "moment local Gemma 4 was finally doing real work" +- **Real production stats** — 13 saves, 2/13 success rate before the 90s timeout fix, ~100% after +- **63,000-request load test section** with full per-scenario table and the $0.002 OpenAI cost +- **What's next** expanded with bounded-concurrency-queue plan from the load test recommendations +- **6 lessons** in the closing section instead of 4 + +Word count: ~2,650 (was ~1,700). Length is justified by the production data and second swap; this is now a genuine "I shipped this twice and measured both" Build entry, not a single-decision narrative. 
diff --git a/docs/04-articles/play-store-android-developer-verification-gotcha.md b/docs/04-articles/play-store-android-developer-verification-gotcha.md new file mode 100644 index 00000000..fb7eb0d4 --- /dev/null +++ b/docs/04-articles/play-store-android-developer-verification-gotcha.md @@ -0,0 +1,177 @@ +# The Five Things Google Doesn't Tell You About Shipping an Expo App to Play Store + +**TL;DR:** I tried to ship TextStack's first Android build to Google Play. Five separate gotchas cost me an evening — each one cheap individually, but they compound in a specific order that you only learn by hitting them. Here's the order, the fix for each, and the order-of-operations that would have saved me a fresh build at every step. + +--- + +## The Setup + +TextStack is an Expo / React Native reader (EPUB / PDF / FB2 → web-parity reader, vocab SRS, offline library). The web app and PWA have been live for months. Mobile reader has been usable internally for a while. Today's plan: push the first APK to a tester device + queue the AAB for Google Play **Internal testing**. + +Stack relevant to this story: +- **Expo SDK 55**, managed workflow (no checked-in `android/` directory — generated on every build) +- **EAS Build** for cloud builds (`eas build -p android --profile production` → AAB; `--profile preview` → APK) +- **EAS-managed keystore** (a single upload key shared across `production` and `preview` profiles — fingerprint stays the same) +- **Google Play Console**, multi-account environment (TODO: details on the multi-account setup — `vasyl.vdov@gmail.com` admin, second account for the developer registration, why the split exists) + +What I expected: drag an AAB into Play Console, fill in a few forms, hit publish, done. + +What actually happened: **Play Console rejected the AAB before even letting me create the listing**, because the package name `app.textstack.mobile` wasn't *verified* yet against my developer account. + +That's where the rabbit hole starts. 
+ +--- + +## Gotcha #1: AAB vs APK at ownership-verification time + +Play Console's package-ownership flow is one of those things Google added late and never quite finished documenting. For a **new app you've never uploaded before**, Google needs to prove that the developer account uploading it actually owns the package name. + +The flow they hand you is called **"Sign and upload an APK"**, and the word *APK* there is literal. You can have a perfectly valid `.aab` ready to ship — Play Console **will not accept it for the verification step**. It wants an APK signed with the same upload key. + +So my production build pipeline (which only produced an AAB) was useless for this one specific gate. I needed a parallel APK build with the same key. + +```bash +# This produces an .aab — for actual rollout to Internal/Closed/Production tracks. +eas build -p android --profile production + +# This produces an .apk — same upload key, but APK distribution. +# Used for the ownership-verification step only. +eas build -p android --profile preview +``` + +`eas.json` profiles were already shaped right (`buildType: "apk"` on `preview`, `STORE` distribution on `production`); the fix was just to *run the preview profile too*, not to change config. ~15 min in the cloud queue. + +> **Save yourself a step:** the first build you upload to a brand-new Play Console app should be an APK, not an AAB. After ownership is verified, switch to AAB for actual releases. + +--- + +## Gotcha #2: The Android Developer Verification snippet + +After uploading the APK Play Console came back with a second hurdle: a **registration snippet** that has to live inside the APK itself, at a specific path, with a specific filename. + +The snippet for my account looked like this: + +``` +DP5ACMZ5E2B4MAAAAAAAAAAAAA +``` + +It belongs in `android/app/src/main/assets/adi-registration.properties` inside the built APK. 
Play Console reads it at upload time to confirm the build came from the developer account claiming the package. + +That's already weird (a per-account token baked into the binary?), but the *real* trap is the next gotcha. + +--- + +## Gotcha #3: `expo prebuild` blows away the `android/` directory + +Expo's **managed workflow** doesn't ship a checked-in `android/` folder. Every EAS build runs `expo prebuild` first, which **regenerates** `android/` from scratch based on `app.json` and installed packages. Anything you put under `android/` manually gets nuked at build time. + +That means a "just paste the file into `android/app/src/main/assets/`" workaround works **once** locally, then disappears the moment EAS builds in the cloud. + +The fix is a **config plugin** — a small Node module that runs during `prebuild` and writes files into the generated tree, every time: + +```js +// apps/mobile/plugins/with-adi-registration.js +const { withDangerousMod } = require('expo/config-plugins'); +const fs = require('fs'); +const path = require('path'); + +const ADI_SNIPPET = 'DP5ACMZ5E2B4MAAAAAAAAAAAAA'; + +module.exports = function withAdiRegistration(config) { + return withDangerousMod(config, [ + 'android', + async (config) => { + const assetsDir = path.join( + config.modRequest.platformProjectRoot, + 'app', 'src', 'main', 'assets' + ); + fs.mkdirSync(assetsDir, { recursive: true }); + fs.writeFileSync( + path.join(assetsDir, 'adi-registration.properties'), + ADI_SNIPPET + '\n', + 'utf8' + ); + return config; + }, + ]); +}; +``` + +Wire it into `app.json`: + +```json +{ + "expo": { + "plugins": [ + "expo-router", + "expo-secure-store", + "./plugins/with-adi-registration" + ] + } +} +``` + +The `withDangerousMod` API is the right tool: it lets you write directly into the prebuild output. 
The "dangerous" name is real — anything in there runs after Expo's own modifications, so you have to be careful not to stomp generated config — but for "write one file into assets" it's exactly what you need. + +> Once ownership is verified on Play Console, you can remove the plugin. The token file is only needed at verification time, not for ongoing releases. + +--- + +## Gotcha #4: Multi-account Play Console gymnastics + +<!-- TODO: this section needs vasyl's notes: + - Why two Google accounts (vasyl.vdov@gmail.com admin + the dev-registration account)? + - Where exactly did the verification token come from — Console > Setup > ? > ? + - Three declarations (data safety / target audience / something else?) and which one is the one most people forget + - Did `vasyl.vdov@gmail.com` need to be added as a second admin to the second account, or was it the other direction? +--> + +*(Filling in once Vasyl shares the multi-account walkthrough.)* + +--- + +## Gotcha #5: Three declarations Play Console won't let you skip + +<!-- TODO: which three? Best guesses: + 1. App Content / Privacy policy (URL) + 2. Data safety (questionnaire about what data you collect, share, encrypt in transit, etc) + 3. Government / news / financial declarations? Target audience + content rating? + Need Vasyl's actual sequence + which one was the most annoying. +--> + +*(Filling in once Vasyl confirms which three he hit and in what order.)* + +--- + +## The order-of-operations I wish I'd had + +If I were doing this from a clean repo, in the order that avoids redoing builds: + +1. **Create the Play Console app first** — get to the screen where it asks for the verification snippet, before you build anything. +2. **Add the snippet via a config plugin**, not by hand-editing `android/`. Commit the plugin. +3. **Build a preview APK** (`--profile preview`). This is the APK you upload for "Sign and upload an APK". Same keystore as production, signed by EAS. +4. 
**Drag the APK into the ownership-verification form.** Wait for Play Console to confirm. +5. **Now** build the production AAB (`--profile production`). Same keystore, same package name, but the format Play Console actually wants for Internal/Closed/Production tracks. +6. **Upload the AAB to Internal testing track.** Add tester emails (yourself first). Roll out. + +Total: 2 EAS builds (one APK, one AAB), 0 wasted ones — if you do it in this order. + +Total I actually did: **4 EAS builds**, because I learned each gotcha in production order and had to re-run with the fix. + +--- + +## What's still hand-wavy + +The bits I haven't actually run myself yet (filling in as I go): + +- The multi-account Play Console flow — there's a *reason* it's split that's specific to TextStack. (Section above.) +- The three declarations — I know there are at least three forms Play Console blocks rollout on, but my notes from today don't match what I expected. (Section above.) +- `eas submit` for future releases — once ownership is verified, the manual upload becomes a one-line `eas submit --platform android --profile production`. Haven't wired that up yet because I needed to clear ownership first. + +--- + +## Why I'm writing this down + +The Expo + Play Store gotchas above are individually all over StackOverflow, but I couldn't find a single page that strung them together in the order you'll hit them on a brand-new app. The order matters: Gotcha #1 sends you to build #2, Gotcha #3 invalidates the file you put in build #1, and Gotchas #4 and #5 block you *after* ownership is verified, so they hide until you've cleared the first three. + +If you're shipping your first Expo Android build to Play Console, hit the gotchas in the right order. Don't be me. 
diff --git a/docs/fixes/explain-404.md b/docs/fixes/explain-404.md new file mode 100644 index 00000000..f3ae0e99 --- /dev/null +++ b/docs/fixes/explain-404.md @@ -0,0 +1,199 @@ +# Fix: `/explain` returns 404 in production + +## Observed (production) + +On `textstack.app`, after fixes from `tap-on-word-and-explain.md` already shipped: + +1. Open any user-uploaded book in the reader (URL pattern `/library/my/{id}/read/...`). +2. Select a sentence containing a technical term. +3. Click the 💡 Explain icon in the selection toolbar. + +Expected: 2–3 sentence LLM explanation popup. +Actual: popup shows `Explain failed: 404`. + +`/api/translate` on the same selection works correctly, so this is not a generic API outage — it's specific to the Explain endpoint. + +## Root cause (confirmed) + +Three asymmetries vs `Translate` (which works) cause this: + +### Asymmetry 1 — backend route registration + +`backend/src/Api/Endpoints/ExplainEndpoints.cs:13-17` registers ONE route only: + +```csharp +public static void MapExplainEndpoints(this WebApplication app) +{ + var group = app.MapGroup("/explain").WithTags("Explain"); + group.MapPost("", Explain).WithName("Explain").RequireRateLimiting("explain"); +} +``` + +Compare with `backend/src/Api/Endpoints/TranslationEndpoints.cs:11-20` which registers BOTH `/api/translate` and `/translate`: + +```csharp +public static void MapTranslationEndpoints(this WebApplication app) +{ + var group = app.MapGroup("/api/translate").WithTags("Translation"); + + group.MapPost("", Translate).WithName("Translate").RequireRateLimiting("translate"); + group.MapGet("/languages", GetLanguages).WithName("GetTranslationLanguages"); + + // Also map without /api/ prefix for nginx compatibility + app.MapPost("/translate", Translate).WithTags("Translation").WithName("TranslateCompat").RequireRateLimiting("translate"); +} +``` + +The dual-registration is what makes Translate work regardless of which path the request arrives on. Explain doesn't have it. 
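To make the registration asymmetry concrete, here is a minimal sketch that models routing as an exact match on method and path. It is a deliberate simplification and not how ASP.NET Core's router actually works; `Route` and `resolveRoute` are hypothetical helpers, while the paths and route names are the ones quoted from the two endpoint files above.

```typescript
// Simplified, hypothetical model of route registration (exact match only).
type Route = { method: string; path: string; name: string };

// Returns the matched route name, or 404 when nothing is registered for the path.
function resolveRoute(routes: Route[], method: string, path: string): string | 404 {
  const match = routes.find((r) => r.method === method && r.path === path);
  return match ? match.name : 404;
}

// Explain: a single registration, as in ExplainEndpoints.cs before the fix.
const explainRoutes: Route[] = [
  { method: "POST", path: "/explain", name: "Explain" },
];

// Translate: registered under both prefixes, as in TranslationEndpoints.cs.
const translateRoutes: Route[] = [
  { method: "POST", path: "/api/translate", name: "Translate" },
  { method: "GET", path: "/api/translate/languages", name: "GetTranslationLanguages" },
  { method: "POST", path: "/translate", name: "TranslateCompat" },
];
```

In this model `resolveRoute(explainRoutes, "POST", "/api/explain")` yields `404`, while both `/translate` and `/api/translate` resolve for Translate, which is the tolerance the dual registration buys.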
+ +### Asymmetry 2 — frontend URL + +`apps/web/src/api/explain.ts:1-20` uses an inconsistent URL convention (and a misleading comment): + +```ts +// API_BASE is the host (dev: http://localhost:8080) or `/api` (prod, nginx +// strips the prefix and proxies the rest to backend). Backend route is +// `/explain` (no prefix). Don't add `/api/` here or prod gets `/api/api/...`. +const API_BASE = import.meta.env.VITE_API_URL ?? '' + +// ... + +export async function explain(req: ExplainRequest, signal?: AbortSignal): Promise<ExplainResponse> { + const res = await fetch(`${API_BASE}/explain`, { +``` + +Compare with `apps/web/src/api/translation.ts:32` which uses the `/api/` prefix: + +```ts +const res = await fetch(`${API_BASE}/api/translate`, { +``` + +The "Don't add `/api/`" warning in the explain.ts comment is wrong as of the deployed prod build — `VITE_API_URL` is set to `/api` in prod (see `docker-compose.yml:143`, `Makefile:47`, `.github/workflows/deploy.yml:52`), so the comment's "or prod gets `/api/api/...`" scenario would actually be **the working** scenario for some routes (Translate's backend mounts both paths, so `/api/api/translate` is handled at backend by the `/api/translate` registration). Explain's backend mounts only `/explain`, so neither path reaches it cleanly. 
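The difference between the two clients is easier to see when the URLs are written out for each value of `API_BASE`. The sketch below only mirrors the two template strings quoted above; `explainUrl`, `translateUrl`, and `requestedUrls` are hypothetical helper names, and what nginx does with each URL afterwards is intentionally left out.

```typescript
// Illustrative only: mirrors the template strings in explain.ts and translation.ts.
function explainUrl(apiBase: string): string {
  return `${apiBase}/explain`;
}

function translateUrl(apiBase: string): string {
  return `${apiBase}/api/translate`;
}

// The URL each client builds for a given API_BASE value.
function requestedUrls(apiBase: string): { explain: string; translate: string } {
  return { explain: explainUrl(apiBase), translate: translateUrl(apiBase) };
}

// API_BASE values taken from the source comment (dev host) and the prod config ("/api").
const devUrls = requestedUrls("http://localhost:8080");
const prodUrls = requestedUrls("/api");
```

With `API_BASE = "/api"`, the Translate client requests `/api/api/translate` and the Explain client requests `/api/explain`, so the two calls reach the proxy with different prefix depths even though they sit side by side in the frontend.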
+ +### Asymmetry 3 — nginx (cosmetic, but a consistency miss) + +`infra/nginx/textstack.conf:188-199` has a dedicated rate-limited location for translate: + +```nginx +# Translation endpoint — stricter per-IP rate limit +location /api/translate { + limit_req zone=translate_limit burst=2 nodelay; + proxy_pass http://textstack_api/translate; + proxy_http_version 1.1; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto https; + proxy_set_header X-Site-Id general; + proxy_set_header Connection ""; +} +``` + +No equivalent `/api/explain` block exists, so Explain falls through to the generic `/api/` catchall — which is fine in principle, but means there's no per-endpoint rate limiting. + +## Fix — exact diffs + +### Diff 1: `backend/src/Api/Endpoints/ExplainEndpoints.cs` + +Replace lines 13–17 with: + +```csharp +public static void MapExplainEndpoints(this WebApplication app) +{ + var group = app.MapGroup("/api/explain").WithTags("Explain"); + group.MapPost("", Explain).WithName("Explain").RequireRateLimiting("explain"); + + // Also map without /api/ prefix for nginx compatibility — mirrors + // TranslationEndpoints. Without this the endpoint 404s in production when + // nginx forwards requests stripped of the /api/ prefix and the build + // happens to ship a frontend bundle that hits the bare path. + app.MapPost("/explain", Explain).WithTags("Explain").WithName("ExplainCompat").RequireRateLimiting("explain"); +} +``` + +### Diff 2: `apps/web/src/api/explain.ts` + +Replace lines 1–4 with: + +```ts +// API_BASE is the host in dev (http://localhost:8080) or "/api" in prod +// (nginx routes /api/* to the backend). Use the same /api/ prefix as +// translation.ts — backend mounts the Explain handler at BOTH /explain and +// /api/explain (see ExplainEndpoints.cs), so this works in either env. +const API_BASE = import.meta.env.VITE_API_URL ?? 
'' +``` + +Replace line 20: + +```ts + const res = await fetch(`${API_BASE}/explain`, { +``` + +with: + +```ts + const res = await fetch(`${API_BASE}/api/explain`, { +``` + +### Diff 3: `infra/nginx/textstack.conf` + +Above the existing `location /api/translate { ... }` block at line 188, add a new explain rate-limit zone. Find the existing rate-limit zones near the top of the file (probably `limit_req_zone ... zone=translate_limit:10m rate=5r/m;` or similar) and add: + +```nginx +limit_req_zone $binary_remote_addr zone=explain_limit:10m rate=20r/m; +``` + +Then between lines 199 and 200 (after the translate location, before the generic `/api/` block at 202), add: + +```nginx +# Explain endpoint — per-IP rate limit, same shape as translate. +location /api/explain { + limit_req zone=explain_limit burst=2 nodelay; + proxy_pass http://textstack_api/explain; + proxy_http_version 1.1; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto https; + proxy_set_header X-Site-Id general; + proxy_set_header Connection ""; +} +``` + +The backend already has a `RequireRateLimiting("explain")` policy at `Program.cs:253`, so the nginx zone is defense-in-depth, not the only line. If you'd rather rely on the backend limiter and skip nginx changes, that's acceptable — the route fix above is what makes Explain work; nginx is consistency-only. + +## Verification + +After deploying, run: + +```bash +# 1. Both routes accept POST (backend dual-registration check). 
+curl -i -X POST https://textstack.app/api/explain \ + -H "Content-Type: application/json" \ + -d '{"word":"polling","sentence":"the client repeats the query every 5 seconds (this is known as polling).","targetLang":"ru","genre":"computer science"}' +# Expected: HTTP 200 with JSON {explanation, word, cached} + +curl -i -X POST https://textstack.app/explain \ + -H "Content-Type: application/json" \ + -d '{"word":"polling","sentence":"the client repeats the query every 5 seconds (this is known as polling).","targetLang":"ru","genre":"computer science"}' +# Expected: HTTP 200 with identical JSON shape (might hit cache from previous call) +``` + +Both should return 200, not 404. The explanation text in the response should be 2–3 sentences in Russian explaining the distributed-systems meaning of "polling" (periodic client query, not electoral polls). + +Then in the UI: + +1. Sign in, open any user-uploaded book → reader → select a sentence → click 💡 → expect a 2–3 sentence explanation popup, not `Explain failed: 404`. +2. Repeat on a curated library book → same behavior. + +## P.S. — sanity-check Bug 1 runtime while you're in there + +Bug 1 from the previous brief (`tap-on-word-and-explain.md`) is wired correctly end-to-end in the code, but the live production behavior on `warehouse` in a DDIA upload still shows `almacén` without the README-promised parenthetical clarifier (`almacén (almacén de datos)`). Likely cause: the user-book's `Genre` field is NULL in the DB because `BookMetadataGenerator` (Ollama, fire-and-forget) hasn't populated it. With `genre = null`, the prompt still has `sentence` but no `Domain hint:` line, so `gpt-5-mini` may not consistently trigger the clarifier path. 
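The effect of a missing genre on the prompt can be sketched as follows. This is a loose, abridged Python rendering of the domain-hint and sentence-context assembly from the Bug 1 brief, not the backend code. `build_prompt_prefix` and `infer_when_missing` are hypothetical names, and the fallback sentence is only a suggested wording that has not been tested against `gpt-5-mini`.

```python
from typing import Optional

# Suggested fallback instruction for books whose genre is not populated yet.
FALLBACK_HINT = "If no domain hint is given, infer one from the sentence context. "


def build_prompt_prefix(
    genre: Optional[str],
    sentence: Optional[str],
    infer_when_missing: bool = False,
) -> str:
    """Assemble the context part of the prompt (hypothetical helper)."""
    parts = []
    if genre and genre.strip():
        parts.append(f"Domain hint: {genre.strip()}. ")
    elif infer_when_missing:
        # Genre is NULL or blank: ask the model to derive the domain itself.
        parts.append(FALLBACK_HINT)
    if sentence and sentence.strip():
        parts.append(f'Sentence context: "{sentence.strip()}". ')
    return "".join(parts)


sentence = "the client repeats the query every 5 seconds (this is known as polling)."
print(build_prompt_prefix("computer science", sentence))
print(build_prompt_prefix(None, sentence))
print(build_prompt_prefix(None, sentence, infer_when_missing=True))
```

The second call shows the case suspected here: the sentence is present but no `Domain hint:` line is emitted.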
+ +Quick diagnostic (one query): + +```sql +SELECT id, title, genre FROM user_books WHERE title ILIKE '%data-intensive%' OR title ILIKE '%DDIA%'; +``` + +If `genre` is NULL: either backfill it from the title/description for a sample of books, OR strengthen the prompt to derive a domain hint from the sentence alone when genre is missing (e.g. add: "If no domain hint is given, infer one from the sentence context."). Out of scope for this PR — file as a follow-up. diff --git a/docs/fixes/readme-demo-gif.md b/docs/fixes/readme-demo-gif.md new file mode 100644 index 00000000..bc6ab3b0 --- /dev/null +++ b/docs/fixes/readme-demo-gif.md @@ -0,0 +1,123 @@ +# Task: wire the new product demo GIF into README + +A real demo GIF (`docs/demo.gif`, 190 KB, 4.5s) has been added to the repo. The current README already references `docs/demo.gif` at line 32 but the alt text and surrounding HTML comments describe a different flow (the old "tap-word → translation" idea). Update README to reflect what the GIF actually shows, and add one extra placement so the GIF appears where the feature is explained, not only at the top. + +## What the new GIF actually shows + +`docs/demo.gif` is a 4.5-second screen recording captured on `textstack.app`: + +1. Reader open on *Designing Data-Intensive Applications*, chapter 6 (Trade-offs in Data Systems Architecture) +2. User highlights the phrase **"Extract-Transform-Load (ETL)"** +3. Selection toolbar appears (4 highlight colors, Translate, **Explain (💡)**, TTS, Copy) +4. User clicks 💡 +5. **Explanation popup** opens with a **2-3 sentence Spanish explanation**: *"En este contexto, 'Extract-Transform-Load (ETL)' se refiere a un proceso en el que los datos se recogen de diferentes fuentes (extracción), se modifican o limpian según las necesidades (transformación) y luego se almacenan en un lugar central para su análisis (carga). 
Es como recoger ingredientes de varias tiendas, prepararlos y luego guardarlos en una despensa lista para cocinar."* + +It demonstrates the **Explain feature** specifically, not the simpler tap-word translation. Headline message: *select any technical phrase → 2-3 sentence explanation in your native language, with the book's domain in mind, complete with a concrete analogy*. + +The GIF intentionally uses **Spanish** rather than Russian for the target language so the asset is politically neutral for international audiences (build-in-public threads, Hacker News, LATAM dev community). Don't swap to Russian. + +## Changes to `README.md` + +### Change 1 — replace the placeholder block at lines 29–32 + +Current: + +```markdown +<!-- TODO: Replace with actual demo GIF showing tap-word → translation flow --> +<!-- Suggested: 3-5 second GIF, ~600px wide, showing a real DDIA paragraph with a tap interaction --> + +![TextStack demo — tap any term, get a context-aware translation](docs/demo.gif) +``` + +Replace with: + +```markdown +![TextStack demo — select any technical phrase, get a 2-3 sentence explanation in your native language](docs/demo.gif) +``` + +(Drop the two HTML TODO comments — they're stale. The placeholder is filled in.) + +### Change 2 — tighten the surrounding lines + +Line 27 currently reads: + +```markdown +I quit *Designing Data-Intensive Applications* three times before I built this. +``` + +Keep that line — it's the perfect lead-in for the GIF, because the GIF is filmed on DDIA. No change needed. + +The pull-quote line right after the GIF (line 34): + +```markdown +> ⭐ Star the repo if you've ever abandoned a technical book mid-way — it's the strongest signal that this kind of tool is worth building. +``` + +Keep as is — good CTA flow. 
+ +### Change 3 — add a second placement deeper in the README + +The GIF demonstrates the *Explain* feature, but right now Explain isn't called out anywhere by name in the README; it's only mentioned obliquely as "Explanation mode" in the description (lines 56–59). Add an explicit feature bullet. + +Find the "**Reader**" subsection at lines 83–90: + +```markdown +**Reader** +- Kindle-like experience — themes (light/sepia/dark), fonts, fullscreen, + keyboard shortcuts +- Text selection — contextual translation in 18+ languages (OpenAI + `gpt-5-mini`), explanation mode for English-only readers, dictionary + fallback (Free Dictionary API), highlights +- TTS — Edge TTS via direct WebSocket (200+ voices, 0.75×–2.0× speed, two- + layer cache) +- Offline reading — PWA with IndexedDB caching, download manager +``` + +Promote the Explain feature inline. Replace the second bullet ("Text selection — ...") with two bullets: + +```markdown +- **Explain** — select any technical phrase → 2-3 sentence explanation in your native language, aware of the book's domain. Uses OpenAI `gpt-5-mini`. Includes a concrete analogy when the term is technical (see GIF above). +- Text selection extras — contextual translation in 18+ languages, dictionary fallback (Free Dictionary API), highlights +``` + +This way Explain has its own line, the GIF is referenced as the proof, and translation/dictionary remain mentioned but as secondary text-selection capabilities. + +### Change 4 — don't break the "Why I built it" link + +Line 21 has a link to the dev.to article (`Why I built it`). Keep untouched — it's the lead-in narrative the GIF supports. + +## Verification + +After applying the changes: + +1. `grep "docs/demo.gif" README.md` should return exactly **one** line (the new image tag), not the old TODO comments. +2. Open the README on GitHub (for example with `gh repo view --web`), or view the file in a local markdown previewer. The GIF should auto-play and loop. 
File size 190 KB — well under any limits. +3. The "Reader" feature list should now have a dedicated **Explain** bullet. +4. No broken image references — `docs/demo.gif` exists, is 190 KB, 4.5s long. + +### Change 5 — hero image has been replaced (action: refresh alt text) + +`docs/assets/hero.png` has been replaced. The previous hero showed a "calm place to read books online" tagline with classical-literature covers (The Goldfinch, Renaissance paintings) — leftover from an earlier general-reader positioning. It was actively off-message for the current "reader for devs finishing tech books" pitch. + +The new `docs/assets/hero.png` (1600×806, 259 KB) is a clean product screenshot of `textstack.app`: DDIA reader text with **Extract-Transform-Load (ETL)** highlighted and the Explain popup open showing the Spanish 2-3 sentence explanation (same flow as `docs/demo.gif`, just frozen at the popup-visible moment). The old hero is preserved as `docs/assets/hero-old-classical.png` in case revert is needed. + +In `README.md` line 10: + +```markdown +<img src="docs/assets/hero.png" alt="TextStack — a reader for developers finishing English technical books in their native language" width="800"> +``` + +The current alt text is still accurate for the new hero, so **no string change needed** — just confirm the image renders correctly when README is viewed. If you want sharper alignment, optionally tighten alt to: + +```markdown +<img src="docs/assets/hero.png" alt="TextStack reader showing an Explain popup over the term 'Extract-Transform-Load' with a 2-3 sentence Spanish explanation" width="800"> +``` + +…but this isn't required. The existing alt is fine. + +## Out of scope + +- Don't re-record the GIF in a different language. Keep Spanish — it's deliberate (politically neutral, broad audience, demonstrates non-English target). +- Don't move the GIF file to `docs/assets/` — README path expects `docs/demo.gif` and we shouldn't churn paths. 
+- Don't compress the GIF further — 190 KB at 720p × 4.5s is already a good balance; smaller would degrade text legibility. +- Don't restore the old classical-literature hero from `hero-old-classical.png` — it represents stale positioning, kept only as an audit-trail backup. diff --git a/docs/fixes/tap-on-word-and-explain.md b/docs/fixes/tap-on-word-and-explain.md new file mode 100644 index 00000000..94da9e60 --- /dev/null +++ b/docs/fixes/tap-on-word-and-explain.md @@ -0,0 +1,162 @@ +# Fix: domain-aware tap-on-word translation + Explain 404 + broken book title + +This PR addresses three production bugs observed on textstack.app that together break the core "context-aware reader" promise from the README. Tackle in one PR — they're all in the same surface area. + +## Bug 1 — Tap-on-word translation is dictionary-grade, not domain-aware + +### Observed +- DDIA, tap "polling" with target Russian → popup shows `опросы` (electoral-polls meaning, wrong domain) +- DDIA chapter on data systems, tap "warehouse" with target Portuguese → `armazém` + dictionary def "A place for storing large amounts of products. In logistics, a place where products go to from the manufacturer..." — wrong domain, despite an explicit "Data warehouse" diagram in the same paragraph + +### Root cause +- `apps/web/src/lib/wordBubbleFetch.ts:61` — `translateApi(word, bookLanguage, targetLang, signal)` is called with **no book context**. +- `apps/web/src/api/translation.ts:14–28` — sends only `{text, sourceLang, targetLang}`, no `editionId`/`bookId`/`sentence`. +- `backend/src/Api/Endpoints/TranslationEndpoints.cs:71–72` — current system prompt: + ```csharp + $"You are a translation engine. Translate from {srcLang} to {tgtLang}. " + + "Output ONLY the translated text. No preface, no quotes, no explanation." + ``` + No domain hint, no genre, no surrounding sentence. OpenAI defaults to the most common everyday meaning. + +### Fix + +**Backend `TranslationEndpoints.cs`:** + +1. 
Extend `TranslateRequest` to optionally accept `Guid? BookId`, `string? Sentence`, `string? Genre`. +2. Mirror the genre-lookup pattern from `ExplainEndpoints.cs:44–65` — when `BookId` is present and `Genre` is null, look up genre from `Editions` first then `UserBooks`. Wrap in try/catch, log warning on failure, fall back to "general". +3. Replace the system prompt with: + ```csharp + var domainHint = string.IsNullOrWhiteSpace(genre) + ? "" + : $"Domain hint: {genre.Trim()}. Prefer the domain-specific meaning over the everyday meaning when the word is ambiguous. "; + + var sentenceCtx = string.IsNullOrWhiteSpace(sentence) + ? "" + : $"Sentence context: \"{sentence.Trim()}\". "; + + var systemPrompt = + $"You are a translation engine for readers of technical books. " + + $"Translate from {srcLang} to {tgtLang}. " + + domainHint + + sentenceCtx + + "Output ONLY the translation. " + + $"If the word has a domain-specific meaning that differs from its everyday meaning, " + + $"append a SHORT clarifier in {tgtLang} parentheses, e.g. " + + $"\"увага (механізм у нейромережах)\" or \"опитування (періодичний запит до сервера)\". " + + "Otherwise output just the translation. No preface, no quotes, no markdown."; + ``` + The parenthetical-clarifier pattern is what the README explicitly promises (`увага (механізм у нейромережах)`). Make sure the prompt encourages it. +4. Extend the cache key to include `genre` (or domain bucket). Otherwise the first-translated word poisons the cache for all readers across all genres. + +**Frontend `apps/web/src/lib/wordBubbleFetch.ts` + `apps/web/src/api/translation.ts`:** + +1. Update `translate()` signature to accept optional `bookId` and `sentence`. +2. In `fetchWordBubble()`, extract the surrounding sentence using the same logic that `ReaderHighlights.tsx` already uses to build the Explain payload (search the codebase for the existing helper — don't reimplement). +3. Pass the current book's id to the call. 
For curated books that's the `editionId`, for user books that's the `userBookId` — pass whichever is in scope, the backend already handles both via the `Editions` → `UserBooks` cascade. +4. When called from contexts without book scope (preview mode, marketing landing widget, etc.), omit the new fields — the backend gracefully falls back to context-free behavior. + +## Bug 2 — `/explain` returns 404 in production + +### Observed +On user-uploaded DDIA, select sentence with "polling" → click 💡 (Explain) → popup shows `Explain failed: 404`. Translation on the same selection works fine, so it's not a generic api outage. + +### Root cause (suspected) +`backend/src/Api/Endpoints/ExplainEndpoints.cs:15` registers only the bare `/explain` route: +```csharp +var group = app.MapGroup("/explain").WithTags("Explain"); +group.MapPost("", Explain).WithName("Explain").RequireRateLimiting("explain"); +``` + +Compare with `TranslationEndpoints.cs:13–19`: +```csharp +var group = app.MapGroup("/api/translate").WithTags("Translation"); +group.MapPost("", Translate).WithName("Translate").RequireRateLimiting("translate"); +group.MapGet("/languages", GetLanguages).WithName("GetTranslationLanguages"); + +// Also map without /api/ prefix for nginx compatibility +app.MapPost("/translate", Translate).WithTags("Translation").WithName("TranslateCompat").RequireRateLimiting("translate"); +``` + +Translation has dual-registration and a dedicated nginx location at `infra/nginx/textstack.conf:189`. Explain has neither. Likely a regression where Explain wasn't migrated when the dual-registration pattern was added, and a build of the frontend went out without `VITE_API_URL=/api` (or with it and nginx didn't have the explicit location to make the catchall work as expected). + +### Fix + +**Backend `ExplainEndpoints.cs`:** + +Mirror `TranslationEndpoints.cs:13–19` exactly. 
Register Explain at both `/api/explain` and `/explain`: + +```csharp +public static void MapExplainEndpoints(this WebApplication app) +{ + var group = app.MapGroup("/api/explain").WithTags("Explain"); + group.MapPost("", Explain).WithName("Explain").RequireRateLimiting("explain"); + + // Also map without /api/ prefix for nginx compatibility + app.MapPost("/explain", Explain).WithTags("Explain").WithName("ExplainCompat").RequireRateLimiting("explain"); +} +``` + +**Frontend `apps/web/src/api/explain.ts`:** + +Change the URL to be consistent with `translation.ts`: +```ts +const res = await fetch(`${API_BASE}/api/explain`, { ... }) +``` +Update the file's leading comment to match — the "Don't add `/api/` here" warning is misleading now. + +**Nginx `infra/nginx/textstack.conf`:** + +Optional but consistent — add a dedicated location with explain rate limit, mirroring `/api/translate` block at line 189: +```nginx +location /api/explain { + limit_req zone=explain_limit burst=2 nodelay; + proxy_pass http://textstack_api/explain; + # ... copy headers from /api/translate block +} +``` +Add `limit_req_zone ... zone=explain_limit:10m rate=20r/m;` near the other zones at the top of the file. + +### Verification + +After fix, on textstack.app: +1. Open any user-uploaded book → reader → select a sentence → click 💡 → expect 200 with 2-3 sentence explanation in target language. +2. Repeat on a curated library book → expect same behavior. +3. Hit `/api/explain` and `/explain` directly with curl — both should accept POST and return identical results. + +## Bug 3 — Broken title `(for )` on book detail page + +### Observed +Book detail page header reads **"Designing Data-Intensive Applications (for )"** with empty parentheses. Visible on user-uploaded DDIA (URL pattern `/library/my/{id}/`). Reader header carries the same broken title forward. 
+ +### Suspected root cause +A template like `"{title} (for {targetLanguage})"` or `"{title} (for {audience})"` with empty interpolation when the field is missing. Search for the literal `"(for "` or `(for {` in `apps/web/src/pages/` (likely `BookDetailPage.tsx` or user-book equivalent — given URL `/library/my/...` it's the user-book detail page, possibly `apps/web/src/pages/UserBookDetailPage.tsx` or similar). + +### Fix +Conditional render: if the interpolated field is empty/null, omit the `(for )` segment entirely. Don't render an empty placeholder. + +### Verification +Open `https://textstack.app/en/library/my/{any-user-book-id}/` — title should be `Designing Data-Intensive Applications` with no trailing parenthetical when the relevant field is empty. + +## Verification — overall + +After Bug 1 and Bug 2 fixes deploy, on textstack.app, with target Russian: + +| Word | Book | Expected | +|------|------|----------| +| polling | DDIA, distributed systems chapter | `опрос (периодический запрос к серверу)` or similar | +| warehouse | DDIA, ETL chapter | `хранилище (хранилище данных)` or equivalent gloss | +| attention | any ML book | `внимание (механизм в нейросетях)` | +| eventual consistency | DDIA | `конечная согласованность` | +| polling | a poll-related news article (different genre) | `опросы` (everyday meaning preserved when domain doesn't suggest otherwise) | + +The last row matters — make sure the domain hint *biases* the model but doesn't force technical meaning when the genre is wrong (e.g. if a user uploads a book of poetry, "warehouse" should still mean storage building). + +For Bug 2: `/api/explain` and `/explain` both accept POST on textstack.app, both return identical results, neither 404s on user-uploaded books. + +For Bug 3: book detail page never renders an empty `(for )` parenthetical. + +## Out of scope + +- Replacing Free Dictionary API entirely — keep the dictionary popup separate, it's the secondary fallback. 
Bug 1 fix only addresses the LLM-translation half of the popup. +- Translation latency / cost — `gpt-5-mini` handles word-level translations cheaply; no architecture change. +- The `Words read=0 / Sessions=7 / Reading time=5m` stats inconsistency on the same page (separate ticket — file separately). diff --git a/docs/fixes/x-profile-banner.md b/docs/fixes/x-profile-banner.md new file mode 100644 index 00000000..7808fa89 --- /dev/null +++ b/docs/fixes/x-profile-banner.md @@ -0,0 +1,120 @@ +# Task: generate subtle X profile banner + +Generate a 1500×500 PNG banner for X (Twitter) profile @Rexetdeus. Output: `docs/assets/x-banner.png`. + +## Constraints + +- **Dimensions: exactly 1500 × 500 px** (X banner aspect, displays full on most screens) +- **Filesize:** under 500 KB (X has a 2 MB limit but smaller renders faster) +- **NOT promo:** no big logos, no GitHub URL, no "TextStack" plastered across. The banner should signal "thoughtful dev with taste" — not "buy my product". +- Tone: understated, dev-aesthetic, the kind of banner a senior engineer with a quiet brand would have. + +## Aesthetic — terminal/editor with a single-line comment + +Dark muted background, monospace text, faded code-comment color. Like glancing at an editor where someone left one self-aware line of code. + +### Visual spec + +| Element | Spec | +|---|---| +| Background | Solid `#0d1117` (GitHub dark theme bg) OR subtle vertical gradient from `#0d1117` to `#161b22` | +| Text content (pick one — see options below) | A single code-comment line | +| Font | JetBrains Mono if available (`apt-get install fonts-jetbrains-mono` or download from JetBrains site). 
Fallback: `DejaVu Sans Mono`, then any monospace | +| Font size | 28–32 px (subtle but legible on most viewports) | +| Text color | `#6e7681` (GitHub comment color — muted, not screaming) | +| Text position | Left-aligned with 90 px left margin, vertically centered | +| Optional decoration | A faint blinking-cursor-style block `▍` at end in slightly brighter `#8b949e`. Just one character. | +| Padding around text | Generous — empty space carries the design | + +### Tagline options — pick the best one (or generate variants and let user choose) + +```python +TAGLINE_OPTIONS = [ + "// reading, in production.", # plays on "X in production" trope + TextStack + "// notes on books I keep quitting.", # self-deprecating, honest + "// DDIA, attempt #4. shipping the reader that fixes it.", # specific story + "// some books read you back.", # introspective, no product mention + "// .NET. React Native. Books I never finish.", # 3-element stack signature +] +``` + +**Default recommendation:** `// reading, in production.` — it ties to the Gemma 4 post-mortem article (LLM in production) while staying ambiguous enough to NOT read as promo. Devs who land on the profile get the wink; others see a stylized tagline. + +If unsure, generate ALL FIVE as separate files (`docs/assets/x-banner-v1.png` … `v5.png`) and let user pick. + +## Implementation + +Use Python with Pillow (PIL). Suggested approach: + +```python +from PIL import Image, ImageDraw, ImageFont + +WIDTH, HEIGHT = 1500, 500 +BG = "#0d1117" +TEXT_COLOR = "#6e7681" +CURSOR_COLOR = "#8b949e" +TAGLINE = "// reading, in production." + +img = Image.new("RGB", (WIDTH, HEIGHT), BG) +draw = ImageDraw.Draw(img) + +# Try preferred fonts in order +font_candidates = [ + "/usr/share/fonts/truetype/jetbrains-mono/JetBrainsMono-Regular.ttf", + "/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf", + # ... 
fall back to whatever ImageFont.load_default() +] +font = None +for path in font_candidates: + try: + font = ImageFont.truetype(path, 30) + break + except Exception: + continue +if font is None: + font = ImageFont.load_default() + +# Vertical center: textbbox's top offset (bbox[1]) is usually nonzero, +# so subtract it to center the visible glyphs rather than the text origin. +bbox = draw.textbbox((0, 0), TAGLINE, font=font) +text_h = bbox[3] - bbox[1] +y = (HEIGHT - text_h) // 2 - bbox[1] + +draw.text((90, y), TAGLINE, fill=TEXT_COLOR, font=font) + +# Optional cursor block at end +text_w = bbox[2] - bbox[0] +draw.text((90 + text_w + 14, y), "▍", fill=CURSOR_COLOR, font=font) + +img.save("docs/assets/x-banner.png", optimize=True) +``` + +Adjust as needed if Pillow isn't available — use `cairosvg` or generate SVG and rasterize. Whatever works. + +## Verification + +After generation: + +1. Open the PNG and confirm: + - Dimensions exactly 1500×500 + - Filesize < 500 KB + - Text is legible but quiet (NOT screaming) + - No accidental TextStack/GitHub/promo branding +2. Render at 760×254 (X displays roughly half-size on most viewports) — text should still be readable. +3. If multiple variants generated, save all and list them in the commit message so user can choose by filename. + +## Out of scope + +- Don't add the TextStack logo, name, or URL to the banner. +- Don't use bright/saturated colors — the whole palette should feel like a dev tool dark theme. +- Don't put images of books, screenshots, or product visuals. Pure typographic banner. +- Don't generate multiple aspect ratios — X only needs 1500×500 for the profile banner. + +## Commit message suggestion + +``` +chore(marketing): add subtle X profile banner + +1500x500 PNG, JetBrains-Mono code-comment aesthetic. +Tagline: "// reading, in production." +No product branding — banner reads as quiet dev profile, not promo. 
+``` diff --git a/docs/marketing/campaign-tracker.md b/docs/marketing/campaign-tracker.md new file mode 100644 index 00000000..64b99c1b --- /dev/null +++ b/docs/marketing/campaign-tracker.md @@ -0,0 +1,356 @@ +# TextStack Marketing Campaign — Live Tracker + +Last updated: 2026-05-21 (Thursday, post-routine + posting session) + +## Daily X routine log + +| Date | Candidates drafted | Posted | Notes | +|---|---|---|---| +| 2026-05-12 | 2 | 1 posted | Manual first-run trigger. **Posted reply on @simonw "30GB Mac memory" post (64.5K views, 540 likes)** — 30GB number tie-in with our gemma4:e2b production deployment. Feed quality issue noted (Ferarri Prime growth-hack reposts). File: `docs/marketing/x-routine/2026-05-12.md` | +| 2026-05-13 | 3 fresh + 1 continued | pending | Following feed still dominated by mutual-follow bait; pivoted to live search + tribe scan. Top pick: **@im_yeyito** "llama.cpp eval path / vibes vs data" (direct ask for the prod numbers we have). Other candidates: @VladimirVivien (Gemma 4 2B CPU), @dotnet (Agent Framework 1.0 MCP). Continued: @PaulChen088 on Synthadoc. File: `docs/marketing/x-routine/2026-05-13.md` | +| 2026-05-14 | 3 fresh + 1 continued | **4 posted** | Following feed *still* bait-dominated (3rd session). Live search again carried the load. **All 4 fresh candidates posted with user approval:** @MozillaAI (Gemma 4 / llamafile — included TextStack prod-numbers mention), @TheWordWeaver_ (framer-motion ESM debug fix), @RahulGangwani24 (Ollama latency counter-perspective), @asiokun3 (JP, base_url swap gotcha). Paul Chen continued conversation deferred (yesterday's draft handles it). Account at 11 followers (+9 since baseline). File: `docs/marketing/x-routine/2026-05-14.md` | +| 2026-05-15 | 4 fresh + 1 continued (3rd carry-over) | **3 posted, 1 blocked** | Following feed bait-dominated (4th session). Live search + tribe scan (@simonw, @theo, @karpathy, @arvidkahl) carried. 
**Posted with user approval:** @MoureDev (Local AI workshop — TextStack prod-numbers mention), @aterrel (Spark DGX inference-stack question), @ollies0x (3090 vs cloud + CPU-VPS third path). **@swyx skipped** — post had "Only some accounts can reply" restriction; also chart on inspection showed $445M ARR now, so the drafted $15B EOY guess was off by ~10x — would have looked like we didn't read the chart. Lesson: always inspect attached charts before drafting a number-specific reply. Paul Chen on 3rd carry-over — still pending decision. New tribe watchlist adds: @mudler_it (LocalAI maintainer), @MoureDev (verified, EN/ES). File: `docs/marketing/x-routine/2026-05-15.md` | +| 2026-05-18 | 5 fresh + 1 continued (4th carry-over) | pending | First Monday session after weekend skip. Following feed loaded after Retry; mix of @levelsio social-commentary + crypto/CoinMarketCap noise — live search (`"local LLM" OR ollama OR gemma`) carried the harvest. **Top pick:** @cwwhitehead asking @tobi "Why qwen and not Gemma 4?" — direct hit for TextStack qwen→gemma4 migration numbers (carries the daily prod-numbers mention). Other candidates: @tallhamn (reply to @antirez on local-AI middleware glue), @levelsio (3D terrain on Hoodmaps — technical implementation question), @dotnet (async patterns — ConfigureAwait take), @VitalikButerin (AI + formal verification spec gap). Paul Chen on 4th carry-over — still unattended. Account at 10 followers (+8 since baseline, flat vs last session). New tribe watchlist adds: @antirez (Tier E), @cwwhitehead (Tier B), @tallhamn (Tier C). File: `docs/marketing/x-routine/2026-05-18.md` | +| 2026-05-19 | 4 fresh + 1 continued (5th carry-over) | pending | Following feed still crypto-heavy (CoinMarketCap, Bitcoin posts dominated top of feed) — live search (`ollama lang:en min_faves:5` and `"local LLM" OR gemma OR ollama OR "Claude Code"`) carried the harvest again. 
**Top pick:** @levelsio "ask Claude Code to audit your devices" — fresh (~1h), Tier A, high reply visibility. **TextStack prod-numbers mention:** @DAlistarh's new GSQ quantization paper (CPU-only deployment angle). Other candidates: @imikerussell (Claude Code → Home Assistant scenes loop, fresh thread), @JulianGoldieSEO (Local Agent Stack — Ollama glue debugging). Paul Chen on 5th carry-over — recommended skip (stale). Account at 8 followers (-2 vs last week — possibly unfollow churn; baseline +6). Suggestion in file: unfollow CoinMarketCap and Bitcoin to declutter Following tab, or move high-signal accounts to a List. File: `docs/marketing/x-routine/2026-05-19.md` | +| 2026-05-20 | 4 fresh + 3 continued | pending | 6th session. Following tab still stale (Elon 22h / May-18 reposts / Karpathy 21h on top, nothing in the 1–3h window) — live search + direct tribe-profile scans (@simonw, @theo, @swyx, @levelsio, @arvidkahl) carried again. **Top pick:** @theo (~3h) on Gemini 3.5 Flash shipping a Jan-2025 knowledge cutoff — freshest, biggest dev story of the day. Other candidates: @simonw (Gemini 3.5 Flash 3x pricing), @arvidkahl (GitHub internal-repo breach / supply-chain, Tier A). **TextStack prod-numbers mention:** @DivyanshT91162 "local LLM needs a GPU" myth — caveat flagged in file: news-farm account, optional/skippable. **3 continued conversations** — first reciprocity pass on the May 14–15 replies: @TheWordWeaver_ ("thanks"), @ollies0x (substantive GPU-vs-VPS counter), @RahulGangwani24 (answered the Ollama-usage question) all replied back, all external. Paul Chen: still skip (stale, 6th carry-over, promo + link). Account at **6 followers (−2 vs May 19)** — churn continuing. File: `docs/marketing/x-routine/2026-05-20.md` | +| 2026-05-21 | 4 fresh + 0 new continued (3 carry-overs) | **3 posted, 1 skipped** | 7th session. 
Following tab still stale (Elon 4h promo / 14–23h reposts, nothing in the 1–3h window) — live search carried again; @simonw scan confirmed tribe quiet (21h). **Posted with user approval:** @faradaymachines (Chrome Canary browser-level local inference), @noguchis (local-LLM-as-pre-screen-gate thread — carried the TextStack prod-numbers mention), @hiouso (build-in-public founder-metrics tool). **@NikkiSiapno skipped** — passed the search-preview check but the full post was a paid-partnership ad (#AtlassianPartner #Ad); user chose skip (no pure promo). Lesson #7 logged. No new reciprocity — @ollies0x / @TheWordWeaver_ / @RahulGangwani24 continued drafts from May 20 still pending; Paul Chen 7th carry-over, skip. **First posting since May 15** — user flagged the drafting-without-posting gap: May 18–21 drafts all went unposted and the account decayed 11→6; posting reconnected today as the fix. Account at 6 followers. File: `docs/marketing/x-routine/2026-05-21.md` | + +## Quick status + +| Metric | Value | Goal | +|---|---|---| +| External GitHub stars | **0** | 10+ | +| Total channels active | 5 | — | +| Open feedback loops | 4 | — | + +--- + +## Channels + +### 🟢 X / Twitter — `@Rexetdeus` + +| Action | URL / Detail | Status | Metrics | +|---|---|---|---| +| Standalone post (DDIA + GIF) | [status/2053615432037257506](https://x.com/Rexetdeus/status/2053615432037257506) | ✅ Live, **pinned** to profile | 49 views, 2 reposts, 1 reply, 0 likes (last check ~13:00 May 11) | +| Reply-to-self (GitHub CTA) | Reply on standalone post | ✅ Live | 1 like, 2 views (early check) | +| Tier-1 reply to @1Umairshaikh | "What are you building this week" thread, 1.8K views | ✅ Live | Reply count 56→57 confirmed | +| Engagement reply to @nabuhad (Inkett) | Smart question, no self-promo | ✅ Live | Deposit goodwill — Nabil reply pending | + +**Next action:** Monitor for replies. Don't add more replies in 24h (algo spam risk). 
+ +--- + +### 🟡 GitHub — `mrviduus/textstack` + +| Item | Status | +|---|---| +| Repo description | Updated by Claude Code (positions for devs/AGPL/local-LLM audience) | +| Topics | react, open-source, postgres, react-native, dotnet, self-hosted, reading, spaced-repetition, agpl, epub, srs, aspnet-core, fb2, book-reader, ai-engineering, learning-tools, llm, kindle-alternative, pdf, expo | +| README hero image | New `docs/assets/hero.png` — product screenshot of Explain popup on ETL in DDIA | +| README demo gif | `docs/demo.gif` — 178 KB, cropped, no browser chrome | +| **Total stars** | **4** (all self: mrviduus, r3xetdeus-bot, gl1tchmary, vasylvd) | +| **External stars** | **0** | + +**Stargazers (all self, baseline reference):** +- mrviduus (Pinnacle) +- r3xetdeus-bot (joined Apr 18, 2026) +- gl1tchmary (joined May 3, 2026) +- vasylvd (joined May 23, 2023) + +**Next action:** Watch [stargazers page](https://github.com/mrviduus/textstack/stargazers) for new external entries. + +--- + +### 🟢 Reddit — `r/SideProject` + +| Field | Value | +|---|---| +| Post URL | reddit.com/r/SideProject/comments/1ta9w9l/i_quit_designing_dataintensive_applications_three/ | +| Author | r3xetdeus | +| Posted | 2026-05-11 ~13:15 | +| Sub size | 364K weekly visitors | +| Status | ✅ Live (manual post by user) | +| Metrics | Pending — first feedback usually 30-60min after post | + +**Next action:** Check upvotes/comments at +1h, +3h, +24h. 
+ +--- + +### 🔴 Hacker News — `viduus` + +| Item | Status | +|---|---| +| Account created | 2026-05-10 | +| Profile | About + email filled — humanized | +| Karma | 1 (started 1, no growth) | +| Comment on Idempotency thread | ⚠️ **[flagged]** — invisible to default view | +| Comment on Ask HN "What are you working on" | ⚠️ **[flagged]** — invisible to default view | +| Email to `hn@ycombinator.com` (dang) | ✅ Sent 2026-05-10 evening — request to review/unflag | + +**Failure mode:** Karma=1 new account + product link in 1st comment + 2nd comment on front-page thread within 30 min → HN anti-spam fired. + +**Next action:** Wait for dang reply (6-24h typical). If unflagged on Idempotency comment — viduus gains legitimacy, can continue slow karma build. If silent ≥48h — abandon viduus, move on without HN this round. + +--- + +### 🟡 Indie Hackers — (account exists) + +| Item | Status | +|---|---| +| Account login | ✅ Active | +| Posting privileges | 🛑 **Gated** — must build "authentic contribution pattern" through comments, OR pay for IH Plus | +| Comment 1 — Manish Bhusal "0 paying customers" thread | ✅ Posted 2026-05-11 ~14:00 | +| Comment ID | `-OsN6Rm1U_iTlfvjGTyW` | + +**Strategy:** 1-2 thoughtful comments per day for 1-2 weeks → IH mods grant posting privilege. Then post TextStack as its own thread. + +**Next action:** Tomorrow add one more substantive comment on a different IH thread (different topic). Avoid posting >2 comments same day (spam-pattern risk). + +--- + +### ⚪ dev.to — `mrviduus` + +| Item | Status | +|---|---| +| Existing article | "I quit Designing Data-Intensive Applications three times. Here's what I build on the fourth" — already published | +| Follow-up article | ❌ Not yet drafted | + +**Next action:** Write `30 days later: shipping TextStack` follow-up (1500 words, DDIA continuation + tech stack + lessons learned). High passive leverage — dev.to articles live in Google index for months. 
+ +--- + +### ⚪ LinkedIn + +| Item | Status | +|---|---| +| Profile post | ❌ Not yet drafted | + +**Next action:** Short (~300 words) professional-tone post about shipping TextStack. Warm audience (former colleagues, recruiters) — high conversion potential per impression. + +--- + +### ⚪ Personal DMs + +| Item | Status | +|---|---| +| List of dev contacts | ❌ Not assembled | +| Template | ❌ Not drafted | + +**Next action:** Identify 5-10 dev contacts (LinkedIn / Telegram / Slack / X DMs). 50%+ star conversion expected. Highest signal-per-effort channel. + +--- + +## Files created during campaign + +### Marketing assets +- `docs/demo.gif` (178 KB, cropped, README-canonical) +- `docs/assets/hero.png` (new product screenshot) +- `docs/assets/hero-old-classical.png` (backup of old hero) +- `docs/marketing/textstack-explain-short.mp4` (58 KB, 4.5s) +- `docs/marketing/textstack-explain-short.gif` (178 KB) +- `docs/marketing/textstack-explain-demo.mp4` (96 KB, 7.5s — longer for blog/dev.to) +- `docs/marketing/textstack-explain-demo.gif` (421 KB) +- `docs/marketing/textstack-demo.mp4` (older 31s version, kept for reference) +- `docs/marketing/textstack-demo.gif` +- `docs/marketing/textstack-srs-demo.mp4` (SRS-flashcard demo, alternative angle) +- `docs/marketing/textstack-srs-demo.gif` + +### Playbooks & strategy +- `docs/marketing/twitter-replies-playbook.md` — variants A-E + Tier-1 account list +- `docs/marketing/campaign-tracker.md` — this file + +### Bug fix briefs (for Claude Code) +- `docs/fixes/tap-on-word-and-explain.md` — domain-aware translation + Explain 404 + broken title +- `docs/fixes/explain-404.md` — focused brief (later: turned out Explain works, brief obsolete) +- `docs/fixes/readme-demo-gif.md` — README integration of new GIF + hero replacement + +--- + +## Bug fixes status (product side) + +| Bug | Status | Notes | +|---|---|---| +| **#1 Tap-on-word not domain-aware** | ⚠️ Code OK, runtime partial | All 4 files in chain wired correctly. 
Live result still `warehouse → almacén` without clarifier. Suspected cause: user-book `Genre` field is NULL in DB (Ollama BookMetadataGenerator didn't populate). Diagnostic SQL: `SELECT id, title, genre FROM user_books WHERE title ILIKE '%data-intensive%';` | +| **#2 Explain 404 on user books** | ✅ Resolved | Verified working in prod via user screenshot — was likely transient/selection-specific issue in initial test, not a real bug | +| **#3 Broken `(for )` title** | ✅ Fixed and deployed | Claude Code created `BookTitleCleaner` utility + 2 EF migrations to clean existing titles at source | + +--- + +## Action queue (priority order) + +### Today (May 11) — DONE, now in passive monitoring +- [x] X campaign (done yesterday — 5 actions) +- [x] HN email to dang (done yesterday) +- [x] Reddit r/SideProject post (done — user manual) +- [x] IH first comment on Manish/PostDew thread (done; comment ID `-OsN6Rm1U_iTlfvjGTyW`) +- [x] Verified Hackernoon writing access (logged in, no gate — "Import Story" feature available) +- [x] User polishing dev.to draft for tomorrow ("I shipped local LLM features two months ago. Production never ran them once." — Gemma 4 Challenge submission) +- [ ] **Passive monitor only — no new actions today:** + - Email inbox for dang reply + - GitHub stargazers + - X notifications + - Reddit r/SideProject votes + - IH comment reaction from Manish + +### Wave 2 — May 12 — STATUS: ~85% complete + +**Executed today:** + +| Action | Status | Note | +|---|---|---| +| dev.to article publish | ✅ Live | "I shipped local LLM features two months ago..." Gemma 4 Challenge submission. 6 reactions in first 10 min (❤️🎉🤯👏🔥). 1 bookmark. URL: dev.to/mrviduus/i-shipped-local-llm-features-two-months-ago-production-never-ran-them-once-41g7 | +| Hackernoon import + submit | ✅ Submitted | After URL import (which scraped dev.to chrome), user manually cleaned and submitted. 24-72h editorial review queue. 
Draft URL: app.hackernoon.com/mobile/6a0332e68f0929adca01637f | +| r/selfhosted post | ✅ Submitted (user manual) | Title: "TextStack — open-source reader for tech books with local LLM features. AGPL-3.0, docker compose up, no GPU." | +| X bio updated | ✅ Live | "Building TextStack — open-source reader for tech books you keep quitting (DDIA broke me 3 times). .NET/JS/RN. Indie ship logs + LLM in prod." | +| X follow batch — 21 new follows | ✅ Done | Tier A-E mix (levelsio, swyx, karpathy, sindresorhus, dan_abramov, etc.). Following went 77 → 92. | +| X reply on @theo (security psychosis) | ✅ Live | Post grew 1.6K → 3.6K views during session | +| X reply on @masondrxy (Gemma 4 / Ollama / Deep Agents) | ✅ Live | Reposted by Harrison Chase/LangChain. Perfect topical match. | + +**Blocked:** + +| Channel | Block reason | +|---|---| +| r/LocalLLaMA | Sub-specific karma gate (0 karma in sub) | +| Indie Hackers post | Same karma gate as HN viduus | +| HN regular submit | Pending decision — viduus is shadow-flagged, no other account ready | + +**Not done (skipped for today):** + +- LinkedIn announcement (5-10 min, warm audience, ~1-3 stars) +- Personal DMs (10-15 contacts, the GUARANTEED floor — 5-10 stars) +- X thread (5 tweets — infrastructure not driver with 2 followers) +- Hashnode cross-post (5 min copy-paste) +- GitHub Topic / awesome-* submissions + +### Monitor schedule + +**Tonight / overnight (May 12 → May 13):** + +| Channel | What to check | +|---|---| +| github.com/mrviduus/textstack/stargazers | New external stars overnight | +| dev.to article | Reactions count, comments, views (Stats tab on article) | +| reddit.com/r/selfhosted/{post-id} | Upvotes, comments | +| x.com/Rexetdeus | Notifications (follow-backs from 21 follows, replies on Theo/Mason) | +| Inbox `mrviduus@gmail.com` | dang reply about HN unflag | + +**Tomorrow morning (May 13):** + +- If Hackernoon approved overnight: their newsletter distribution kicks in → potential burst +- Reddit r/selfhosted: 
24h mark = visibility cliff (post moves out of /new into archive) +- dev.to article: organic Google indexing starts to matter + +### Realistic 7-day expectation (revised based on actuals) + +| Outcome | Probability | +|---|---| +| 0 external stars in 7 days | ~10% (would mean total Wave 2 failure) | +| 1–5 external stars | ~40% (median outcome — slow accrual from dev.to + Reddit) | +| 5–15 external stars | ~30% (if Hackernoon approves + Reddit catches some upvotes) | +| 15–50 external stars | ~15% (if one channel meaningfully lands) | +| 50+ external stars | ~5% (would require article hitting Google trending, Reddit /hot, or newsletter pickup) | + +**Median estimate: 5-10 external stars by May 19.** + +### Key learnings — Wave 2 + +1. **DEV Challenge submission has visibility boost** — challenge tag surfaces article in dedicated feed +2. **Hackernoon URL-import scrapes page chrome** — should use "Blank Draft With Editor 3.0" + manual markdown paste instead (or accept editor cleanup) +3. **r/LocalLLaMA has sub-specific karma gate** — like HN, requires patience for proper submission +4. **dev.to API gives clean markdown** via `/api/articles/{username}/{slug}` — useful for cross-platform repurposing +5. **X reply game is real infrastructure** — Theo and Mason replies were appropriate, substantive, didn't spam-flag despite low follower count +6. 
**DM list still unexecuted** — this remains the single biggest reliable lever for stars but requires user's contact knowledge + +--- + +### Tomorrow (May 13+) — Wave 3 candidates + +**High-priority if no traction:** +- [ ] **Personal DMs** to 10-15 dev contacts — single most controllable channel +- [ ] **LinkedIn announcement** post (~300 words, link to dev.to) +- [ ] **Hashnode cross-post** of dev.to article (canonical = dev.to) +- [ ] **X thread** for build-in-public credit + retargeting material + +**Medium-priority:** +- [ ] Continue HN unflag waiting for dang (24-48h more max) +- [ ] r/LocalLLaMA karma build (1 thoughtful comment per day, no product link) +- [ ] r/programming submit (risky, strict mods) +- [ ] r/devops submit (different angle: production deployment story) +- [ ] Email pitch to Bytes / JavaScript Weekly / TLDR Tech newsletters + +**Low-priority:** +- [ ] GitHub Topic submissions (awesome-readers, awesome-llm-apps, awesome-selfhosted) +- [ ] Prepare GitHub Copilot Challenge submission (when launches) +- [ ] Prepare Google I/O 2026 Writing Challenge submission (when launches) + +### Tomorrow (May 12) — Wave 2 distribution of Gemma 4 article + +**Sequence (timezone Toronto = ET):** + +| Time | Action | +|---|---| +| 08:00 | Publish dev.to article from preview | +| 08:15 | Hackernoon "Import Story" — paste dev.to URL, set canonical, submit for editorial review (24-72h queue) | +| 08:30 | X announcement thread: main tweet + 3-5 follow-ups with key insights (3GB/30GB RAM, 63k requests, $0.002, p95=20.5ms, e4b→e2b swap, six bugs) | +| 09:00 | Reddit r/LocalLLaMA submit — Gemma 4 angle (~180K subs, prime audience) | +| 09:30 | HN submit as REGULAR link (not Show HN) — title from dev.to, URL points to dev.to article. **NOT from viduus** — use older account if available, OR ask friend with HN karma, OR skip HN. 
| +| 10:00 | LinkedIn short announcement post linking dev.to | +| Optional | Cross-post to r/MachineLearning + r/SideProject with different angles | + +**Why dev.to article is positioned strongly:** +1. Confession hook ("shipped features, prod never ran them") — dev community always reads +2. Concrete numbers in every paragraph (RAM, requests, latency, cost) — HN/dev.to love specifics +3. Story arc with two model swaps + six bugs — proper journey +4. TextStack as setting, NOT subject — frames as tech post-mortem, not product pitch +5. Gemma 4 Challenge submission — dev.to boosts challenge entries + +### This week (after Wave 2) +- [ ] 5-7 more IH comments (1-2 per day, building toward posting privilege) +- [ ] Personal DM outreach (5-10 dev contacts, highest-conversion channel) +- [ ] If dang unflags viduus: slow karma build with non-product comments +- [ ] Submit repo to GitHub Topic lists / awesome-lists (awesome-readers, awesome-llm-apps, awesome-selfhosted) +- [ ] Cover image for Hackernoon article (styled RAM graph or production logs screenshot — Hackernoon prominently displays it) + +### This week +- [ ] dev.to follow-up article (1.5-2h work, write + publish) +- [ ] LinkedIn post (30min) +- [ ] 5-7 more IH comments (1-2 per day, different threads, building karma) +- [ ] Personal DM outreach (5-10 contacts) +- [ ] If dang unflags HN: continue slow karma build on viduus, no product links + +### Next week+ +- [ ] If IH posting unlocked: submit TextStack as proper IH post +- [ ] dev.to second article (different angle, e.g., technical deep-dive on Explain feature) +- [ ] Lobsters (if invite obtainable) +- [ ] Reddit secondary subs: r/selfhosted, r/opensource (different angles per sub) +- [ ] Show HN attempt (only if viduus has 20+ karma by then, ideally 50+) + +--- + +## Realistic outcome expectations + +| Timeframe | Expected external stars | +|---|---| +| End of day 1 (today) | 0–3 | +| End of week 1 | 5–15 | +| End of month 1 | 20–50 (if consistent activity) | 
+| End of month 3 | 100+ (if dev.to article ranks in Google + 1-2 Show HN attempts land) | + +Star count is a lagging signal. Better leading indicators: +- Repo clones (visible in `Insights → Traffic`) +- textstack.app analytics (if instrumented) +- Replies/comments on our posts (engagement) +- Followers gained on @Rexetdeus + +--- + +## Key learnings logged + +1. **HN karma=1 + product link + multiple-comments-same-hour = auto-flag.** Should have waited 6-12 hours between comments and avoided product link in first one. +2. **IH gates posting** through manual mod review — different from HN, requires "authentic contribution pattern" demonstrated through comments. +3. **Reddit r/SideProject is the most accessible big channel** — designed for self-promo, no significant gating beyond avoiding spam patterns. +4. **X reach for new pinned post** is small without paid promotion or established following — ~50 views per post baseline for ~2-follower account. +5. **GIF/visual quality matters** — cropping out browser chrome + dock made the asset look professional vs amateur recording. +6. **The DDIA hook is strong** — multiple platforms responded to it positively; keep this as primary narrative anchor. +7. **Paid-partnership / sponsored posts must be filtered out before drafting.** On 2026-05-21 a candidate (@NikkiSiapno) passed the search-preview check, but the full post carried a "Paid partnership" label + #Ad — pure promo by definition, off-strategy to reply under. Always open the full post and check for an #Ad / "Paid partnership" marker before drafting; a search-result excerpt can hide it. +8. **Drafting without posting produces zero growth.** Followers tracked 2 → 11 (after the May 14–15 posting burst of 8 replies) → decayed to 6 across May 18–21 when drafts were generated but never posted. The reply-game only compounds if replies actually go out; a drafts file alone is wasted effort. 
diff --git a/docs/marketing/ih-launch-draft.md b/docs/marketing/ih-launch-draft.md new file mode 100644 index 00000000..e527ae62 --- /dev/null +++ b/docs/marketing/ih-launch-draft.md @@ -0,0 +1,151 @@ +# Indie Hackers — First Starting Up Post + +**Status**: FINAL DRAFT v2 — optimized for max reach (sharp hook, scannable, concrete) +**Channel**: Starting Up section on indiehackers.com +**Goal**: first post from @textstack, founder story, drive engagement + visibility +**Tone**: honest, builder voice, no marketing, no fabricated numbers + +--- + +## Pre-publish checklist + +- [ ] Read it aloud: does it sound like you, not like a marketer? +- [ ] Numbers verified: 25 users / 9 returning / 32m 36s avg / 44 GSC clicks +- [ ] No fabricated data (no SRS retention %, no "users upload 2-3 books") +- [ ] Publish Tue/Wed/Thu 8-11am EST +- [ ] Ready to reply to comments for the first 2 hours + +--- + +## Title (final) + +# I built software to read one book. + +**Backup options** (if the first one doesn't land): +- I gave up on DDIA three times. So I built a reader to finish it. +- Six months of code to finish one technical book + +**Recommendation**: the first is the strongest hook in the IH feed. Curiosity-driven, requires no prior knowledge. The whole intriguing setup is visible in the first 2 lines of the IH preview. + +--- + +## Body (final, ~390 words) + +I gave up on *Designing Data-Intensive Applications* three times. + +The third time, I built software to finish it. Six months later, that software is TextStack — open source, AGPL-3.0, free at textstack.app. + +--- + +**The friction** + +The problem wasn't the math. It was vocabulary. + +Page 256 of DDIA uses "phantom" as a database isolation anomaly. The dictionary tells me it's a ghost. Google tells me it's a Rolls-Royce model. Kindle's Word Wise — same. + +Every chapter has 5-10 words like that. Each lookup breaks the thread. I gave up on chapter 7 three times. + +--- + +**What I built** + +A reader that knows what book it's reading.
+ +Tap any word, get a 2-3 sentence explanation in the book's domain. "Phantom" in DDIA returns the database meaning, not the ghost. + +The rest: + +- Upload EPUB/PDF/FB2 — your own books +- Vocabulary SRS with 5 stages (Recognition → Recall → Context → Mastered) +- Edge TTS audio, no API key needed +- Translation via OpenAI, dictionary, full-text search + +Stack: .NET 10 + PostgreSQL backend, React + React Native frontend. Self-host with `docker compose up`, or try at textstack.app without signup. + +--- + +**Three weeks of clean data** + +- **25 unique users.** 19 new, 9 returning. +- **32 minutes** average engagement time per user. +- **8.2 sessions** per active user. +- **44 Google clicks** in 3 months (broader trajectory). + +Most of those 25 are people I told directly. The 9 organic strangers are scattered: US, Ireland, Pakistan, Colombia. Tiny audience, but the engagement says the ones who find it actually read. + +The hard part: my audience — non-native English speakers reading technical books in English — is real but globally distributed. They're not concentrated in one subreddit or one country. + +--- + +**Two questions** + +1. If you've ever quit a technical book — what was the friction? Was it vocabulary like me, or something else I'm missing? + +2. How did you find your first 100 real users when your audience isn't in one place? Open to anything that worked. + +--- + +github.com/mrviduus/textstack — happy to dig into any technical decisions in the comments. + +--- + +## Why this version works better than v1 + +| Element | v1 | v2 | +|---------|----|----| +| Opening | "I'm Vasyl. For the last six months..." (warmup) | "I gave up on DDIA three times." (hook) | +| Friction example | "5-10 terms with domain-specific meanings" (abstract) | "Page 256, 'phantom'..." 
(concrete) | +| Metrics format | Bulleted list mid-paragraph | Pull-quote block with bold numbers | +| Paragraph length | 4-6 lines (wall on mobile) | 1-3 lines (scannable) | +| Subheadings | None | "The friction" / "What I built" / "Three weeks of data" / "Two questions" | +| Closing | List of features + questions | One line + GitHub link | +| Word count | 430 | 390 | + +The story arc is the same — failure → root cause → built solution → honest metrics → ask community. But every paragraph fights for the reader's next 5 seconds of attention. + +--- + +## Length and format + +~390 words. That is the optimal length for IH Starting Up posts: under 250 looks lazy, over 600 doesn't get read to the end. + +Subheadings (bold, single phrase) split the post into 4 clear sections, each visible on screen even without scrolling. + +The pull-quote block with metrics is the most important part for scanners. Skimmers see the numbers. Readers see the context. + +--- + +## When to publish + +- **Best**: Tuesday, Wednesday, Thursday 8-11am EST +- For Eastern Canada (your time zone) that is 8-11am local time +- **Avoid**: weekends (posts get lost), Monday morning (founders are still clearing their inboxes), late afternoon EST (late in the US, Europe and Asia asleep) +- **Avoid**: days with big announcements (major YC demos, a Stripe Press launch, etc.) + +--- + +## What to do in the first 2 hours after publishing + +1. Check comments every 10-15 minutes and reply substantively within 15-30 minutes +2. Reply to every comment with a **question back** ("What about X in your case?"); this doubles thread depth +3. No "Thanks for the comment!": it's spam and the algorithm sees it +4. If someone criticizes, ask a follow-up, don't get defensive. "What would have made it work for you?" defuses the conflict. +5.
Don't mention your Twitter handle or other projects in comments: it looks like a self-promo carousel + +## After 2-3 hours, once there is engagement + +- Cross-post to Twitter with one line: "Wrote about why I built TextStack on Indie Hackers — would love feedback" + link +- Do NOT ask for upvotes: IH moderation bans for that +- Do NOT post the same text in multiple subreddits: it looks like link-dropping + +## After 24-48 hours + +- **Good engagement** (10+ comments, 20+ upvotes): that's validation; in 3-4 weeks you can post again with a different angle (a milestone post a month later) +- **Silence** is data too. Maybe the timing was bad or the title didn't hook. Don't panic, don't delete. Try again in 4-6 weeks with a different hook. + +## What not to do after posting + +- Don't publish a second IH post within a week: it looks like spam +- Don't respond to criticism defensively: "Actually if you read more carefully..." kills goodwill +- Don't hand out promo codes or discounts: the product is free, there's no need +- Don't mention "btw also check out my @..." in comments: bad form on IH diff --git a/docs/marketing/ih-warmup-comments.md b/docs/marketing/ih-warmup-comments.md new file mode 100644 index 00000000..b98baf8b --- /dev/null +++ b/docs/marketing/ih-warmup-comments.md @@ -0,0 +1,172 @@ +# IH Warm-up Comments + +Goal: get posting access on IH through substantive comments. Moderators grant it manually to accounts that show authentic contribution. It usually takes 5-14 days of steady activity. + +## Safety rules (important, to avoid getting banned) + +1. **No links to textstack.app** in comments. At all. Ever. +2. **Mention TextStack minimally**: only if directly on the post's topic, and only in passing, with no call-to-action. +3. **No "Great post!" / "Thanks for sharing"**: moderators read it as a spam indicator and grant access more slowly or not at all. +4. **80-150 words per comment**: substantial enough without being pushy. +5.
**1-2 comments per day, maximum**: NOT 5 in one sitting in one evening. That looks like farming. +6. **Only on posts that genuinely interest you**: it shows in the tone. +7. **Don't give advice from a position of experience you don't have**: IH spots posers fast. +8. **Don't mention your Twitter, GitHub, or anything of your own in a comment**: you can add that after the grant; for now the focus is value to OP. + +## Cadence (recommended) + +- **Monday (today)**: 1 comment. The strongest one: post #1. +- **Tuesday**: 1 comment. Post #2 or #3. +- **Wednesday**: 1-2 comments. From the remaining ones. +- **Thursday**: 1 comment. +- **Friday**: 0-1 comment. +- By the weekend, check for the grant; if it's there, publish the post on Tuesday, May 26. + +--- + +## Comment #1 — READY TO POST + +**Post**: "The most embarrassing realization I had this week: Our startup was completely invisible to Google." by Russ & Ali (fixRAgent) + +**URL**: https://www.indiehackers.com/post/the-most-embarrassing-realization-i-had-this-week-our-startup-was-completely-invisible-to-google-c878bf461c + +**Context**: Non-technical founders launched fixRAgent, hit PH #44, ran Meta ads, then realized they were invisible on Google. They ask: "What is the most obvious tech-world standard practice that completely blindsided you when you first started?" + +**Why this is perfect for you**: you are living the next level of their lesson right now: you had GSC from day 1 and still got 44 clicks in 3 months. You have real insight, not theory. + +**Draft (copy-paste ready)**: + +``` +The follow-up rabbit hole gets weirder when you DO have GSC set up from day 1. I did — submitted sitemap, schema markup, the whole technical checklist before launch. Three months in: 44 clicks, average position 59 on most queries. + +The lesson that blindsided me wasn't "submit your sitemap", it was: indexed ≠ ranked. Getting Google to know you exist is the easy part.
Getting Google to put you above 60 other sites that figured it out earlier is the actual work, and there's no clean checklist — backlinks, content depth, and time, where the first two are slow and the third is just slow. + +For non-technical founders specifically: build the muscle of reading the Performance report weekly. Impressions growing without clicks growing means your titles and descriptions are weak, which is fixable in an hour. That's the next layer past "just submit the sitemap". +``` + +~150 words. Substantive, honest, doesn't mention TextStack, gives OP direct, actionable advice. + +--- + +## Comment #2 — TEMPLATE (need to read post first) + +**Post**: "I built a URL indexing SaaS in 40 days — here's the honest story" by @alex80 + +**Why relevant**: Directly on the topic of indexing. If he is building an indexing SaaS, he has an opinion on what is currently broken or working in Google Search Console. You can share your own experience with indexing patterns. + +**Read the post first**, then write the comment based on: + +- What specific insights does he share in the post? +- Is there anything in his approach you disagree with from your own experience? +- Can you add a data point from your own experience (for example, that only 9% of your ~3.6K crawled URLs are indexed)? + +**Black-out draft** (fill in the specifics after reading the post): + +``` +Honest write-ups like this are useful — most indexing-saas content I've seen is either fearmongering ("Google hates you!") or magic ("just call this endpoint!"). + +Specific data point from my side: in 3 months I have 391 URLs in sitemap, 327 indexed, but Google has discovered ~3.6K URLs total via internal links. Of the non-sitemap URLs about 2K are intentionally noindex (chapter-level content I don't want competing with public-domain sources), the rest sit in "Crawled - currently not indexed" purgatory. + +What I learned the hard way: sitemap submission is necessary but probably 20% of the actual indexing work.
The other 80% is making each page substantial enough that Google decides it's worth keeping. Curious if your tool surfaces that or sticks to the technical "is this discoverable" layer — they feel like different problems to me. +``` + +~150 words. Shares concrete numbers without preaching, and ends with a question that gives OP a reason to reply. + +--- + +## Comment #3 — TEMPLATE (need to read post first) + +**Post**: "I made a mistake every first-time founder makes — I built first, validated later. Here's what I'd do differently." by @SuhailQureshi + +**Why relevant**: A universal founder story. Your path is similar: you built TextStack and are now looking for your audience. + +**Read the post first**, then write based on: + +- What specific things does he regret? +- Do they match your experience, or was yours different? +- Is there anything valuable you can add about your own path? + +**Skeleton draft** (adapt after reading): + +``` +I'm in a similar arc but at the "still figuring it out" stage. Built [TextStack] for myself first — I had a specific frustration with reading technical books as a non-native English speaker, and the existing tools (Kindle Word Wise, dictionaries) didn't fit. The "build first" part worked because I was the user. + +Where I'm hitting your "validate later" wall: I have ~25 active users in the last 3 weeks, 9 returning, average session 32 minutes. The engagement says the product works. The hard part is finding the next 100 — the audience is real but globally distributed, no single subreddit or community. + +If I were starting over, I'd ask the "build vs. discover" question differently: not "should I validate?" (yes, always) but "validate WHAT specifically?". For me, "is there a vocabulary problem with technical books?" was the obvious thing to validate. "How do I find non-native English readers globally?" is what I should have validated, and didn't. +``` + +~165 words. TextStack is mentioned once here, and only for context, which is fine and even honest.
NO links, NO "check it out". + +--- + +## Comment #4 — TEMPLATE (need to read post first) + +**Post**: "I used to think onboarding was unnecessary. I was wrong." by @showesome + +**Why relevant**: TextStack is a reader with vocabulary SRS, a complex feature that needs explaining. You have real onboarding experience. + +**Read the post first**, then adapt: + +- What specific onboarding mistakes does he describe? +- What does TextStack's onboarding include? Are there patterns he didn't mention? + +**Skeleton**: + +``` +The thing that flipped for me on onboarding: it's not "show the user what's possible", it's "give them one specific win on day 1". For my reading app, the day-1 win is "tap a word, see a definition aware of the book's domain" — not "look at all these features". + +The temptation when you ship something with multiple features (SRS, dictionary, translation, TTS) is to walk the user through each one. I tried that. Users churned. Now the onboarding shows one tap-explain interaction on a sample page, then gets out of the way. The other features get discovered organically when the user is ready. + +What I'd add to your post: track the activation event specifically. For me it was "user taps and reads an explanation". Until that event fires for a new user, every feature beyond that is noise. +``` + +~155 words. A concrete example from your own experience plus actionable advice (track activation event). + +--- + +## Comment #5 — TEMPLATE (need to read post first) + +**Post**: "I built a cron job monitor. Now I'm trying to figure out who actually loses sleep over this problem." by @KrasimirP + +**Why relevant**: You have the same problem: the product works, but the audience is distributed and hard to reach. + +**Read the post first**, then: + +``` +The "who actually loses sleep" framing is good but here's a trap I fell into with my own niche product: the people who lose sleep are often NOT the people who buy.
For cron monitoring, the engineer who got paged at 3am is the one losing sleep, but the budget owner who pays for monitoring is their manager or platform team lead. + +I'd separate two questions: +1. Who has the pain hard enough to seek a tool? (your sleep-losers) +2. Who has the authority to commit to a tool? (your buyers — different person, sometimes the same) + +For my reading app I conflated these for months — built features for the "I want to actually finish this book" pain (the user), then wondered why nobody was telling friends. Turns out the person you tell about a reading app isn't the person who suffered through the friction; you forget the friction once you've finished. + +Curious if cron monitoring has this gap or if pain-haver = buyer in your space. +``` + +~175 words, slightly longer. To shorten it, cut the last line about "Curious if". + +--- + +## Self-check before every comment + +Before hitting Submit on any comment, answer "yes" to ALL of these: + +- [ ] Am I responding specifically to what OP said, rather than giving generic advice? +- [ ] Am I giving value to the post's author, rather than trying to sell myself? +- [ ] No links to textstack.app or my other projects? +- [ ] TextStack mentioned at most once and only for context, with no CTA? +- [ ] Does this sound like my own voice, not a ChatGPT-generated comment? +- [ ] It is not a Sunday or holiday (when comments get lost)? +- [ ] Is this my 1st or 2nd comment today (not my 5th)? + +If even one answer is "no", don't post; rework it. + +--- + +## After you get the posting grant + +1. Publish the main post (draft in [ih-launch-draft.md](./ih-launch-draft.md)) +2. Cross-post to Twitter 2-3 hours later +3. 
Keep making substantive comments on posts that interest you; 2-3 per week is a good cadence for a long-term presence on IH diff --git a/docs/marketing/linkedin-routine/README.md b/docs/marketing/linkedin-routine/README.md new file mode 100644 index 00000000..cf244a74 --- /dev/null +++ b/docs/marketing/linkedin-routine/README.md @@ -0,0 +1,129 @@ +# LinkedIn comment-game playbook + +Companion to `docs/marketing/x-routine/`. Runs **Mon/Wed/Fri at 10:00 AM Toronto** via the `daily-linkedin-comment-game` scheduled task. Goal: build a niche professional audience (AI engineers + .NET / dev community) through substantive comments on others' posts — same engagement philosophy as the X reply-game, but tuned for LinkedIn's different mechanics. + +## Why LinkedIn + +- LinkedIn's algorithm rewards **dwell time on comments** more than reactions; a 3-sentence substantive comment outperforms 10 likes. +- The audience is older and more decision-shaped (technical leads, hiring managers, founders); conversion to actual contributors / TextStack users is much higher per impression than X. +- Reply windows are longer — a comment posted 24h after the parent post still gets visibility, unlike X where the 1-3h window is hard. + +## Tone & format vs. X + +| | X reply | LinkedIn comment | +|---|---|---| +| Length | 100-280 chars | 200-600 chars (2-4 sentences) | +| Hook | Optional | First sentence must stand alone — LinkedIn collapses by default | +| Emoji | Max 1, rare | None unless matching parent's tone exactly | +| Hashtags | Never | Never in comments (only in own posts) | +| Sign-off | None | None (no "Cheers, Vasyl" — looks AI-generated) | +| Persona | Peer-to-peer, terse | Peer-to-peer, slightly more formal but still direct | + +## Target tribe — priority order + +These are **names**, not URLs. The scheduled-task run will need to search LinkedIn for each. 
The pool is intentionally larger than the X list because LinkedIn surfaces a smaller subset of any given person's posts per visit. + +**Tier A — .NET / Microsoft ecosystem (highest topical overlap with TextStack stack):** +- Scott Hanselman (Microsoft, .NET evangelism) +- David Fowler (.NET architect) +- Damian Edwards (.NET PM) +- Khalid Abuhakmeh (JetBrains, .NET advocacy) +- Maarten Balliauw (.NET, Azure) +- Jeff Fritz (Microsoft .NET community) +- Andrew Lock (.NET author) +- Steve Smith (Ardalis, .NET architecture) + +**Tier B — AI engineering / local LLM ecosystem:** +- Simon Willison (Datasette, llm CLI) +- swyx (Latent Space) +- Andrej Karpathy (sporadic on LinkedIn but high-value) +- Mitchell Hashimoto (HashiCorp founder, recent solo AI work) +- Harrison Chase (LangChain) +- Aravind Srinivas (Perplexity) +- Logan Kilpatrick (Google AI / DeepMind) +- Eduards Sizovs (sizovs.net — dev architecture content, strong LinkedIn) + +**Tier C — Indie / build-in-public (highest reply-back rate):** +- Arvid Kahl (FeedbackPanda, Podscan) +- Pieter Levels (posts sometimes on LinkedIn) +- Marc Lou +- Daniel Vassallo +- Sahil Lavingia + +**Tier D — Recruiter / hiring-adjacent (low engagement priority but parallel benefit):** +- Gergely Orosz (Pragmatic Engineer) +- Ryan Peterman (FAANG-focused content) + +## What to comment on + +- A post with a **claim** (architecture take, performance number, framework opinion) — counter or extend with a data point. +- A post with a **question** asked sincerely — answer it concretely. +- A post sharing a **debugging story** — share a parallel war-story (this is where TextStack production experience naturally fits). +- A post about **AI / local LLM tradeoffs** — peer-to-peer technical perspective. + +## What to skip + +- Pure promotion ("excited to announce" / launch posts) — engagement value is low; LinkedIn already amplifies these. +- Long-form opinion threads with 200+ comments already — your comment drowns. 
+- Posts older than 48 hours — LinkedIn algorithm has decayed by then. +- Political / current-events posts — same tribe constraints as X. +- "Inspirational" / motivational posts — wrong tribe. + +## TextStack reference rules — LinkedIn ≠ X + +LinkedIn is **personal brand only**. Do not name TextStack, do not cite TextStack production numbers, do not use "we shipped X" framing that ties back to the product. This is the **opposite** of the X reply-game (where 1 reference per session is allowed and working). + +- **Zero** TextStack mentions per session — no exceptions, even when it would topically fit. +- **Never include** github.com/mrviduus/textstack or textstack.app URLs. +- If a comment would only work with a TextStack reference, **drop the comment entirely** rather than rewriting around it. Speak generically about CPU-only deployment, local LLM tradeoffs, etc., without naming the product. +- Speak as a senior AI engineer with opinions and war stories — authority + perspective, not founder-led product PR. Recruiters and hiring managers respond to the former and tune out the latter. + +## Constraints (never violate) + +- **Never post a comment autonomously.** Each comment requires explicit per-message approval from the user. Scheduled task drafts; user approves and posts (or asks Claude to post once approved). +- **Never send connection requests autonomously.** +- **Never react / like autonomously** — that changes profile-side state. +- **Never engage with political, religious, geopolitical, or controversy content.** + +## Output format + +Save drafts to `docs/marketing/linkedin-routine/YYYY-MM-DD.md` in this format: + +```markdown +# LinkedIn comment-game drafts — [DATE] + +Generated by daily-linkedin-comment-game scheduled task. +N candidates selected. Pending user review and per-comment approval before posting. + +--- + +## Candidate 1 — [Name] ([Title at Company]) + +**Source post:** [LinkedIn URL] +**Posted:** [N hours ago] +**Excerpt:** "..." 
(2-3 lines of the original) + +**Draft comment:** +> [comment text — 200-600 chars] + +**Why this adds value:** [one line of reasoning] + +--- + +[continue for 2-5 candidates] +``` + +## Cumulative tracking + +Append a one-line summary to `docs/marketing/campaign-tracker.md` under a new section `## LinkedIn comment-game log` after each session — mirror the X routine log format. + +## Calibration + +- Weekly: 1-3 new connections expected by week 4 (lower volume than X follows but higher per-touch conversion) +- Monthly: 10-30 new connections by month 1 +- A comment has succeeded if it (a) gets a reply from the original poster, or (b) prompts a connection request from another commenter +- A session has succeeded if 3+ candidates were drafted with substantive value + +## Notes for the runtime + +The current LinkedIn URL for the @Rexetdeus / TextStack account is **unverified** — the scheduled task should first navigate to `https://www.linkedin.com/feed/` and confirm the logged-in user matches Vasyl Vdovychenko before scanning. If the session isn't logged in, write a one-line markdown file at `docs/marketing/linkedin-routine/YYYY-MM-DD.md` saying "LinkedIn session not authenticated — skipped" and exit cleanly. diff --git a/docs/marketing/x-routine/2026-05-12.md b/docs/marketing/x-routine/2026-05-12.md new file mode 100644 index 00000000..d778dfcd --- /dev/null +++ b/docs/marketing/x-routine/2026-05-12.md @@ -0,0 +1,55 @@ +# X reply-game drafts — 2026-05-12 + +Generated by daily-x-reply-game routine (first run, manual trigger). +Session notes: Following feed mostly noise today (Ferarri Prime amplifies growth-hack reposts; afternoon EDT = dev quiet hours). Found best candidates by going directly to @simonw profile after search returned mostly stale content. + +**2 candidates** selected. Pending user review and per-message approval before posting. 
+ +--- + +## Candidate 1 — @simonw (HIGH PRIORITY) ✅ POSTED + +**Source post:** https://x.com/simonw — pinned post from 22h ago +**Author:** Simon Willison (174.5K followers — Tier B, we follow) +**Post excerpt:** +> "My Mac had less available memory than I expected, turned out the 'claude' Claude Code processes on this machine (running in various terminal windows) were consuming ~30GB on their own!" + +**Draft reply:** +> Same number 😂 — we run gemma4:e2b production on a 30GB CPU VPS (no GPU, no concurrency limit). Claude Code is squatting that exact limit on dev laptops. Wondering if there's a 'lightweight context' mode that drops to <5GB when idle, or is reservation always full? + +**Why this adds value:** +Concrete production-memory anecdote that mirrors Simon's exact 30GB number from the opposite direction (his Mac vs our prod VPS). Asks a real question Simon might actually know the answer to (Claude Code memory management). Natural TextStack production reference without link. + +**Day quota used:** 1/1 TextStack production reference for today. + +--- + +## Candidate 2 — @simonw (MEDIUM PRIORITY) + +**Source post:** https://x.com/simonw — 20h ago +**Author:** Simon Willison (same) +**Post excerpt:** +> "Wrote about today's GitLab restructuring / 'workforce reduction' announcement, and ended up digging around in version control for both the GitLab and the 37signals public employee handbooks to help illustrate my thoughts" +> (with link to simonwillison.net/GitLab Act 2) + +**Draft reply:** +> Curious — when you mine these public handbooks via version control, are you scripting the diff-extraction or doing it manually? I'd kill for a tool that surfaces 'what changed in policy X over the last N revisions' without me having to bisect git blame by hand. + +**Why this adds value:** +Specific question about Simon's workflow (he's known for tool-building, will engage). Implies a real research need (handbook-diff tool) that's adjacent to his expertise. 
No TextStack mention — pure community engagement. + +--- + +## Continued conversations + +None today — checked x.com/Rexetdeus/with_replies, no responses on prior replies to @theo or @masondrxy. + +## Notifications + +No new external follows or notable likes. Same self-engagement from @gl1tchmary alt account, same Paul Chen promo bot reply. + +## Routine notes + +- Feed quality issue: Ferarri Prime + a few accounts are reposting growth-hack "follow for follow" spam, flooding Following tab. Consider muting @callmeoscar_1, @CelebrityPuls_1, and similar accounts that retweet through Ferarri Prime to clean up the feed. +- Best content found by going DIRECT to @simonw profile rather than feed/search. Future runs: navigate to top 3-5 target profiles individually if Following feed is noisy. +- Afternoon EDT (14:50) is dev-quiet hours. Optimal post time for engagement is 8-10am ET or 6-9pm ET. Tomorrow's scheduled 9:04am ET run hits the morning peak. diff --git a/docs/marketing/x-routine/2026-05-13.md b/docs/marketing/x-routine/2026-05-13.md new file mode 100644 index 00000000..de94e7eb --- /dev/null +++ b/docs/marketing/x-routine/2026-05-13.md @@ -0,0 +1,92 @@ +# X reply-game drafts — 2026-05-13 (Wednesday) + +Generated by `daily-x-reply-game` scheduled task. +4 candidates selected (3 fresh + 1 continued). Pending user review and per-message approval before posting. + +**Session note:** Following feed was again dominated by mutual-follow engagement-bait (Web3_Jarin, Gooddlovee, Sindirella, CoinMarketCap). No on-topic candidates surfaced there. Falling back to live search + target-tribe profile scan worked — pulled two strong on-topic posts from `f=live` search, one from the official @dotnet account. + +--- + +## Candidate 1 — @im_yeyito (STRONG / TOP PICK) + +**Source post:** https://x.com/im_yeyito/status/2054547865867845997 +**Posted:** ~7m before scan (still in peak visibility window) +**Excerpt:** +> "llama.cpp getting a real eval path would be huge. 
local inference has plenty of vibes and anecdote already; comparable community evals are how the weird builds stop being impossible to reason about." + +**Draft reply:** +> Hard agree. Ran 63k requests through gemma4:e2b on a single-CPU prod VPS — p95 20.5ms, 100% success — and posting those numbers still felt like vibes-into-the-void. Nobody publishes comparable runs. A standard local-inference eval harness would change the whole conversation. + +**Why this adds value:** Directly answers the "vibes vs comparable data" complaint with a concrete data point from our own deployment. This is the one reply per session that's allowed to drop TextStack production numbers, and this is the perfect spot for it (the parent post is literally asking for that kind of data). Ends with implicit invitation to keep the methodology conversation going. + +--- + +## Candidate 2 — @VladimirVivien + +**Source post:** https://x.com/VladimirVivien/status/2054542251368263751 +**Posted:** ~27m before scan +**Excerpt:** +> "Generating source code with Gemma 4 — Gemma 4 2B on CPU, LiteRT-LM runtime, Example in litertlm-go" + +**Draft reply:** +> Curious how LiteRT-LM compares to llama.cpp/ollama for CPU inference under concurrent load. We ended up downsizing from gemma4:e4b to e2b on a single-CPU VPS once load hit — couldn't keep TTFB under 2s otherwise. What's your throughput look like on 2B? + +**Why this adds value:** Asks a specific runtime-comparison question framed around our actual e4b → e2b migration. Doesn't mention TextStack by name (no production-numbers reuse). Implicit invitation through the throughput question. Author is a Go/CPU-inference peer — high chance of a real reply. + +--- + +## Candidate 3 — @dotnet (Microsoft official) + +**Source post:** https://x.com/dotnet/status/2054548484145946675 +**Posted:** ~5m before scan +**Excerpt:** +> "Agentic AI is growing up. 
Microsoft Agent Framework 1.0 brings stable APIs, A2A cross-runtime messaging, MCP tool discovery, graph-based workflows, and a DevUI debugger for real-time agent tracing. #dotNET + Python parity finally lands." + +**Draft reply:** +> MCP tool discovery in-framework is the missing piece — wiring it by hand in ASP.NET has been the friction point. Is there a sample of A2A actually bridging a .NET agent to a Python one, or is parity mostly API-shape parity for now? + +**Why this adds value:** Substantive technical question on a megaphone account (replies on @dotnet posts get high incidental views from the .NET crowd). Frames us as a real ASP.NET builder using MCP. Question is specific enough that a sample link or a "wait for v1.1" answer both move the convo forward. + +**Risk:** Megaphone reply threads often get buried. Mitigation: post early before the thread fills up. + +--- + +## Continued conversations + +### @PaulChen088 — Synthadoc / Claude Code CLI Provider + +**Original thread:** https://x.com/PaulChen088/status/2053865235941826783 (replying to @Rexetdeus + @buildinpublic) +**Posted:** May 11 (2 days old — outside ideal window, but polite follow-up still worthwhile) +**Their message:** +> "It integrates with Claude code and Open code installations using CLI Provider. See this blog for details. [dev.to/synthadoc-your-coding-tool-is-now-your-wiki-brain]" + +**Draft reply:** +> Just opened the post — the CLI Provider angle is clever. Are you persisting the doc index between Claude Code sessions, or rebuilding per-invocation? The "re-derive context every run" tax is the part I haven't seen anyone solve elegantly. + +**Why this adds value:** Specific technical question about session persistence — moves us from "thanks" territory to actual peer engagement on his design choices. + +--- + +## New follow-backs + +None notable in last 48h. Only inbound activity was from @gl1tchmary (user's own alt — per task constraints, never engage). 
No external dev/indie/AI accounts in the >1000-follower range followed back. + +--- + +## Candidates considered but rejected + +- **@SnowCrashLabs** (6m, "CVE-2026-7482 turned 300K Ollama instances into cross-tenant memory leaks") — sensational format + suspicious CVE number; could be misinformation. Skip per "avoid security/controversy noise" principle. If real, will surface on HN. +- **@theo** (6h, "Is HTML the new Markdown?") — video reply requires watching to engage substantively; outside the time budget for this session. +- **@simonw** (May 11, claude processes / 30GB Mac memory) — already engaged in yesterday's session. +- **@karpathy** (May 11, "structure your response as HTML") — 2 days old, window closed. +- **@swyx** (11h, "increasing levels of autonomy: /skill, /plan, /goal") — interesting but slightly stale; defer to a fresher post. +- **@tdinh_me** (4h, referral-link self-congrats) — promotional, not substantive. + +## Suggestions for target list + +- Add **@VladimirVivien** to peer-CPU-inference watchlist if today's engagement lands — Go + on-device LLM is a tight overlap with our prod stack. + +## Calibration notes + +- Following-feed signal quality remains low; live search + tribe profile scan continues to be the only viable path. Suggest the user unfollow some of the engagement-bait accounts to clean up the Following feed for future sessions. +- Today's @im_yeyito candidate is the strongest fit since the routine started — direct ask for the exact data we have publicly written about. Worth posting first. diff --git a/docs/marketing/x-routine/2026-05-14.md b/docs/marketing/x-routine/2026-05-14.md new file mode 100644 index 00000000..3beaf43c --- /dev/null +++ b/docs/marketing/x-routine/2026-05-14.md @@ -0,0 +1,114 @@ +# X reply-game drafts — 2026-05-14 (Thursday) + +Generated by `daily-x-reply-game` scheduled task. +5 candidates selected (4 fresh + 1 continued). Pending user review and per-message approval before posting. 
+ +**Session note:** Following feed is *still* dominated by mutual-follow engagement-bait (Sindirella, Ferarri Prime, "Say Hi for 550+ followers" templates) — third session in a row. Live search on `"local LLM" OR gemma OR ollama` and `"building in public"` again pulled the real candidates. Worth a follow/unfollow pass on the Following list this weekend. + +**Account state:** 11 followers (up from 2 at baseline 3 days ago — +9 in 3 days), 104 following, 28 posts. Bio + DDIA hook in place. + +--- + +## Candidate 1 — @MozillaAI (STRONG / TOP PICK — TextStack mention) + +**Source post:** https://x.com/MozillaAI/status/2054909949583450192 +**Posted:** ~5 minutes before scan (peak visibility window) +**Excerpt:** +> "A 4B model just read all of Alice in Wonderland offline on a MacBook. We used Gemma 4 with llamafile to test long-context reasoning, memory, and state tracking without the cloud." + +**Draft reply:** +> Curious where the 4B started to wobble on state tracking — we run gemma4:e2b in prod on a CPU VPS for short-form generation and it holds up, but the long-context CPU story still feels brittle. Does llamafile change that vs plain Ollama, or is it mostly packaging? + +**Why this adds value:** Asks a specific technical question about a real distinction (llamafile vs Ollama) on a megaphone account (mozilla.ai, 845 followers but Mozilla-branded reach). This is the one allowed TextStack-production-numbers mention of the day — it earns its place here because the parent post is *literally* about Gemma 4 on consumer hardware, which is exactly what we run in prod. High chance of a real reply from mozilla.ai's social team or someone on the llamafile team. 
+ +--- + +## Candidate 2 — @TheWordWeaver_ (peer dev support, high follow-back probability) + +**Source post:** https://x.com/TheWordWeaver_/status/2054911084801282142 +**Posted:** ~3 minutes before scan (fresh, peak window) +**Excerpt:** +> "Day 3 of building in public 🙃 y'all, I having my ass handed out to me 🤧 9hrs+ now and counting, I been debugging. so here's the issue; I am trying to move Susu build environment from Replit to GitHub+ Vercel so it can get permanent hosting. that is where my woe began" +> +> [Screenshot shows Vercel build failing with `framer-motion` module resolution error] + +**Draft reply:** +> That looks like a framer-motion ESM import issue — Replit forgives loose CJS/ESM, Vercel doesn't. Pin framer-motion to a known-good version and make sure no `import` is missing a file extension (`.tsx` in your `src/components/...` lines). Cost me 4 hours last month. + +**Why this adds value:** Reads the actual screenshot, identifies the specific error, gives an actionable fix. Tier-C indie peer (bio: "I build AI trading bots & write about the process"), highest follow-back probability of the day. No TextStack mention — pure community contribution. Ends with shared frustration ("cost me 4 hours") to land empathetically. + +--- + +## Candidate 3 — @RahulGangwani24 (counter-perspective with real data) + +**Source post:** https://x.com/RahulGangwani24/status/2054908838931738890 +**Posted:** ~8 minutes before scan +**Excerpt:** +> "Can't use Copilot? Run local coding models through Ollama — unlimited free code assistance. Code Llama and DeepSeek Coder run on consumer hardware and integrate with VS Code. No subscription. No cloud dependency. Your code never leaves your machine. Privacy + free." + +**Draft reply:** +> The "consumer hardware" part hides the real cost: latency. On CPU-only laptops, DeepSeek Coder takes 5–15s per completion vs Copilot's <1s. Great for batch refactor, painful for inline autocomplete. 
Where are you actually using it day-to-day — chat or completions? + +**Why this adds value:** Counter-perspective backed by concrete numbers (5–15s vs <1s). Doesn't dismiss the idea, reframes it — there's a real use case for local code models, just not inline autocomplete. Ends with a specific question that forces the OP to clarify their actual usage (which is the more interesting conversation). No TextStack mention. + +--- + +## Candidate 4 — @asiokun3 (Japanese, lower priority but on-topic) + +**Source post:** https://x.com/asiokun3/status/2054909601569787921 +**Posted:** ~5 minutes before scan +**Excerpt (translated from Japanese):** +> "I'm running Ollama as a server and accessing it from an existing app as an OpenAI-compatible API. The only code change needed is updating the base_url. I was able to switch to local with almost zero migration cost." + +**Draft reply:** +> The base_url swap is the killer feature. We did the same — only gotcha was that some libs hardcode the `gpt-` prefix in model names for routing. Did you hit anything similar, or did your existing app accept a custom model id cleanly? + +**Why this adds value:** Validates a real production pattern, adds one specific gotcha (model-name prefix hardcoding) others might hit, ends with a clarifying question. **Caveat:** OP tweets primarily in Japanese — reply in English may go unanswered. Lower priority than #1–3. Post only if user is comfortable with cross-language engagement. + +--- + +## Continued conversations + +### @PaulChen088 — Synthadoc / Claude Code CLI Provider + +**Original thread:** https://x.com/PaulChen088/status/2053865235941826783 (replying to @Rexetdeus + @buildinpublic) +**Posted:** May 11 (3 days old — outside the ideal window, but this is the 2nd touch in an active thread, polite to close the loop) +**Their message:** +> "It integrates with Claude code and Open code installations using CLI Provider. See this blog for details. 
[dev.to/synthadoc-your-coding-tool-is-now-your-wiki-brain]" + +**Note:** Yesterday's session drafted a near-identical follow-up (re: doc index persistence). If yesterday's draft wasn't posted, post that one rather than this — they're the same intent. If it was already posted, skip this entry today. + +**Draft reply (only if yesterday's wasn't posted):** +> Smart — CLI provider sidesteps the per-user key chore entirely. Quick follow-up: does it fall back gracefully if the user doesn't have Claude Code or OpenCode installed, or is one a hard prereq? That shapes who I'd recommend it to. + +**Why this adds value:** Acknowledges Paul's answer, asks a meaningful prereq question (installation requirements) that anyone considering Synthadoc would care about. Keeps the conversation alive without being sycophantic. + +--- + +## New follow-backs + +**Notable:** @Mary — followed @Rexetdeus on May 10 and liked 4 posts including the AGPL/DDIA hook post. Not in tier list but worth noting — the DDIA hook continues to land. Account scale unknown. + +No other external follow-backs since last session. Total followers: 11 (+9 since baseline). + +--- + +## Candidates considered but rejected + +- **@simonw** (8h, "Doing this is a great way to make a bonfire of your reputation" re: AI-generated LinkedIn comments) — visibility window closed, and we already engaged with @simonw earlier this week. +- **@swyx** (12h reposted talk announcement, 22h "haha openclaw bad" prompt-injection thread) — both outside the 1–3h freshness window. +- **@arvidkahl** (recent posts are May 9–May 11) — window closed; all top posts are 3+ days old. +- **@__aplace__** (May 12, local LLM cost-effective token generator) — 2 days old. +- **@WayneallenEnt** ("$MNFT building in public") — crypto promo, off-tribe. +- **@jakubmuzzik** ("literally building in public today" + cafe photo) — no substantive content to engage with. 
+ +## Suggestions for target list + +- Add **@MozillaAI** to Tier B watchlist if today's reply lands — they post Gemma/llamafile updates regularly and the topical overlap with our stack is high. +- Consider **@TheWordWeaver_** for Tier C if they engage back — indie peer with AI trading bot bio, similar build-in-public energy. + +## Calibration notes + +- 3rd consecutive session where Following feed signal quality is essentially zero. Recommend the user spend 15 minutes this weekend unfollowing the engagement-bait accounts (Sindirella, Ferarri Prime, generic "Say Hi" accounts) — they're crowding out the signal we actually pay attention to. +- Follower growth from 2 → 11 in 3 days is consistent with the routine working at expected rate (5–15/week projection). Continue. +- Top pick today (@MozillaAI) is the only one that warrants the TextStack production-numbers mention. If you only post one reply today, post that one. diff --git a/docs/marketing/x-routine/2026-05-15.md b/docs/marketing/x-routine/2026-05-15.md new file mode 100644 index 00000000..7f173fc4 --- /dev/null +++ b/docs/marketing/x-routine/2026-05-15.md @@ -0,0 +1,133 @@ +# X reply-game drafts — 2026-05-15 (Friday) + +Generated by daily-x-reply-game scheduled task. +4 fresh candidates selected, 1 continued conversation. Pending user review and per-message approval before posting. + +Session notes: Following feed quality remains essentially zero for the 4th consecutive day — engagement-bait reposts (Sindirella, Ferarri Prime, "Drop Hi" growth-hack accounts) dominate the top 20. Live-search on `"local LLM" OR ollama OR gpt-5 -filter:replies lang:en` carried this session, plus four Tier B/C profile scans (@simonw, @theo, @karpathy, @arvidkahl) to spot-check the tribe. 
+ +--- + +## Candidate 1 — @MoureDev (Brais Moure, verified) + +**Source post:** https://x.com/MoureDev/status/2055272855705718857 +**Posted:** 13:03 UTC (~2 min before scan — fresh, peak visibility window) +**Excerpt:** +> "The AI Workshop on Local from Scratch is now published! +> - Fundamentals +> - How to Choose a Model +> - Installation and Setup of Local Models with LM Studio and Ollama +> - Using Local Models in VS Code and Claude Code +> - Recommendations" +> (links to youtube.com/live/LQqv4M_NtNc) + +**Draft reply:** +> Solid stack — LM Studio for the on-ramp, Ollama once people stop wanting a UI. We landed on gemma4:e2b in prod on a 30 GB CPU VPS (no GPU), p95 ~20ms over a 63k-request load test. The e4b→e2b downgrade only became obvious under concurrent load. Does the workshop touch model-size vs hardware tradeoff? + +**Char count:** ~280 (right at limit — trim "Solid stack — " if needed: brings to ~265) +**Why this adds value:** Specific production data point (model, p95, load test size) that nobody mentions in beginner tutorials. The "e4b→e2b under concurrent load" detail is a real gotcha learners hit later. Ends with a question that prompts a workshop creator to engage. This is **today's single TextStack production-numbers mention** per the daily cap. + +**Priority:** P1 — highest value reply of the session. Brais has a verified account with sizable EN+ES following and his workshop overlaps directly with our deployment story. + +--- + +## Candidate 2 — @aterrel (Dr. T, Anaconda) + +**Source post:** https://x.com/aterrel/status/2055273129447174572 +**Posted:** 13:04 UTC (~1 min before scan) +**Excerpt:** +> "Come join me at @anacondainc booth today for our CUDA Python 1.0 release. I'll be hosting our Local LLM generated DJ party on a Spark DGX from 1:15-3 and presenting on the stack at 4pm! #PyConUS" + +**Draft reply:** +> Spark DGX as a DJ box is a fun flex. What's the inference path driving the set — vLLM, Triton, something custom? 
Streaming generation per-track or pre-rendering the playlist? Curious where the latency budget actually lives. + +**Char count:** ~225 +**Why this adds value:** Asks a real, specific technical question (inference stack + streaming vs pre-render) that a presenter at PyConUS would enjoy answering. Light tone matches the parent post. No TextStack mention — pure community contribution. + +**Priority:** P2 — Anaconda employees engage well; lower follower count than Brais but in-tribe. + +--- + +## Candidate 3 — @ollies0x (OLLIE) + +**Source post:** https://x.com/ollies0x/status/2055272527728177361 +**Posted:** 13:02 UTC (~3 min before scan) +**Note:** This is a reply within a thread (replying to @ollies0x's own parent post), not a standalone — engagement risk slightly elevated since the parent context shapes the conversation. Worth a look at the parent thread before posting. +**Excerpt:** +> "OpenCode and Ollama Cloud is $30 a month so why would you want a 3090 at this point (yes I know privacy etc.) You'd have to use OpenCode and Ollama Cloud for 40 months before you recoup the cost of a 3090. In 40 months, the tech available is going to be better than a 3090" + +**Draft reply:** +> The 40-month math assumes steady usage. Burst workloads leave the 3090 idle 90% of the time and cloud wins on $/used-hour. Third path nobody talks about: small-model on a CPU-only VPS — kills the GPU-vs-cloud debate for a wide band of use cases. + +**Char count:** ~250 +**Why this adds value:** Counter-perspective that doesn't dunk — agrees with their math while adding a missing dimension (usage profile) and a third option (CPU). The "nobody talks about" framing positions us as the missing voice in the conversation. No TextStack name-drop but our deployment IS the implicit example. + +**Priority:** P2 — economics threads tend to get good follow-engagement. 
+ +--- + +## Candidate 4 — @swyx (Tier B, 157.5K followers) + +**Source post:** https://x.com/swyx/status/2055231013253472418 +**Posted:** 10:17 UTC (~3h before scan — at edge of freshness window) +**Excerpt:** +> "also i think the publicly disclosed revenue time series looks like this now btw. project to EOY, closest to the correct EOY ARR prediction gets a like" (attached chart, unseen by us — almost certainly OpenAI/Anthropic/xAI ARR curve) + +**Draft reply:** +> Calling $15B EOY. The pattern that keeps surprising me on this kind of curve is the inflection lands 1–2 quarters after the team itself notices — public time series lags internal vibe by exactly one fundraise. + +**Char count:** ~215 +**Why this adds value:** Plays the game swyx is asking for (gives a specific number) AND adds a meta-observation about how ARR curves leak ahead of public disclosure — a take swyx, as a fund-adjacent insider, may actually want to push back on. Optimized for a quote-reply or like from swyx, which compounds visibility 10–100x. + +**Priority:** P1 (visibility-weighted) — swyx's reply-game is high-volume and he engages with specific predictions. **Caveat:** we cannot see the chart image — if the curve is obviously not an AI lab (e.g., it's Bitcoin or a public co), the EOY guess needs to change. **User: please skim the image before posting.** + +--- + +## Continued conversations + +### @PaulChen088 — Synthadoc / Claude Code CLI Provider (3rd carry-over) + +**Original thread:** https://x.com/PaulChen088/status/2053865235941826783 (replying to @Rexetdeus + @buildinpublic) +**Their message (May 11):** +> "It integrates with Claude code and Open code installations using CLI Provider. See this blog for details. [dev.to/synthadoc-your-coding-tool-is-now-your-wiki-brain]" + +**Status:** Drafted on 2026-05-13 and 2026-05-14, deferred both times. Today is the **third** carry-over. Two options: + +1. 
**Post the existing draft from 2026-05-13/2026-05-14** (re: graceful fallback if Claude Code/OpenCode isn't installed) — closes the loop politely, no fresh wording needed. +2. **Drop the thread** — 4 days is past the polite-close window. If the user has no intention of posting, deleting it from the queue is cleaner than carrying it forever. + +**Fresh draft (only if option 1 is chosen and yesterday's wording feels stale):** +> Bookmarked. The CLI-Provider trick sidesteps the per-user key chore entirely — nice call. Quick prereq question: does Synthadoc degrade gracefully if the user has neither Claude Code nor OpenCode installed, or is one of them a hard requirement? Shapes who I'd point at it. + +**Recommendation:** post option 1 today or close the loop in the tracker. + +--- + +## New follow-backs + +No new external follow-backs since the 2026-05-14 session. Account state appears unchanged from yesterday (11 followers). + +Mary's follow + 4-post likes from May 10 were already logged in yesterday's file. + +--- + +## Candidates considered but rejected + +- **@simonw** — freshest post is 14h old ("Mitchell's React Native porting" + the @mitchellh quote), outside the 1–3h freshness window. We engaged with him this week already. +- **@karpathy** — freshest post is May 11 (4 days), window closed. +- **@theo** — fresh "Subnautica 2 day off" post (1h) is off-topic; the pinned "I cancelled my Claude Code sub" thread is 16h old and has 500+ replies, so our reply would drown. +- **@arvidkahl** — freshest substantive post is May 14 (~18h), outside window. +- **@distokens** — "How AI founders accidentally destroy margins" thread is on-topic but the account is openly selling AI-cost-reduction infrastructure, so it's a startup-promo thread, not peer dev contribution. Off-tribe. +- **@mudler_it (Ettore Di Giacinto, LocalAI maintainer)** — fresh reply about llama.cpp community history is substantive but mid-thread, hard to engage cleanly without parent context.
**Add to tribe watchlist** for next session — high-relevance maintainer account. +- **@NVIDIAAI** — replied "Sounds dreamy" to a Local LLM workshop tweet 6 min ago, no substance to engage with from us. +- **@wakanara** — Japanese-language Ollama post; lower follow-back probability vs effort, especially since we tried a JP candidate yesterday (@asiokun3) and got no response yet. + +## Suggestions for target list + +- **Add @mudler_it (LocalAI maintainer)** to Tier B watchlist — high topical overlap, posts daily about the local LLM ecosystem. +- **Add @MoureDev** to Tier B watchlist if today's #1 reply lands — verified, large EN/ES bilingual reach, workshop content overlaps our stack monthly. + +## Calibration notes + +- **Following feed unusable for the 4th straight session.** Reiterating yesterday's recommendation: ~15 min unfollowing the engagement-bait accounts (Sindirella, Ferarri Prime, Zinariya/@Oxboddi, Christiana, Lily_9010, Isabella9zox, Nitya_4u) would meaningfully reclaim the signal. Live-search + tribe profiles will continue carrying these sessions regardless. +- **No reply yet from @asiokun3** to yesterday's base_url-prefix gotcha question — JP-language reply takes longer; check again Monday. +- **No new follow-backs from yesterday's 4 posts.** That's the expected pattern at this scale — follower lift comes from likes/quote-replies on the *reply itself*, which take 24–72h to surface. diff --git a/docs/marketing/x-routine/2026-05-18.md b/docs/marketing/x-routine/2026-05-18.md new file mode 100644 index 00000000..ed110530 --- /dev/null +++ b/docs/marketing/x-routine/2026-05-18.md @@ -0,0 +1,105 @@ +# X reply-game drafts — 2026-05-18 (Monday) + +Generated by daily-x-reply-game scheduled task. +5 candidates selected. Pending user review and per-message approval before posting. + +Account state at run time: 10 followers (up from 2 at baseline), 99 following, 36 posts. Notifications quiet since May 11. 
+ +--- + +## Candidate 1 — @cwwhitehead (TextStack mention — use this one for the daily prod-numbers reference) + +**Source post:** https://x.com/cwwhitehead/status/2056359150611726471 +**Posted:** ~15 min ago (9:00 AM EDT) +**Parent:** Reply to @tobi (Tobi Lütke, Shopify CEO, 469K followers — high reply-chain visibility) +**Chris bio:** Head of AI @ LXN +**Excerpt:** "Why qwen and not Gemma 4?" + +**Draft reply:** +> We ran the opposite path — qwen3:8b → gemma4:e4b → e2b. e2b held p95 ~20ms over a 63k-request load on a single CPU VPS; e4b couldn't keep up under concurrent load. qwen edged it on quality but the latency tax wasn't worth it for our use case. + +**Char count:** 268 +**Why this adds value:** Answers Chris's question with the exact comparison he's asking about, backed by Vasyl's own migration numbers. Tobi's audience reads his thread replies. This is the day's single TextStack production-numbers mention. + +--- + +## Candidate 2 — @tallhamn (Marcus Tallhamn, verified, indie dev) + +**Source post:** https://x.com/tallhamn/status/2056358574289231964 +**Posted:** ~15 min ago +**Parent:** Reply to @antirez (Salvatore Sanfilippo, Redis creator). antirez's parent post — "Imagine a local agent where cache misses don't exist, tools don't need translations, you see progress for prefill, tokens are emitted ASAP" — has 276 likes, 21K views. +**Excerpt:** "Performance aside a solid vertically integrated coding stack would make it so much easier for people to solve application problems with local AI. I've spent a ton of time experimenting and benching Claude/codex/opencode/pi + llama.cpp/ollama/lmstudio + middleware for API glue and performance monitoring (tps etc) + qwen/ds/... with various configs" + +**Draft reply:** +> +1 on middleware pain — once you swap providers, the glue layer becomes the product. Per-token cost & latency don't live in the same dashboard as app SLOs, so failures look like "random slowness" for weeks. 
+ +**Char count:** 211 +**Why this adds value:** Validates Marcus's pain point and adds the specific observability gap that doesn't get talked about. antirez's reply-chain is high-visibility; no TextStack mention keeps it pure community contribution. + +--- + +## Candidate 3 — @levelsio (Tier A, 868K followers) + +**Source post:** https://x.com/levelsio/status/2056355830778261651 +**Posted:** ~30 min ago +**Excerpt:** "Testing a bit with 3d terrain on Hoodmaps today. You can actually see the favelas in Rio pretty well, this is Rocinha, it's kinda hidden between a mountain top and another one..." + +**Draft reply:** +> What's powering the elevation — Mapbox terrain-rgb tiles, or are you sampling SRTM directly? Curious whether you went raster or vector mesh for the favela density bits. + +**Char count:** 168 +**Why this adds value:** Specific technical question on his build choices. levelsio responds to good implementation questions in his replies. Implicit invitation for him to share, which keeps reply-chain visibility going. + +--- + +## Candidate 4 — @dotnet (339K followers, primary stack) + +**Source post:** https://x.com/dotnet/status/2056355426396995709 +**Posted:** ~30 min ago +**Excerpt:** "Coding standards aren't 'nice to have' — they're performance features. Today's On .NET LIVE, we take you through project structure to API shape to async patterns..." + +**Draft reply:** +> The biggest async one nobody enforces: ConfigureAwait(false) in library code. Lint-rule it, save your future self a year of deadlocks when someone uses your code from a SynchronizationContext. + +**Char count:** 196 +**Why this adds value:** Concrete, contentious-enough .NET pattern most teams under-weight. Likely to prompt either agreement or "actually, AsyncLocal now…" replies — both are good for thread visibility. 
+ +--- + +## Candidate 5 — @VitalikButerin (6.3M followers) + +**Source post:** https://x.com/VitalikButerin/status/2056354141832626487 +**Posted:** ~35 min ago +**Excerpt:** "Many people have claimed that with AI-assisted bug finding, secure code (and hence trustless anything) will be impossible. I have a much more optimistic take, and AI-assisted formal verification is a major part of the reason why: [links shallow dive into formal verification]" + +**Draft reply:** +> Curious where you'd draw the line — AI-assisted spec writing is the harder gap IMO than the verification step itself. The proof checker is rigorous; the question is whether the spec actually captures intent. Are you betting on LLMs closing that gap? + +**Char count:** 249 +**Why this adds value:** Thoughtful divergent take phrased as a question, on a well-known limit of formal methods (the spec problem). Frames as inquiry not contradiction — keeps Vitalik likely to engage. Massive visibility if he or anyone in the thread replies. + +--- + +## Continued conversations + +### Paul Chen @PaulChen088 — May 11 (1 week old, still unattended) + +**Original thread:** Vasyl tagged @buildinpublic; Paul Chen replied with: "It integrates with Claude code and Open code installations using CLI Provider. See this blog for details." Linked dev.to article "Synthadoc: Your Coding Tool Is Now Your Wiki Brain." + +**Draft continued reply:** +> Just read it — CLI provider angle is clever. How are you handling rate-limit/cost when wiki crawls hit Claude's API at scale? That's the bit that nuked my first attempt at something similar. + +**Char count:** 192 +**Why this is worth catching up on:** External, on-topic developer. A week old but the reply slot is unclaimed, and continuing the thread keeps the relationship warm + signals attention. Lower priority than today's fresh candidates. + +--- + +## New follow-backs since baseline + +- Followers grew 2 → 10 since May 12 (flagship article publication). 
No specific notable accounts surfaced in the May 10-11 notification window — mostly the @gl1tchmary alt and one "Mary" follow (likely same alt). + +## Proposed additions to target list + +- **@antirez** (Salvatore Sanfilippo, Redis creator) — already prominent in the local-agent conversation tribe; high gravitational pull on AI-coding discussion. Worth adding to Tier E. +- **@cwwhitehead** (Head of AI @ LXN) — small but in-tribe, posts on local LLM topic. Worth adding to Tier B. +- **@tallhamn** (Marcus Tallhamn, "Code slinger for robot swarms") — verified indie dev benching local AI stacks. Worth adding to Tier C. diff --git a/docs/marketing/x-routine/2026-05-19.md b/docs/marketing/x-routine/2026-05-19.md new file mode 100644 index 00000000..0214151d --- /dev/null +++ b/docs/marketing/x-routine/2026-05-19.md @@ -0,0 +1,105 @@ +# X reply-game drafts — 2026-05-19 + +Generated by daily-x-reply-game scheduled task. +4 candidates selected + 1 continued conversation. Pending user review and +per-message approval before posting. + +Notes on this session: +- Following-tab signal was weak — heavy crypto/sports noise, only 1 strong + candidate from Tier-A targets (@levelsio). Backfilled via live search on + `ollama lang:en min_faves:5` and `"local LLM" OR gemma OR ollama OR + "Claude Code"`. +- 1 reply mentions TextStack production experience (the @DAlistarh one) — + within the "max 1/day" budget. The rest are pure community contribution. +- Notifications check: no new follow-backs since 2026-05-15. Follower count + is 8 (up from baseline 2 on 2026-05-12, so +6 over the week). + +--- + +## Candidate 1 — @DAlistarh (Tier B, ~4h old) ⭐ TextStack-mention reply + +**Source post:** https://x.com/DAlistarh/status/2056661176843436421 +**Posted:** ~4 hours ago +**Excerpt:** "Weight-only quantization powers local LLMs like llama.cpp or Ollama. But SOTA quantized accuracy requires complex kernels that are notoriously hard to implement. Can we get SOTA accuracy and keep things simple? 
Our new GSQ (Gumbel-Softmax Quantization) method says yes." + +**Draft reply:** +> We run gemma4:e2b via Ollama on a CPU-only VPS — downgraded from e4b because concurrent throughput was the bottleneck, not accuracy. Curious how GSQ kernels compare to Q4_K_M on pure-CPU inference, where most indie deployments actually live. + +**Char count:** 245 +**Why this adds value:** Real-world deployment constraint (CPU-only, concurrency-bound) that the paper's accuracy framing doesn't address. Asks a specific kernel comparison question that invites a technical follow-up. Positions us as someone with skin in the local-LLM game without naming the product. + +--- + +## Candidate 2 — @levelsio (Tier A, ~1h old) + +**Source post:** https://x.com/levelsio/status/2056705090073870460 +**Posted:** ~1 hour ago +**Excerpt:** "A nice way to stay safe is to ask Claude Code to audit your devices. I do same on my VPS servers, so today I tried it on my MacBook Pro and it's pretty good at it too. It founds lots of stuff that was not secured, I actually forgot to enable FileVault when I got this new MBP..." + +**Draft reply:** +> Same approach found a stale nginx rule on mine I'd assumed was localhost-only — it wasn't. Did the audit catch anything you wouldn't have looked for manually, or mostly things you already knew you'd been sloppy about? + +**Char count:** 220 +**Why this adds value:** Concrete confirming data point (stale nginx rule, not-actually-localhost) + a sharp question that splits "Claude Code as net-new auditor" vs "Claude Code as procrastination-killer." Either answer is a reply hook. Tier-A reply visibility on a fresh post is the highest-EV slot today. + +--- + +## Candidate 3 — @imikerussell (mid-tier, ~5min old at scan time) + +**Source post:** https://x.com/imikerussell/status/2056722925634322670 +**Posted:** ~5 minutes ago (top of thread; we're replying to Step 3 in the thread) +**Excerpt:** Multi-step thread on rebuilding his YouTube studio with AI. 
Step 3: "I gave Claude Code control of my studio lights. Plugged it into Home Assistant (Elgato + Hue). Told it to cycle scenes and iterate until my live audience approved. 30+ variations. My hands never touched a dimmer." + +**Draft reply:** +> Step 3 is the one that breaks my brain — the audience-feedback loop is what makes it work, not the API access. Did you give it a "stop iterating" criterion, or just bail when you liked the result? Curious how you avoided cost-of-iteration blowing up. + +**Char count:** 248 +**Why this adds value:** Identifies the actually-novel part of his setup (the feedback loop, not the HA integration, which is well-trodden), and asks the engineering question that anyone running iterative LLM loops in prod cares about: the stop condition. Likely to land in a thread where his followers are already paying attention. + +--- + +## Candidate 4 — @JulianGoldieSEO (~2h old) + +**Source post:** https://x.com/JulianGoldieSEO (profile — search bar to locate; thread starts "LOCAL COMPUTER USE AGENTS ARE FINALLY REAL") +**Posted:** ~2 hours ago +**Excerpt:** "LOCAL COMPUTER USE AGENTS ARE FINALLY REAL. You can now run an AI agent on your own machine that opens apps, writes notes, browses, and works in the background. But the setup breaks if you miss one step. The Local Agent Stack: → Ollama runs the local model on your machine →..." + +**Draft reply:** +> That "setup breaks if you miss one step" line is the entire local-LLM story. The Ollama → app glue is where 80% of the deployment debugging actually lives, not the model itself. What broke for you that took longest to find? + +**Char count:** 225 +**Why this adds value:** Validates his framing while shifting attention to the actually-load-bearing engineering insight (glue is harder than the model). The closing question is generous — gives him a chance to share a war story, which is good thread fuel.
+ +--- + +## Continued conversations + +### @PaulChen088 — pending reply from May 11 + +**Their reply on our @buildinpublic thread:** +> "It integrates with Claude code and Open code installations using CLI Provider. See this blog for details." (links Synthadoc blog on dev.to) + +**Note:** This is 8 days old, which is past the "fresh engagement" window for most reply chains, but reciprocity matters and Paul replied to one of our own posts. Borderline whether to engage; included for your call. + +**Draft reply:** +> Synthadoc looks like a similar bet on the "ground the LLM in your codebase, not the open web" pattern. Curious if you're seeing latency from the CLI provider hop, or if it's mostly fine for interactive use. + +**Char count:** 207 + +--- + +## Target list — proposed additions + +None this session — the Tier A/B list still pulls in good content when we +backfill via live search. The bigger problem is that the Following tab is +crypto-heavy; consider unfollowing CoinMarketCap and Bitcoin to declutter, +or moving high-signal accounts to a Twitter List for faster scanning. + +--- + +## Posting order recommendation + +If only posting 1: **@levelsio** (Tier A, fresh, highest visibility). +If posting 2: add **@imikerussell** (fresh thread, mid-tier author, technical question hooks into engineering audience). +If posting 3-4: add **@DAlistarh** (TextStack mention budget) and **@JulianGoldieSEO** (lower-tier but on-topic). +Skip the @PaulChen088 continued conversation unless you specifically want to reciprocate — it's stale. diff --git a/docs/marketing/x-routine/2026-05-20.md b/docs/marketing/x-routine/2026-05-20.md new file mode 100644 index 00000000..0435044c --- /dev/null +++ b/docs/marketing/x-routine/2026-05-20.md @@ -0,0 +1,107 @@ +# X reply-game drafts — 2026-05-20 (Wednesday) + +Generated by daily-x-reply-game scheduled task. +4 candidates selected + 3 continued conversations. Pending user review and per-message approval before posting. 
+ +**New follow-backs:** none. Notifications quiet — only @PaulChen088 (May 11, promo reply w/ dev.to link — skip, also flagged stale in prior logs) and @gl1tchmary activity (own alt account). Account state: **100 following / 6 followers** (−2 vs May 19's 8 — slow churn continuing). + +**Feed note (6th session running):** Following tab still stale — top posts were Elon (22h), then May-18 reposts, Karpathy 21h, Theo 9h, Rauch 23h. Nothing in the 1–3h window from the Following tab. Live search (`"local LLM" OR ollama OR gemma`, then quality-filtered variants) + direct tribe-profile scans (@simonw, @theo, @swyx, @levelsio, @arvidkahl) carried the harvest again. Freshest genuine candidate is Theo at ~3h; the rest are 11–14h (still inside their audience's visibility window given a quiet Wed-morning EDT slot). Standing suggestion from May 19 still applies: move high-signal accounts into a dedicated List so the Following tab stops being noise. + +--- + +## Candidate 1 — @theo *(Tier B — top pick, freshest)* + +**Source post:** https://x.com/theo/status/2057029377310052521 +**Posted:** ~3 hours ago (quote-tweet of @seconds_0) +**Excerpt:** "Wait wtf, they STILL haven't updated pretraining???" — QT of a post noting Gemini 3.5 Flash ships with a Jan 2025 knowledge cutoff (17 months stale) despite a May 19 2026 release; it's a refinement of the 3 Pro base, not a fresh pretrain. + +**Draft reply:** +> Knowledge cutoff barely matters for agentic work if you pipe current docs into context. The real cost of stale pretraining: the model confidently autocompletes deprecated APIs it "remembers." Retrieval fixes facts, not muscle memory. Seeing the same? + +**Why this adds value:** Reframes the complaint with a concrete failure mode (deprecated-API hallucination) instead of piling onto "stale = bad" — gives Theo a specific thing to react to. Fresh post, high reply visibility, no TextStack mention (pure community contribution). 
+ +--- + +## Candidate 2 — @simonw *(Tier B/E)* + +**Source post:** https://x.com/simonw/status/2056867815605625172 +**Posted:** ~14 hours ago +**Excerpt:** "My notes on Gemini 3.5 Flash – 3x the price of Gemini 3 Flash but Google are planning to use it for many of their own products." + +**Draft reply:** +> The 3x jump is the interesting part — "Flash" used to be the no-brainer cheap default. At any real volume that bump flips the math from "just upgrade" to "benchmark whether 3 Flash is still good enough." Curious if the quality delta actually justifies it. + +**Why this adds value:** Adds the build-vs-buy / volume-economics angle Simon writes about constantly, and ends on an open question about the eval delta — invites his take rather than just agreeing. No TextStack mention. + +--- + +## Candidate 3 — @DivyanshT91162 *(local-LLM topic match — carries today's TextStack mention)* + +**Source post:** https://x.com/DivyanshT91162/status/2056910886108021119 +**Posted:** ~11 hours ago +**Excerpt:** "...local AI still feels like a developer hobby: install Ollama → fight dependencies → pray your GPU survives." (post then shares the nay-cat/ChromeChat repo) + +**Draft reply:** +> The "pray your GPU survives" part is the myth worth killing — you often don't need one. We run gemma4:e2b in prod on a CPU-only 30GB VPS, no GPU, p95 ~20ms across a 63k-request load test. Right-sizing the model beats throwing hardware at it. What's your stack landing on? + +**Why this adds value:** Directly counters the post's central assumption (local LLM = GPU pain) with hard production numbers — genuinely useful to anyone reading. This is the day's single TextStack-experience mention. + +**Caveat for review:** @DivyanshT91162 is a high-volume AI-news/repo-curation account (~2,900 posts, multiple/hour), not a peer builder — lower follow-back value. Reply still reaches post readers. 
If you'd rather not engage a news-farm account, skip this one and the session still stands at 3 strong candidates with no TextStack mention (the routine caps it at 1/day — zero is fine). + +--- + +## Candidate 4 — @arvidkahl *(Tier A — open-source / supply-chain security)* + +**Source post:** https://x.com/arvidkahl/status/2056912488159936956 +**Posted:** ~11 hours ago (quote-tweet of @github) +**Excerpt:** "We have wormsign. Did Shai Hulud strike at the source?" — QT of GitHub announcing it is investigating unauthorized access to its internal repositories. + +**Draft reply:** +> The unsettling part isn't one more bad package — it's the blast radius moving up to the host itself. Pinned versions and lockfiles defend against bad publishes; they don't help when the trust root itself is in question. Not sure what defense-in-depth even looks like there. + +**Why this adds value:** Elevates the thread from "another breach" to the structural point (host compromise vs package compromise breaks your usual lockfile defenses) — a real concern for any OSS maintainer. Divergent-take ending invites Arvid to weigh in. No TextStack mention. Note: kept deliberately general — avoids asserting specifics about the Shai-Hulud worm. + +--- + +## Continued conversations + +These are external-user replies that landed on Vasyl's prior replies (posted May 14–15; this is the first session checking reciprocity on them). None are @gl1tchmary. + +### Continued 1 — @TheWordWeaver_ (GhostPen) + +**Thread:** https://x.com/Rexetdeus/status/2055068233107927261 +**They replied:** "Thanks so much man. I really appreciate. Would get on it now." (on Vasyl's framer-motion ESM debug tip) + +**Draft continued reply:** +> Anytime. One more if it still fights you after the version pin: check that framer-motion isn't getting double-bundled — Vercel's tree-shaking trips on mixed CJS/ESM and you get two copies fighting over context. Ping me how it goes. 
+ +**Why:** Turns a thank-you into a second concrete tip + an open door — keeps a build-in-public peer (bio: builds AI trading bots) warm. + +### Continued 2 — @ollies0x (OLLIE) + +**Thread:** https://x.com/Rexetdeus/status/2055411803941658635 +**They replied:** "Yeah, very interesting, and good point. I personally think that the 'home' (and vps) tech will advance quicker than the current tech... Making home tech, possibly, riskier." + +**Draft continued reply:** +> Right — that's the real case for renting or small-model-on-VPS: you're not betting 40 months on one GPU generation. Hardware depreciation is the line item nobody prices in. Cloud/VPS turns capex guesswork into a monthly opex you can cancel. + +**Why:** Extends his point (fast-moving tech → ownership risk) with the capex-vs-opex framing — substantive, peer-level, keeps a real GPU-vs-cloud discussion going. + +### Continued 3 — @RahulGangwani24 (Rahul) + +**Thread:** https://x.com/Rexetdeus/status/2055068537601822838 +**They replied:** "Latency on CPU laptops was never the surprise... Most people using Ollama daily lean on chat-style workflows: refactors, debugging, test generation and not Copilot-style keystroke autocomplete." + +**Draft continued reply:** +> That tracks — and it reframes the whole "local can't replace Copilot" debate. They're different tools: one is keystroke-latency-bound, the other throughput-bound. Local wins the second category outright. The autocomplete comparison was always apples-to-oranges. + +**Why:** He answered Vasyl's earlier question directly; this validates and sharpens it into a reusable framing (latency-bound vs throughput-bound) — a clean note to end or extend the thread on. + +--- + +## Posting checklist (for the user) + +- Each reply needs explicit approval before posting — reply with which numbers to post (e.g. "post 1, 2, continued 2"). +- All drafts: 100–280 chars, no links, no emojis. Only Candidate 3 mentions TextStack production numbers (the daily max of 1). 
+- Don't post all at once — space them out to avoid spam-pattern detection. +- Top pick if posting only one: **Candidate 1 (@theo)** — freshest (~3h), highest reply visibility, on the day's biggest dev story (Gemini 3.5 Flash). diff --git a/docs/marketing/x-routine/2026-05-21.md b/docs/marketing/x-routine/2026-05-21.md new file mode 100644 index 00000000..9adeb773 --- /dev/null +++ b/docs/marketing/x-routine/2026-05-21.md @@ -0,0 +1,105 @@ +# X reply-game drafts — 2026-05-21 (Thursday) + +Generated by daily-x-reply-game scheduled task. +4 candidates selected + 0 new continued conversations (3 carry-overs noted). + +**STATUS — updated 2026-05-21 (posting session): 3 POSTED, 1 SKIPPED.** Candidates 1, 2, 4 posted with user approval. Candidate 3 (@NikkiSiapno) skipped — the full post turned out to be a paid-partnership ad (#AtlassianPartner #Ad); user chose skip (no pure promo). First posting session since May 15 — user flagged that May 18–21 drafts were never posted and the account decayed 11→6. + +**New follow-backs:** none. Notifications quiet — top item is still @PaulChen088 (May 11, promo reply w/ dev.to link — skip, stale carry-over). Nothing new since May 11. Account state: **101 following / 6 followers** (followers flat vs May 20; +1 following). + +**Feed note (7th session running):** Following tab still stale — top posts were Elon (4h, "Try Composer 2.5" — promo, not substantive), BridgeMind (23h), Eytan Seidman/Shopify (19h), Mitchell Hashimoto via Theo (14h), VS Code (16h). Nothing in the 1–3h window from the Following tab. Live search carried the harvest again: `ollama OR "local model" OR "local LLM" (inference OR prod OR deploy OR GPU OR CPU) -filter:replies`, a build-in-public variant, and a `"Claude Code" OR "AI coding"` variant. Direct tribe-profile scan (@simonw) confirmed the tribe is quiet too — his freshest post is 21h old. 
Standing suggestion from May 19/20 still applies: move high-signal accounts into a dedicated List so the Following tab stops being noise. + +**Visibility caveat:** the two freshest candidates (1 and 2) are small accounts caught minutes after posting — high freshness, low current reach. They are strong "prompt a reply from the OP" bets (small accounts almost always see and answer replies) but weak "reach a big audience" bets. Candidates 3 and 4 are larger accounts / older posts with wider reach. This is the usual fresh-vs-reach tradeoff when the target tribe isn't posting. + +--- + +## Candidate 1 — @faradaymachines *(top pick — freshest, on-topic)* — [POSTED 2026-05-21] + +**Source post:** https://x.com/faradaymachines/status/2057446794749972721 +**Posted:** ~minutes ago when scanned (2026-05-21 ~13:02 UTC) +**Excerpt:** "chrome canary shipping with local llm support is wild. not a toy — actual model running on your machine. browser-level local inference means soon every webpage will have an ai copilot without phoning home. firefox when?" + +**Draft reply:** +> Privacy win is real, but you inherit whatever model the browser ships — no swapping for the task. And once every tab wants inference at once, contention becomes the bottleneck, not capability. Does Canary sandbox that per-tab? + +**Why this adds value:** Doesn't just cheer the privacy angle — adds two concrete failure modes (no per-task model choice, resource contention across tabs) and ends on a specific technical question. No TextStack mention. + +--- + +## Candidate 2 — @noguchis *(local-LLM topic match — carries today's TextStack mention)* — [POSTED 2026-05-21] + +**Source post:** https://x.com/noguchis/status/2057446737145671742 +**Posted:** ~minutes ago when scanned (2026-05-21 ~13:02 UTC) — post [3/4] of the author's own thread +**Excerpt:** "[3/4] Axis 2 — split models by role. Opus = orchestrator. Sonnet = implementer. Local LLM (Ollama) = a gate placed *before* spending Claude tokens. 
Pre-screen the diff locally first → zero token cost, and sensitive data never leaves the machine." + +**Draft reply:** +> This gate pattern holds up in prod — we run gemma4:e2b on a CPU-only VPS as exactly that pre-screen layer: p95 ~20ms over a 63k-request load test. Key is keeping the local model small enough that the gate adds no latency — e4b buckled under concurrency, e2b held. + +**Why this adds value:** Validates the author's architecture with hard production numbers and adds the non-obvious gotcha (the local gate only works if the model is fast enough not to add latency — concretely the e4b→e2b lesson). This is the day's single TextStack-experience mention. + +**Caveat for review:** Small account, post is a mid-thread reply (~7 views when scanned) — low reach, but a thoughtful builder and a near-certain "OP replies" bet. If you'd rather not spend the daily TextStack mention on a low-reach post, skip this one — the routine caps it at 1/day and zero is fine; the session still stands at 3 strong candidates. + +--- + +## Candidate 3 — @NikkiSiapno *(dev educator — wider reach)* — [SKIPPED 2026-05-21 — paid-partnership ad] + +**Source post:** https://x.com/NikkiSiapno/status/2057440161315868673 +**Posted:** ~33 minutes ago when scanned +**Excerpt:** "AI coding agents are only as good as the context they have. Atlassian just solved that with Cursor in Jira. Context is what makes agents actually useful. Atlassian holds the full context of work: tickets, specs, decisions, and the teams behind it all..." + +**Draft reply:** +> Agreed context is the unlock, but volume isn't the same as signal. Jira is also where specs go stale and decisions get reversed three comments deep. Feeding an agent all of it can bury the relevant 5% — curation beats raw access. How's it scoping what's current? + +**Why this adds value:** Offers a genuine counter-perspective ("more context" ≠ "better context" — context rot / stale-ticket noise) rather than agreeing. 
Larger account = better reach than candidates 1–2. Ends on a question that invites her take. No TextStack mention. + +**SKIPPED — reason:** On opening the full post it carried a "Paid partnership" label and ended with "#AtlassianPartner #Ad" — it is a sponsored ad, not an organic post. The search-result excerpt cut this off. Replying under paid promo is off-strategy (routine rule: no pure promo), and a counter-take under someone's paid ad reads as needlessly combative. User confirmed skip. See campaign-tracker learning #7. + +--- + +## Candidate 4 — @hiouso *(build-in-public — real question, in 1–3h window)* — [POSTED 2026-05-21] + +**Source post:** https://x.com/hiouso/status/2057413037762453697 +**Posted:** ~2 hours ago when scanned +**Excerpt:** "founders/builders — would this actually be a good business idea or just peak build in public brainrot? a tool tracking founder metrics like: screenshots-to-users ratio, domains owned to paying customers, 'just one more feature' velocity, coffee to shipped features..." + +**Draft reply:** +> The funny ones (coffee:features, domains owned) are screenshot bait — fun, unpaid. The one that'd actually drive decisions is time-from-idea-to-first-paying-user. Problem: that's the metric founders least want to look at. Build for the painful number, not the shareable one. + +**Why this adds value:** Answers the author's actual question (good business idea or brainrot?) with a real product-direction take — separates vanity metrics from the one decision-driving metric, and names the adoption risk (fun-but-unpaid). Substantive, honest, peer-level. No TextStack mention. + +--- + +## Continued conversations + +No **new** external replies landed on Vasyl's prior replies since the May 20 reciprocity pass. Carry-overs still pending user action: + +- **@ollies0x (OLLIE)** — substantive GPU-vs-VPS counter; continued reply already drafted in `2026-05-20.md` → "Continued 2". Still pending. No re-draft today (would duplicate). 
+- **@TheWordWeaver_ (GhostPen)** — "thanks" + second tip drafted in `2026-05-20.md` → "Continued 1". Still pending. +- **@RahulGangwani24 (Rahul)** — latency-bound vs throughput-bound framing drafted in `2026-05-20.md` → "Continued 3". Still pending. +- **@PaulChen088** — promo reply with dev.to link; 7th carry-over. Recommend skip (stale, promotional, links out). + +If you want those continued replies posted, approve them against the `2026-05-20.md` file — they don't need re-drafting. + +--- + +## Tribe watchlist — propose adding + +- **@NikkiSiapno** — dev educator (Level Up Coding), large following, posts consistently on AI engineering / coding agents. Good Tier B candidate; would surface in future Following-tab scans if followed. *(Following changes profile state — user decision only; never auto-followed.)* + +--- + +## Posting result (2026-05-21) + +Posted with user approval, spaced out, all confirmed sent by X: + +- **Candidate 1 — @faradaymachines** — posted. https://x.com/faradaymachines/status/2057446794749972721 +- **Candidate 2 — @noguchis** — posted (carried the daily TextStack prod-numbers mention). https://x.com/noguchis/status/2057446737145671742 +- **Candidate 4 — @hiouso** — posted. https://x.com/hiouso/status/2057413037762453697 + +Skipped: + +- **Candidate 3 — @NikkiSiapno** — skipped, paid-partnership ad (see candidate's SKIPPED note above). + +Carry-over continued conversations (@ollies0x, @TheWordWeaver_, @RahulGangwani24 from `2026-05-20.md`) — still unposted; not actioned this session. + +**Follow-up:** check these three posts in 1–2 days for likes from non-followers / replies from the OP (the routine's success metric). Reciprocity on any responses gets picked up by the next session's `/with_replies` scan. 
diff --git a/docs/marketing/x-routine/2026-05-22.md b/docs/marketing/x-routine/2026-05-22.md new file mode 100644 index 00000000..5ac1c40b --- /dev/null +++ b/docs/marketing/x-routine/2026-05-22.md @@ -0,0 +1 @@ +Chrome MCP unavailable — skipped today's session diff --git a/docs/seo/audit-2026-05-14.md b/docs/seo/audit-2026-05-14.md new file mode 100644 index 00000000..7a763d83 --- /dev/null +++ b/docs/seo/audit-2026-05-14.md @@ -0,0 +1,144 @@ +# SEO Audit — 2026-05-14 + +Sources: GSC (textstack.app), GA4 (property 532821906), Ahrefs Site Audit (project 9661893). + +## TL;DR + +The technical foundation is mostly in place (SSG works, the sitemap is correct, schema is present, and the indexing strategy of noindex on chapter pages with only metadata pages indexed is the right one). The problem is in **two places**: (1) there are few indexable metadata pages and some of them are thin, so Google does not consider them worth indexing; (2) there is almost no real organic traffic: 44 clicks in 3 months, average position 59 (page 6). Reaching 50k clicks/month means growing 1000+ times from here. A realistic path is 12-24 months, by scaling indexable pages to 2K+ with strong content and adding hub pages for informational intent. + +## Current state (snapshot 2026-05-14) + +**GSC (3 months):** +- Total clicks: 44 +- Total impressions: 1.09K +- Average CTR: 4% +- Average position: 59 (page 6+) +- Indexed pages: 327 +- Not indexed: 3.3K (of which 2,102 are noindex by design on chapter pages; ~890 are legacy URLs from the period when SEO was broken and will age out on their own) +- Sitemap: index with 4 sub-sitemaps (books.xml, authors.xml, genres.xml, pages.xml), 391 URLs in total = the site's real indexable surface. 327/391 = **84% indexed coverage, a healthy ratio**. 
+- Core Web Vitals: not enough usage data (too little traffic for the Chrome UX Report) + +**Top queries (by impressions):** +- `textstack` — 3 clicks / 134 impr / pos 6.1 (brand) +- `complete novels of james joyce` — 0 / 61 / pos 89.9 +- `barchester towers` — 0 / 29 / pos 67.4 +- `mary shelley books` — 0 / 17 / pos 72.9 +- Another ~350 author/book queries at positions 60-99 (pages 7-10) → 0 clicks + +**GA4 (28 days):** +- Active users: 7.4K, New users: 7.8K, Event count: 38K +- Sessions: 7,493 total + - Direct: 7,222 (96.38%), avg engagement 1s → noise/bots/misattribution, not real people + - Unassigned: 227 (3.03%), 5m 13s engagement → real users + - Organic Search: 66 (0.88%), 33s engagement → ~2/day of genuine search traffic + - Referral: 10, Organic Social: 2 +- Top pages by views: Clean Code (2.1K), Reader (1.7K), Vocabulary (1.3K), Clean Code Focus (1.1K), My Library (555) +- Bounce rate on top pages 1.4%-33% (good, the content is not driving people away) + +**GA4 anomaly — RESOLVED**: on 23 April 2026 Direct sessions dropped from 647 to 1, and USA sessions from 671 to 0. Cause: the site had been added to a directory that was sending bots disguised as Direct traffic. After removal from the directory the metrics normalized. **Implication**: the 7,222 Direct sessions with 1s engagement in the 28-day window are almost all bots (the period before 23 April). Real baseline after cleanup: ~10-20 direct + 2 organic + 8 engaged Unassigned = **~30 real users/day**. 
+ +**Ahrefs Site Audit (12 May):** +- Health Score: 87/100 +- Crawled URLs: 2,482 (Internal 2,021, External 100, Resources 361) +- Errors: 445 (308 URLs affected), Warnings: 3,025, Notices: 7,979 + +**Top Ahrefs Errors:** +- Page has links to broken page — 171 pages (+106 new) ↑ growing fast +- 404 page — 126 (+73 new) ↑ +- 4XX page — 126 (+73 new) ↑ +- Duplicate pages without canonical — 11 (+6 new) +- Page has no outgoing links — 11 (+6 new) + +**Top Ahrefs Warnings (mostly low priority):** +- Noindex page — 1,490 (by design: chapter pages, ignore) +- Low word count — 11 +- Meta description too long — 11 +- H1 tag missing/empty — 7 + +## What works + +- SSG serves correct HTML, and sitemap hygiene is in order (84% of submitted URLs indexed) +- The `noindex` indexing strategy on chapter pages is correct: it avoids duplicates with Gutenberg +- Schema, breadcrumbs, and FAQ enhancements are active (the Breadcrumbs and FAQ sections appear in the GSC nav) +- Content on top pages retains users (bounce 1-33%) +- Brand search exists (`textstack` has 134 impressions over 3 months, growing slowly) + +## Critical issues (P0) — growth blockers + +### 1. The indexable surface is too small + +Sitemap = 391 URLs (the site's entire indexable surface), of which 327 are indexed = 84% coverage. That is a healthy ratio, but **a ceiling of 391 pages means a ceiling of ~3-5K clicks/month at best**. To get 50K clicks/mo, the indexable surface needs to be 3000-5000 pages. + +This fix is about volume, not a technical repair. It is addressed through Phase 1 (content scale): more books published → more editions/authors/genres pages, plus the addition of hub pages (themed lists, curated collections). + +The 85 Soft 404s are pages in the indexable surface that Google considers empty. Filling them via SEO backfill or returning 410 = a quick bump to the indexed count. + +### 2. 
All ranking queries sit at position 60+ + +`james joyce books` pos 75, `mary shelley books` pos 72, `complete novels of james joyce` pos 89. That is page 8-9 of Google. Google knows the pages exist but gives them the lowest priority. Reasons: +- Weak site authority (new domain, few backlinks) +- Content is thinner than competitors' (Goodreads, OpenLibrary, Project Gutenberg) on the same pages +- Internal linking does not provide enough signal + +**Fix:** +- Strengthen author overview pages: a 300+ word bio, a list of all editions with descriptions, links to themes/genres, related authors. The author page for `james joyce` should have more depth than a standard book site. +- Internal linking: from every book edition, link to the author, genre, related editions, and theme/topic. Right now 11 indexable pages have no outgoing links; fix them. +- Backlinks in the longer term. + +### 3. ~~Internal broken links~~ — LEGACY DEBT, not a current bug + +SEO was broken ~3 months ago (before the fixes). The Ahrefs "New" column means "URL first discovered in this crawl", not "URL newly broken". The crawler is simply working through a backlog of old URLs. The Health Score of 87/100 confirms that the site's current state is healthy. No action: wait for Google and Ahrefs to catch up with reality, and the numbers will fall off naturally. + +### 4. ~~GA4 anomaly 23 April~~ — RESOLVED + +The site had been added to a directory that drove bots disguised as Direct. After removal from the directory, the metrics normalized. No action required. Lesson for the future: high Direct + 1s engagement almost always means a bot source; check the directories/listings where the site is mentioned. + +## High-impact opportunities (P1) + +### 1. Hub pages for informational intent + +There are currently no hub pages that capture long-tail queries like "best classic novels for software engineers", "free books about ethics in technology", "russian literature in english". This is content that ranks on its own and then links to book pages, a double effect. 
+ +Candidates for the dev/AI engineer audience: +- "Books every software engineer should read" +- "Classic novels about AI and ethics" +- "Short classics you can finish in a weekend" +- "Free books for English language learners" +- "Russian classics in English translation" +- "Free books about war and humanity" +- "Best free SF/fantasy classics" (public domain) + +### 2. Prioritizing SEO backfill by weak pages + +SEO backfill already exists in the infrastructure. Use it first on: +- Soft 404 pages (85): fill with content or return 410 +- Crawled-not-indexed pages: add unique value +- Authors with a single book (thin) +- Genres with few editions + +### 3. Soft 404 fix + +Google considers 85 pages empty. Most likely: authors without a bio, genres without a description, editions without relevance/themes. Either fill these pages via SEO backfill or, if they are objectively not needed, return 410 Gone (not 404, but an explicit "removed"). + +## Low priority (P2) + +- 11 Duplicate without canonical — check; most likely work/edition pairs +- 11 Low word count — fill via SEO backfill +- 11 Meta description too long — truncate in the template (160 chars max) +- 7 H1 missing — check and fix the page template +- 11 Pages without outgoing links — add internal links + +## Unknowns to investigate + +1. **23 April anomaly** — what happened to USA traffic and the clean-code page. +2. **`?direct=1` URLs in the index** — the query string was left without a canonical pointing to the clean URL. Check that rel="canonical" on them points to the clean URL. +3. **Sitemap covers only 391 of 1200 indexable URLs** — why aren't all indexable URLs in the sitemap? Possibly the SSG-generated sitemap does not pick up authors/genres. +4. **65 "Duplicate, Google chose different canonical than user"** — we have a canonical issue; need to look at the specific URLs. +5. **Engagement time 11s avg in GA4** — this is heavily distorted. Check that engagement events are configured (scroll, page_view with min duration). 
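The `?direct=1` canonical check in the unknowns list can be sketched in a few lines. This is a minimal illustration, not the project's code: the helper names `clean_canonical` and `canonical_is_clean` are hypothetical, the example path `/en/books/example` is made up, and the sketch assumes `direct` is the only parameter that should be stripped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


# Hypothetical helper: the name and the default parameter list are assumptions.
def clean_canonical(url: str, drop_params: tuple[str, ...] = ("direct",)) -> str:
    """Return the URL with the listed query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_is_clean(page_url: str, canonical_href: str) -> bool:
    """True when the page's rel="canonical" href equals its clean URL."""
    return canonical_href == clean_canonical(page_url)


if __name__ == "__main__":
    # Made-up example path, used only to show the call shape.
    page = "https://textstack.app/en/books/example?direct=1"
    print(clean_canonical(page))
    print(canonical_is_clean(page, "https://textstack.app/en/books/example"))
```

Feeding this the `?direct=1` URLs exported from GSC together with the canonical href scraped from each page would flag the ones whose canonical still carries the query string.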
+ +## Metrics to track (weekly) + +- GSC: total clicks (target growth), impressions, indexed pages count, "Crawled-not-indexed" count +- GA4: Organic Search sessions, Engagement rate, top landing pages by organic +- Ahrefs: Health Score, Errors count (especially 404 count), Referring domains +- Brand search: `textstack` impressions trend in GSC diff --git a/docs/seo/roadmap-50k.md b/docs/seo/roadmap-50k.md new file mode 100644 index 00000000..f1401a06 --- /dev/null +++ b/docs/seo/roadmap-50k.md @@ -0,0 +1,179 @@ +# SEO Roadmap → 50K Google clicks/month + +Starting point (2026-05-14): 15 clicks/month. Goal: 50,000 clicks/month. +Multiplier ~3,300×. Realistic timeline: 18-24 months with consistent execution. + +Related document: [audit-2026-05-14.md](./audit-2026-05-14.md). + +## Principles + +1. **Content is the main driver**, not backlinks. Backlinks amplify what already ranks well; they will not lift thin pages. +2. **Metadata only**: chapter pages stay noindex (avoids duplicates with Gutenberg). +3. **Long-tail before head terms**: `james joyce books` has a thousand competitors; `themes in joyce dubliners` has dozens. +4. **Hub pages > many separate pages**: one good hub page ranks better than 50 thin author pages. +5. **Track impressions before clicks**: impressions are a leading indicator and grow 2-3 months before clicks follow. +6. **Never buy backlinks**: in the free books niche Penguin is especially aggressive. + +## Trajectory + +| Month | Indexed pages | Impressions/mo | Clicks/mo | +|-------|---------------|----------------|-----------| +| Now (2026-05) | 327 | ~360 | ~15 | +| +3 (2026-08) | 800 | 2K | 100 | +| +6 (2026-11) | 1,500 | 8K | 500 | +| +12 (2027-05) | 2,500 | 50K | 3,500 | +| +18 (2027-11) | 3,500 | 200K | 15,000 | +| +24 (2028-05) | 4,500 | 500K | 50,000 | + +This is **optimistic, assuming consistent execution**. If 2-3 months are missed, add +6 months to each milestone. 
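To make the size of that trajectory concrete, the sketch below turns the clicks column of the table into compound month-over-month growth rates. The milestone figures are copied from the table; the helper name `monthly_growth` is illustrative only. Going from 15 to 50,000 clicks in 24 months works out to roughly 40% compound growth every month.

```python
# (month offset, clicks per month) pairs copied from the trajectory table.
MILESTONES = [(0, 15), (3, 100), (6, 500), (12, 3_500), (18, 15_000), (24, 50_000)]


def monthly_growth(start: float, end: float, months: int) -> float:
    """Compound month-over-month growth factor needed to go from start to end."""
    return (end / start) ** (1 / months)


overall_multiplier = MILESTONES[-1][1] / MILESTONES[0][1]
overall_rate = monthly_growth(MILESTONES[0][1], MILESTONES[-1][1], MILESTONES[-1][0])

if __name__ == "__main__":
    print(f"multiplier: {overall_multiplier:,.0f}x")
    print(f"required growth: {(overall_rate - 1) * 100:.1f}% per month")
    # Growth rate implied by each consecutive pair of milestones.
    for (m0, c0), (m1, c1) in zip(MILESTONES, MILESTONES[1:]):
        rate = monthly_growth(c0, c1, m1 - m0)
        print(f"month {m0:>2} -> {m1:>2}: {(rate - 1) * 100:.0f}% per month")
```

Comparing each week's GSC clicks against the rate implied by the current interval gives an early signal of whether a milestone is slipping.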
+ +--- + +## Phase 0 — Current real blockers (Now → +2 weeks) + +The technical foundation is in order: the sitemap is a correct index with 4 sub-sitemaps (books/authors/genres/pages), 391 URLs in total, 84% indexed coverage. Health Score 87/100. Most Ahrefs/GSC errors are legacy debt from the old SEO breakage (~3 months ago), and the crawlers are working through the backlog. This will age out on its own. + +What is actually worth doing: + +- [x] ~~Investigate the 23 April anomaly~~ — RESOLVED: the directory sending bots was removed, metrics normalized. +- [x] ~~Sitemap coverage~~ — the sitemap is already a correct index with 4 sub-sitemaps. No fix required. +- [ ] **85 Soft 404 → fill or 410** — Google currently considers these pages empty. Export from GSC and classify: (a) thin authors/genres → SEO backfill; (b) objectively removed → 410 Gone. A quick win that will add ~50+ pages to the index. +- [ ] **GA4 engagement events** — add scroll and engagement events so that avg engagement time becomes reliable. The current 11s distorts business reporting (not SEO directly). + +**What we are NOT doing (legacy debt, will age out on its own):** +- ~~SlugHistory + 301 for the growth in 404s~~ — Ahrefs "new" = working through the backlog, not fresh breakage. +- ~~65 duplicate canonical mismatch~~ — most likely old URLs from the breakage period. +- ~~171 pages with broken internal links~~ — the same legacy URLs. +- ~~`?direct=1` canonical audit~~ — old URLs; new links are already correct. + +After Phase 0 the main lever is **Phase 1 (content scale)**: growing the indexable surface from 391 to 3000-5000 pages through publication + hub pages. That is the real work on the path to 50K clicks/mo. + +## Phase 1 — Content base (Month 1-3) + +Goal: 800 indexed pages, ~100 clicks/mo. This is about volume + quality of the existing metadata pages. + +- [ ] **Auto-publish 500+ books** via the existing pipeline. 
Priority: + - Widely searched public domain classics (Project Gutenberg top 100) + - Books relevant to the dev/AI engineer audience + - Short classics (easy to finish, good session metrics) +- [ ] **SEO backfill quality pass** on ALL existing editions: + - Description (200+ words, unique angle, not generic) + - Relevance (why it is worth reading now) + - Themes (3-5 themes, each explained) + - FAQs (5 questions with answers of 50+ words each) + - SeoTitle + SeoDescription (60/160 chars, intent-matched) +- [ ] **Authors pages** — for all authors with editions: + - Bio 300+ words + - List of all editions with teaser descriptions + - "Related authors" block (3-5 links) + - Schema.org Person + sameAs (Wikipedia, Wikidata) +- [ ] **Genres pages** — all genres: + - Description of 300+ words about the genre + - Top editions with teaser + - Related genres + - Sub-themes if any +- [ ] **Internal linking pass** — every edition page should have: + - Author link + - Genre link + - 3 "Related editions" (same author OR same genre OR same theme) + - Breadcrumb (already present, verify) +- [ ] **Fix 171 pages with broken internal links** — find the source pages, remove or update the links. + +## Phase 2 — Hub pages for informational intent (Month 3-6) + +Goal: 1,500 indexed pages, ~500 clicks/mo. This is about a new type of traffic: informational, not transactional. + +Hub pages rank on long-tail queries themselves and link to book pages. This gives a double effect: the hub gets clicks, and book pages get internal links. 
+ +**Hub page candidates for the dev/AI engineer audience:** + +- [ ] Books every software engineer should read (curated 20-30) +- [ ] Classic novels about AI, ethics, and technology (10-15) +- [ ] Short classics you can finish in a weekend (15-20) +- [ ] Free books for English language learners (graded by level) +- [ ] Russian classics in English translation (15-20) +- [ ] Best free public domain SF and fantasy (20-30) +- [ ] Books about systems thinking and complexity (10-15) +- [ ] Classic philosophy free to read online (15-20) +- [ ] Free books about war and humanity (10-15) +- [ ] Books that shaped modern thought (curated essays) + +Each hub page = 800-1200 words of original content + a list of books with teasers + internal links to each. This is "best of" listicle content, which Google likes and which often earns backlinks naturally. + +**Technically**: either as part of the React app (`/en/lists/{slug}`), or as an admin-managed new `Collection` entity with an editions M2M. An ADR is needed. + +## Phase 3 — Authority and backlinks (Month 6-12) + +Goal: 2,500 indexed pages, ~3,500 clicks/mo. This is about lifting existing pages to page 1-2 of Google. + +By this point the content machine is running; now we add authority signals. + +**Outreach channels (ranked by ROI):** + +- [ ] **Hacker News post** about the technical side of TextStack (SSG, vocabulary SRS, Edge TTS WebSocket, extraction pipeline). The HN audience reads this kind of thing; it yields a dofollow link + a long referral tail. +- [ ] **Show HN launch** — separately, once there are metrics and a story. +- [ ] **HARO / Qwoted / Help A B2B Writer** — respond to requests on reading, education, language learning, productivity. Earn quotes in major publications. ~1 hour/week, 2-3 backlinks/month expected. +- [ ] **Dev.to + Hashnode posts** — long technical articles about building TextStack. Each post = a link to textstack.app. 5-10 posts will give 5-10 dofollow links. 
+- [ ] **Reddit organic** — r/books, r/printSF, r/learnprogramming, r/languagelearning. Not self-promo, but useful comments with a mention when appropriate. 1-2 hours/week. +- [ ] **Product Hunt launch** — once the full feature set and story are ready. One or two days of heavy traffic + a lasting PH backlink. +- [ ] **Guest posts on dev blogs** — about deep reading, vocabulary, language learning. With a natural link to TextStack. +- [ ] **Own listicles** — "Best free reading apps for developers 2026" on vasyl.blog and Dev.to. Other blogs cite and link to them. + +**What NOT to do:** +- Buy backlinks (Penguin penalty) +- Mass outreach with templates (does not work) +- PBNs (Private Blog Networks) +- Comment spam + +## Phase 4 — Scale (Month 12-24) + +Goal: 4,500 indexed pages, 50K clicks/mo. + +By this point the first 3 phases should be delivering stable organic growth. This phase scales what works. + +- [ ] **Chapter-by-chapter summaries** — UNIQUE content for classics, which is scarce in good quality. One edition = book overview + chapter summaries (each ~500 words with critical analysis). This provides unique value vs Goodreads/SparkNotes and justifies lifting noindex for chapter summary pages (but NOT for the text itself). +- [ ] **Study guides** — for books often read in schools/universities. Themes, characters, motifs, key quotes (short). +- [ ] **Multi-language** — add ru, uk if resources allow. Each language = a new surface with minimal competition in our niche. +- [ ] **Audio TTS landing pages** — `Listen to {Book Title} in English (free)` — a separate intent with few competitors. +- [ ] **Vocabulary by book pages** — `Words from {Book Title}` — a surface nobody covers. 
+ +## Metrics and cadence + +**Weekly** (10 min): +- GSC: total clicks, impressions, indexed pages count, "Crawled-not-indexed" count, average position +- Ahrefs: Health Score, Errors count (especially 404s), Referring domains delta +- GA4: Organic Search sessions, engagement rate, top landing pages + +**Every 2 weeks** (30 min): +- Top 20 queries in GSC — is position moving? +- Pages with average position 11-20 — candidates for a content refresh (add depth, internal links) +- CTR < 2% at impressions > 50 — rewrite title/description + +**Monthly** (1-2 hours): +- Audit auto-publish quality (sample 10 newly published, check descriptions/themes) +- Hub pages performance — which rank, which do not work +- Backlinks audit — what appeared, which are high quality + +**Quarterly** (half a day): +- Full Ahrefs audit re-run +- GSC review of all "Crawled-not-indexed" and Soft 404 +- Competitor analysis (Goodreads, Standard Ebooks, OpenLibrary, Z-library) — what ranks for them on target queries +- Strategy review: what is working faster than expected, what is lagging + +## Social networks — separately + +Social networks **do not affect rankings directly**, but they produce brand search (which Google counts) and direct traffic. + +- **Twitter** (@Rexetdeus): build in public, weekly metrics threads, feature launches, technical threads. This works for a dev audience. Already running via `docs/marketing/x-routine/`. +- **Dev.to / Hashnode**: long technical articles. Covered in Phase 3. +- **YouTube**: demos of vocabulary SRS, reader UX, technical breakdowns. Low priority: it needs production time. Deferred to Phase 4. +- **BookTok / Bookstagram**: a different audience, a different content style. **Do not do** for the foreseeable future: it spreads resources thin. + +## What not to do + +- Do not churn out AI-generated thin content without editing — the Helpful Content Update will kill it. +- Do not target commercial phrases (`buy ebook`) — the intent does not match a free library. 
+- Do not try to rank for head terms (`free books`) — Gutenberg has DR 88, we will not get through. +- Do not build doorway pages for every keyword — Google has caught these for a long time. +- Do not optimize chapter pages — they are noindex, and that is correct. +- Do not try to go "faster" — the Google sandbox for a new domain plus low authority organically means 12+ months of work. diff --git a/hackernews-launch-post.md b/hackernews-launch-post.md new file mode 100644 index 00000000..204847c1 --- /dev/null +++ b/hackernews-launch-post.md @@ -0,0 +1,174 @@ +# Show HN Launch Post — TextStack + +Submit at: https://news.ycombinator.com/submit + +Positioning anchor: README's hero — "Deep-reading tool for developers learning AI engineering. Tap an unknown term → context-aware explanation inline. A modern replacement for Kindle Word Wise and LingQ — built for technical books." + +Origin article (cite as "Why I built it" if asked): https://vasyl.blog/2026/04/21/i-quit-designing-data-intensive-applications-ddia-three-times-heres-what-i-build-on-the-fourth-try/ + +--- + +## URL field + +``` +https://textstack.app +``` + +## Title (pick one) + +**Recommended (personal hook — strongest for HN):** +``` +Show HN: I quit DDIA three times – built a reader that explains terms inline +``` + +Alternatives: +``` +Show HN: TextStack – Kindle Word Wise for technical books, but LLM-powered +Show HN: TextStack – Tap a term in a tech book, get a context-aware explanation +Show HN: A reader that knows "attention" means ML in an ML book and biology in a bio book +``` + +Title rules HN actually enforces: +- No "the best", no "amazing", no marketing fluff +- Lead with a specific claim, not a category +- Under 80 characters +- "Show HN:" prefix is required + +The recommended title works because (a) it's a personal admission HN respects, (b) DDIA is iconic enough that 80%+ of HN readers will recognize it instantly, and (c) "explains terms inline" is concrete. 
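The mechanical title rules above (prefix, length, banned phrases) are easy to check before submitting. The sketch below is one way to do that; the function name `title_problems` and the two-item banned-phrase list are assumptions for illustration, and "lead with a specific claim" is a judgment call it does not try to automate. The length test follows the wording above (strictly under 80 characters).

```python
# Rules as listed in the launch notes; the banned-phrase list is a small assumed sample.
MAX_LEN = 80
PREFIX = "Show HN:"
BANNED = ("the best", "amazing")


def title_problems(title: str) -> list[str]:
    """Return a list of rule violations; an empty list means the title passes."""
    problems = []
    if not title.startswith(PREFIX):
        problems.append('missing "Show HN:" prefix')
    if len(title) >= MAX_LEN:
        problems.append(f"{len(title)} characters, limit is under {MAX_LEN}")
    lowered = title.lower()
    for phrase in BANNED:
        if phrase in lowered:
            problems.append(f'marketing phrase "{phrase}"')
    return problems


if __name__ == "__main__":
    candidates = [
        "Show HN: I quit DDIA three times – built a reader that explains terms inline",
        'Show HN: A reader that knows "attention" means ML in an ML book and biology in a bio book',
    ]
    for candidate in candidates:
        print(len(candidate), title_problems(candidate) or "ok")
```

Run against the candidates, the recommended title passes, while the last alternative comes out at 89 characters by a plain character count, so it would need trimming before use.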
+ +--- + +## First comment (post immediately after submitting) + +Hi HN, + +I quit *Designing Data-Intensive Applications* three times. Not because it was hard — I understood most of what was on the page. The problem was the rest: unfamiliar terms that broke the flow. Eventual consistency. Attention mechanism. B-tree. Writing each one down to look up later works until you have 40 of them and you've already lost the thread. + +Summarizing books away defeats the point. The only way to actually internalize something like DDIA or the Karpathy nanoGPT papers is to read them — but the friction has to go. + +So TextStack works like this: + +- Tap a term you don't know → 2-3 sentence LLM-powered explanation tied to the book's domain +- Tap "attention" in an ML textbook → ML meaning. Tap "attention" in a psychology book → cognitive meaning. Same word, different domain, different answer. +- Terms you didn't recognize go into a **capped weekly SRS queue** — no infinite backlog, no guilt spiral. Common words and the top 15K English words are filtered out, so only technical vocabulary surfaces. + +The thing this replaces is Kindle Word Wise (static dictionary, 2014, falls over on technical terms) and LingQ (built for natural languages, not technical ones). I tried both before building this. 
+ +Stack: +- ASP.NET Core 10 (Minimal APIs, modular monolith) + PostgreSQL 16 + EF Core +- React 19 (web) + React Native / Expo 55 (mobile, Android live, iOS in TestFlight) +- OpenAI gpt-5-mini for explanations and translation; local Ollama qwen3:8b for SRS distractors +- Edge TTS over WebSocket for pronunciation (no API key, 200+ voices) +- Postgres FTS for search (Meilisearch swappable behind an interface) +- Puppeteer SSG for SEO pages — bot-detecting nginx routes crawlers to prerendered HTML, humans get the SPA +- OpenTelemetry → Aspire dashboard for traces +- Single docker compose, deploys via Cloudflare Tunnel + +Honest limitations: +- Curated technical corpus is small right now (~15-20 hand-picked titles plus 1500+ classics). Personal uploads (EPUB/PDF/FB2) are unlimited. +- Explanation latency is ~1-2s on first call (cached after). +- iOS app is TestFlight-only — App Store review pending. Android is live on Google Play. +- Source-available, not OSI open source — BUSL-1.1, auto-converts to Apache-2.0 in 2030. Self-hosting for personal/internal use is fully allowed; reselling as a hosted service is not. + +Try it without signing up: +https://textstack.app — sample chapters open without an account. Tap any unfamiliar term to see the explanation flow. + +Things I'd love feedback on: +1. The capped SRS queue is a strong opinion — most SRS tools push infinite Anki-style backlogs and people drown. Does the cap make sense or do you want to override it? +2. Is "tap a term" the right interaction on desktop, or should there be a hover-to-preview alternative? +3. Curated corpus: which technical books would you want most? I'm prioritizing DDIA, Karpathy/Stanford ML papers, type theory, distributed systems classics. What am I missing? 
+ +Background article on the "why" if you want the longer version: https://vasyl.blog/2026/04/21/i-quit-designing-data-intensive-applications-ddia-three-times-heres-what-i-build-on-the-fourth-try/ + +— Vasyl (https://github.com/mrviduus, @Rexetdeus) + +--- + +## When to post + +**Best time for Show HN (US-centric audience):** +- Tuesday, Wednesday, or Thursday +- 8:00–10:00 AM Eastern Time (your local time, since you're in Toronto) +- NOT Monday morning (overflow from weekend), NOT Friday (lower attention) + +**Why timing matters:** Show HN posts need ~3-5 upvotes in the first 30-60 minutes to escape /newest and reach /show. If you post at 3 AM ET, it'll be buried before US devs wake up. + +--- + +## Pre-flight checklist + +Before hitting submit, verify: + +- [ ] textstack.app loads on first try (warm the cache) +- [ ] The "tap a term, get explanation" flow works on the chapter you'll link to +- [ ] No console errors on the demo page +- [ ] Sign-up via email/Google works end-to-end (test in incognito) +- [ ] Server has headroom — HN front page = 5-50K visitors in a few hours +- [ ] OpenAI billing has budget — explanations cost money per call, traffic spike could trigger a rate limit or 429 +- [ ] Rate limits are sane (you have nginx zones for `/api`, `/uploads`, `/translate`) +- [ ] Status page or graceful fallback if API goes down +- [ ] HN account has karma > 0 and is at least a few days old (new accounts get filtered) + +**OpenAI cost note**: at the worst case of 50K HN visitors × 5 explanations each × $0.0001/call, that's ~$25. Realistic case (5% try the demo, 3 explanations each) is ~$0.75. Fine, but watch the dashboard. + +--- + +## After posting + +**First hour is critical.** Do these in order: + +1. Drop the first comment (the body above) within 60 seconds of submitting. +2. Pin the submission tab open. Refresh `news.ycombinator.com/show` after 15 min — your post should appear there. +3. Reply to every comment within the first 2 hours. 
HN ranks posts partially on author engagement. +4. Don't ask friends to upvote — HN detects vote rings and will flag the post. +5. Do post the link in your own networks (Twitter @Rexetdeus, LinkedIn, vasyl.blog) — organic traffic is fine. + +**Common HN questions — prepared answers:** + +*"How is this different from Readwise / LingQ / Kindle Vocabulary Builder?"* +> Readwise focuses on highlight management — surfacing what you already marked, not explaining what you didn't understand. LingQ is built for natural-language learning, not technical vocabulary; it doesn't know what "attention mechanism" means in context. Kindle Word Wise is a 2014 dictionary lookup — fine for general English, useless for "B-tree" or "monad". TextStack's bet is that LLMs finally make context-aware explanations cheap enough to do per-term, per-book. + +*"Why BUSL and not just MIT?"* +> Because I want one paying customer by October. BUSL lets me self-host, lets you fork and modify, but blocks competitors from launching a hosted clone. In 2030 it auto-converts to Apache-2.0. If you don't agree with the license, the source is still on GitHub and you can read it. + +*"Why ASP.NET? Isn't C# weird for this?"* +> It's what I'm fastest in. .NET 10 + EF Core + a modular monolith with central package versioning makes the codebase cheap to maintain solo. The mobile and web layers are React, which is most of the user-facing complexity anyway. + +*"Have you tried [tool X]?"* +> Yes — I tried Kindle Word Wise (limited dictionary, no SRS), Anki + manual mining (the friction that broke me on DDIA), LingQ (wrong domain), and Readwise (different problem). The thing I couldn't find was "tap an unfamiliar term in a technical book and get a context-aware explanation". + +*"What's the cost to run this for me self-hosted?"* +> Postgres + .NET API + Worker fits in a $10-20/mo VPS for a single-user setup. The biggest variable cost is OpenAI API for the explanations — figure $0.10-0.50/month per active reader. 
Ollama for distractors is free and local. + +*"Will you add [feature]?"* +> The 6-month roadmap is in the README. Next up is iOS App Store, capped weekly SRS UX polish, and curating 15-20 AI-engineering titles (DDIA, ML papers). Beyond that, no commitments. + +*"Is the explanation accurate? LLMs hallucinate."* +> They do. Right now I'm relying on gpt-5-mini being good enough that the 2-3 sentence explanation is right >95% of the time on technical terms. Users can flag bad explanations; I haven't built that loop yet. If you spot a hallucination on the demo, tell me — that's a real research gap. + +--- + +## Backlinks angle (your secondary goal) + +A successful Show HN gives you: +- 1 dofollow link from `news.ycombinator.com` (high-authority domain) +- Often 5-20 secondary mentions from blogs and aggregators that scrape the HN front page (Hacker News Daily, hckrnews.com, indie newsletters) +- Twitter / LinkedIn pickups from HN regulars +- Often Lobste.rs cross-post (another high-authority dofollow) + +A flopped Show HN gives you: +- 1 nofollow link, no traffic, no backlinks +- And you can't repost the same title for 30 days + +Translation: pick the right time, warm the demo, prepare the canned answers above. You only get one shot with this title. + +--- + +## If it flops + +Show HN posts that don't catch fire in the first 90 minutes are usually dead. If that happens: + +- Don't repost the same title within 30 days — HN penalizes reposts. +- Wait 2-3 weeks, then submit a regular HN post (not "Show HN") with a different angle. Your DDIA blog article itself is HN-worthy as a standalone submission — title it something like *"I quit DDIA three times — here's what finally worked"* and link to vasyl.blog. The link to TextStack in the article does the work. +- Run Product Hunt launch first, then come back to HN with "We launched on PH last week, here's what we learned" — that's a fresh angle that usually performs. 
+- Lobste.rs is a smaller but higher-quality audience — needs an invite, but if you can get one, the developer-tools angle of TextStack will land well there. diff --git a/infra/scripts/__pycache__/pdf-cleanup-gate.cpython-314.pyc b/infra/scripts/__pycache__/pdf-cleanup-gate.cpython-314.pyc new file mode 100644 index 00000000..8645c66f Binary files /dev/null and b/infra/scripts/__pycache__/pdf-cleanup-gate.cpython-314.pyc differ diff --git a/lu421jrdb6.tmp b/lu421jrdb6.tmp new file mode 100644 index 00000000..c34dd20f Binary files /dev/null and b/lu421jrdb6.tmp differ diff --git a/lu47152jl.tmp b/lu47152jl.tmp new file mode 100644 index 00000000..6de3b59f Binary files /dev/null and b/lu47152jl.tmp differ diff --git a/lu971jrdrp.tmp b/lu971jrdrp.tmp new file mode 100644 index 00000000..f8c0d5fa Binary files /dev/null and b/lu971jrdrp.tmp differ diff --git a/publish-day-cheatsheet.md b/publish-day-cheatsheet.md new file mode 100644 index 00000000..337e2198 --- /dev/null +++ b/publish-day-cheatsheet.md @@ -0,0 +1,131 @@ +# Publish day cheat sheet — Monday, May 11, 2026 + +**Target publish time:** 08:30–08:35 ET (12:30–12:35 UTC) + +**Pre-flight check (do tonight, Sunday):** + +- [x] MCQ screenshot at `docs/marketing/srs-mcq-card.png` (extracted from your recording) +- [x] MCQ walkthrough gif at `docs/marketing/srs-mcq-demo.gif` (extracted from your recording, 37s, 2.0 MB) +- [ ] **Commit and push** the two new media files in `docs/marketing/` to `main` — required for the GitHub raw URLs in the article to resolve when Dev.to fetches them +- [ ] Claude Code SSH-prompt run, prod stats collected +- [ ] Final read-through of `devto-gemma4-article.md` done — any factual nits caught +- [ ] Phone alarm set for 08:00 ET + +--- + +## Sunday evening (tonight) — 20 min + +| When | Step | +|---|---| +| Now | `git add docs/marketing/srs-mcq-card.png docs/marketing/srs-mcq-demo.gif && git commit -m "docs: add MCQ vocab demo media for Gemma 4 challenge post" && git push` — needed 
before Dev.to can fetch the raw URLs | +| Now | Run the Claude Code SSH prompt (`claude-code-prod-stats-prompt.md`), save the report numbers | +| Tonight | Open the dev.to draft (see "Schedule the post tonight" below). Paste the article body — media is referenced by raw GitHub URL so no manual upload needed. Schedule for `2026-05-11 12:30 UTC` | + +### Schedule the post tonight (recommended) + +1. Open https://dev.to/new in a browser where you're logged in +2. Title: `I shipped local LLM features two months ago. Production never ran them once.` +3. Tags: `devchallenge` `gemmachallenge` `gemma` `ollama` +4. Cover image: click "Add a cover image" → "Generate image" → paste this prompt: + + ``` + Flat minimalist illustration: a server rack labeled "ollama" in the foreground, + its model slot drawn as an empty glass cylinder. On the right, a fresh model + container labeled "gemma4:e4b" sliding in. Faint code-trace lines glowing + underneath in soft teal and purple. Wide banner aspect ratio, no people, + no faces, dev.to-friendly clean style. + ``` + +5. Paste body from `devto-gemma4-article.md` (the markdown block between the triple-backticks under `## Article body (paste into Dev.to editor)`). Both image references already point to `raw.githubusercontent.com/mrviduus/textstack/main/docs/marketing/...` — Dev.to fetches them server-side at publish time +6. If you got a real distractor count from the SSH prompt, edit the "What's next" paragraph to mention it +7. Click ⋯ "More options" → set **Schedule for**: `2026-05-11 12:30 UTC` (= 08:30 ET) +8. Click **Schedule** +9. Verify the draft is now scheduled (status should read "Scheduled" not "Draft") +10. Open the post-preview URL once to confirm both images render — if either fails, the most likely cause is the commit not being pushed yet (`git status` to verify) + +If scheduling fails for any reason, fall back to: leave the draft saved, set a phone alarm for 08:00 ET, publish manually. 
+ +--- + +## Monday morning — minute-by-minute + +| Time (ET) | Step | Reference | +|---|---|---| +| 08:00 | Wake, coffee, open laptop. Open: dev.to/dashboard, GitHub repo, Twitter, the social pack file | — | +| 08:25 | Verify scheduled post exists in dashboard. If not — publish manually NOW | — | +| 08:30 | Post auto-publishes. Copy the resulting URL. Refresh Dev.to to confirm it's live at https://dev.to/t/gemmachallenge/latest | — | +| 08:31 | React to your own post (👍 + 🦄 + 🔖) | DEV allows this | +| 08:32 | Open `social-media-pack.md`, find the URL placeholder in section 1 (Twitter), replace `[POST URL]` with the live URL | section 1 | +| 08:33–08:38 | Post the 5-tweet thread from `@Rexetdeus` | — | +| 08:38 | Pin the thread to your profile | — | +| 08:40 | Open `r/LocalLLaMA`, paste the post body from `social-media-pack.md` section 2. Submit | section 2 | +| 08:50 | r/selfhosted post (10-min gap to avoid cross-post detector) | section 3 | +| 09:00 | r/dotnet post | section 4 | +| 09:15 | HackerNews Show HN submission | section 5 | +| 09:30 | LinkedIn post | section 6 | +| 09:45 | Comment on the Gemma 4 Challenge launch post (Jess Lee thread) | section 7 | +| 10:00 | DM 5–10 friends from the personal-network template | section 8 | + +--- + +## First 4 hours — engagement watch + +| When | What | +|---|---| +| Continuous, every 15 min | Refresh dev.to post. Reply to every new comment within 10 min. Use templates from `comment-response-templates.md` if applicable | +| Continuous | Refresh r/LocalLLaMA + r/selfhosted + r/dotnet posts. Reply to every new comment within 15 min | +| Continuous | Refresh HN post. Reply within 10 min | +| 12:00 ET (lunch break in US East) | Check reaction count on the dev.to post. Compare against current `#gemmachallenge` Build leaderboard | +| 14:00 ET | Same check. 
If we're not yet in top-3 Build, do a second wave: ask 3 more personal-network contacts | + +--- + +## End of day — measure + plan + +By 18:00 ET, expect: + +- Dev.to post: **20–40 reactions** (target: top 3 in `#gemmachallenge` Build) +- Twitter thread: 100+ impressions, 5+ likes (this is small but normal for tech content) +- Reddit total karma: 50–200 across all 3 subs (depends heavily on subreddit reception) +- HN: either dead or trending (binary outcome — front page or invisible by 14:00 ET) +- GitHub stars: +5 to +20 (delta from where you start the day) + +Note your end-of-day numbers somewhere. They become the Day 0 baseline for the rest of the challenge. + +--- + +## Daily routine until May 24 (deadline) + +| Time | Daily task | Why | +|---|---|---| +| Morning | Refresh dev.to post, reply to overnight comments | Algorithm rewards reply velocity | +| Midday | Check `#gemmachallenge` Build leaderboard, note any new strong entries | Strategic awareness | +| Evening | If something worth riffing on appeared in the field, drop a substantive comment on it | Cross-pollinates readers | +| Daily | Note GitHub star delta | Tracks the secondary goal | + +--- + +## Failure modes to avoid + +- **Don't shadow-publish** — never publish at 02:00 ET to "get it out". Wasted boost window. +- **Don't reply with "thanks"** — reply with substance or skip. +- **Don't argue with bad-faith comments** — ignore. Real engagement comes from substantive replies, not flame wars. +- **Don't repost the same blurb across subs** — Reddit cross-post detector flags + each sub has its own tone (different bodies in `social-media-pack.md` for that reason). +- **Don't ask for stars/upvotes in comments** — only in the original post body or DMs. Asking in comments comes across as desperate. +- **Don't edit the post heavily after publish** — minor typo fixes OK; don't restructure or add new sections, you'll lose the engagement signal. 
+- **Don't promote Day 2+** — DEV's algorithm boost window is 24–48h. Don't post the same Reddit links again on Tuesday. + +--- + +## Quick links (have these in tabs Monday morning) + +- Dev.to dashboard: https://dev.to/dashboard +- The article (will be under your username): `https://dev.to/[your-username]` +- Challenge tag: https://dev.to/t/gemmachallenge/latest +- Build template URL (in case scheduled post failed and you need a fresh draft): https://dev.to/new?prefill=---%0Atitle%3A%20%0Apublished%3A%20%0Atags%3A%20devchallenge%2C%20gemmachallenge%2C%20gemma%0A--- +- Repo: https://github.com/mrviduus/textstack +- Live: https://textstack.app +- r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/submit +- r/selfhosted: https://www.reddit.com/r/selfhosted/submit +- r/dotnet: https://www.reddit.com/r/dotnet/submit +- HN submit: https://news.ycombinator.com/submit diff --git a/release-notes-v0.1.0.md b/release-notes-v0.1.0.md new file mode 100644 index 00000000..1a612919 --- /dev/null +++ b/release-notes-v0.1.0.md @@ -0,0 +1,128 @@ +# v0.1.0 — First AGPL-3.0 release + +First tagged release of TextStack as a public open-source project under +**GNU Affero General Public License v3.0**. + +## Why this release + +This release marks two milestones: + +1. **TextStack is now real open-source software.** Earlier development + happened under a source-available license (BUSL-1.1). All code in v0.1.0 + and beyond is AGPL-3.0 — OSI-approved, listed in awesome-selfhosted + eligibility queue, and dual-licenseable for commercial customers. +2. **The product is feature-complete enough to use daily.** Reader, capped + weekly SRS, vocabulary builder, reading stats, EPUB/PDF/FB2 uploads, + offline mode, mobile apps — all working. See full changelog below for + the granular history. 
+ +## Highlights + +### Reader — context-aware explanations +- Tap a technical term, get a 2-3 sentence LLM-powered explanation tied to + the book's domain (powered by OpenAI gpt-5-mini, swappable via + `ILlmService`). +- Tap "attention" in an ML book → ML meaning. Tap it in a psychology book → + cognitive meaning. Same word, different domain. +- Common words and the top 15K English words are filtered out — only + technical vocabulary surfaces into your queue. + +### Vocabulary SRS — capped weekly queue +- 5 stages: New → Recognition → Recall → Context cloze → Mastered. +- LLM-generated distractors and hints (Ollama qwen3:8b, runs locally). +- Review modes: multiple choice, classic flashcard. +- **Capped weekly queue** — no infinite Anki-style backlog, no guilt + spiral. + +### Library +- 1,500+ curated technical and classic books (starter corpus, self- + hostable). +- Personal uploads: EPUB / PDF / FB2 with auto-parsing, metadata + enrichment via local LLM. +- Reading progress sync, bookmarks, highlights, reading stats. + +### Mobile +- React Native (Expo 55). +- Android live on Google Play. +- iOS in TestFlight (App Store review pending). +- Offline-first, same UX as web. + +### Reading stats +- Heatmap calendar, streaks, daily/weekly goals. +- 20 achievements across milestone / streak / time / special categories. +- Session tracking with 30s heartbeat, 3min idle threshold. + +### Edge TTS — pronunciation without API keys +- 200+ voices via direct WebSocket to Microsoft Edge Read Aloud. +- Two-layer cache (server disk + client IndexedDB). +- 0.75× to 2.0× speed. + +## License + +This release is licensed under +[**GNU Affero General Public License v3.0**](https://github.com/mrviduus/textstack/blob/main/LICENSE). + +You may use, modify, and self-host TextStack freely for personal, +internal, or community purposes. If you modify TextStack and run it as a +network-accessible service, AGPL-3.0 requires you to publish your +modifications under the same license. 
+ +**Commercial license available** for organizations that need to use +TextStack without AGPL obligations. Contact: mrviduus@gmail.com. + +## Tech stack + +- ASP.NET Core 10 (Minimal APIs, modular monolith) +- PostgreSQL 16 + EF Core (snake_case) +- React 19 (web), React Native / Expo 55 (mobile) +- OpenAI gpt-5-mini (explanations, translation) +- Ollama qwen3:8b (local distractor generation) +- Edge TTS (WebSocket, no API key) +- Puppeteer SSG for SEO pages +- Docker Compose, Cloudflare Tunnel, nginx + +## Self-hosting + +```bash +git clone https://github.com/mrviduus/textstack +cd textstack +git checkout v0.1.0 +cp .env.example .env # edit with real values +docker compose up --build +``` + +Full instructions in [README](https://github.com/mrviduus/textstack#readme). + +## Origin story + +I quit *Designing Data-Intensive Applications* three times. Not because it +was hard — I understood most of what was on the page. The problem was the +rest: unfamiliar terms that broke the flow. TextStack is the fourth +attempt — and the one that finally worked. + +Full story: [vasyl.blog/2026/04/21/...](https://vasyl.blog/2026/04/21/i-quit-designing-data-intensive-applications-ddia-three-times-heres-what-i-build-on-the-fourth-try/) + +## What's next + +- Submit to awesome-selfhosted (eligible after 2026-09-04 due to their + 4-month seasoning rule) +- iOS App Store release +- Capped weekly SRS queue UX polish +- Curated AI-engineering corpus (DDIA, ML papers, 15-20 titles) +- Goal: one paying customer by October 2026 + +## Try it + +- **Hosted demo**: https://textstack.app — sample chapters open without + signup +- **Source**: https://github.com/mrviduus/textstack +- **Mobile**: Google Play (Android), TestFlight (iOS) +- **Author**: [@Rexetdeus](https://twitter.com/Rexetdeus) / + [vasyl.blog](https://vasyl.blog) + +--- + +Star the repo if this resonates. That's the only signal I have right now +that I'm building the right thing. 
+ +— Vasyl diff --git a/social-media-pack.md b/social-media-pack.md new file mode 100644 index 00000000..65c864a2 --- /dev/null +++ b/social-media-pack.md @@ -0,0 +1,261 @@ +# Social media pack — TextStack Gemma 4 launch + +All ready to copy-paste. Order of execution is in the cheat-sheet (`publish-day-cheatsheet.md`). Replace `[POST URL]` with the published Dev.to URL once you have it. + +GitHub-star ask is woven naturally into Twitter, Reddit, and LinkedIn. **Not** in HackerNews — HN downvotes star asks. The repo URL still gets visibility there. + +--- + +## 1. Twitter / X — thread (5 tweets) + +Post all 5 as a single thread from `@Rexetdeus`. Tweet 1 is the hook, tweet 5 has the CTAs. + +**Tweet 1/5** + +``` +3 GB used out of 30. The model that runs all my LLM features should be ~13 GB. + +I SSH'd in and ran `ollama list`. + +Empty. + +The container had been running for 60+ days without a single model pulled. Every distractor call had been silently failing. + +Post-mortem ↓ +``` + +**Tweet 2/5** + +``` +Production was running a hardcoded random-word fallback the whole time. The user sees distractors, just not LLM-generated ones — so I had no signal it was broken. + +The fix took 3 PRs and surfaced four production-only bugs that toy benchmarks would never have caught. +``` + +**Tweet 3/5** + +``` +Worst offender: floating Docker image tags. + +`image: ollama/ollama` froze at 0.22.x the day Docker pulled it. Two months later, upstream Ollama supports Gemma 4. My local "latest" doesn't. + +The lie: `docker image ls` shows the cached SHA, not whether the registry has moved. +``` + +**Tweet 4/5** + +``` +The other surface that bit me: the parser quietly dropped half of Gemma 4's output because it filters multi-word phrases. + +qwen3 (the model I'd planned for) emits single tokens by default. Gemma 4 prefers phrases. The parser was correct in spirit, hidden from the model. + +Defend at parse, every time. 
+``` + +**Tweet 5/5 (CTAs)** + +``` +Full write-up with real numbers (9.6 GB disk, 13 GiB RAM, 2.8s warm inference) on dev.to: +[POST URL] + +The product (open-source, AGPL-3.0, deployed): +https://github.com/mrviduus/textstack + +If the angle resonated, a ⭐ on the repo helps the next person abandoning DDIA find this thing. +``` + +--- + +## 2. Reddit — r/LocalLLaMA + +**Title:** + +``` +Production was empty for 2 months: lessons from actually shipping local Gemma 4 e4b on a $20 VPS +``` + +**Body:** + +``` +Two months ago I shipped local-LLM features in TextStack (open-source reader for technical books). Yesterday I checked production RAM and noticed the Ollama container was using 3 GB out of 30. The model should be 13. + +`ollama list`: empty. The container had been running 60+ days without a single pull. + +Wrote up the full post-mortem of the swap to Gemma 4 e4b — the four production-only bugs that surfaced (floating image tags, cgroup limits guessed for the wrong model, cold-load timeout vs API timeout, parser dropping multi-word output), the real numbers from a single-CPU 30 GB VPS (no GPU), and the cloud-vs-local cost split per task. + +Post: [POST URL] +Repo (AGPL-3.0): https://github.com/mrviduus/textstack +PRs that wired it in: #232 (model swap) / #233 (parser fix) / #234 (timeouts) + +Genuine ask: if anyone here has compared E4B vs E2B on technical-domain prompts, I'd value a sanity check on my "E4B is the smallest model that produces plausible distractors for terms like 'linearizability'" claim. That's the conclusion my testing reached but it's a small sample. +``` + +**Subreddit etiquette notes:** + +- Don't post to multiple subs within 60 minutes of each other (cross-post detector flag) +- Don't reply with "Thanks!" 
— reply with substance or skip +- If someone says "this is just an ad", reply with one of: a specific technical detail from the post, a screenshot of the bug log, or a "fair, here's the part I think you'd actually find useful: [link to specific section]" + +--- + +## 3. Reddit — r/selfhosted + +**Title:** + +``` +Open-source reader with local LLM-generated vocabulary cards (Gemma 4 e4b on a $20 VPS, no GPU) +``` + +**Body:** + +``` +Made an open-source AGPL-3.0 reader for finishing dense English technical books in your native language. Tap any term → context-aware translation that knows the book's domain. Words you don't recognize feed a capped weekly SRS queue with LLM-generated distractor questions. + +Two months ago I shipped the local-LLM side and only recently discovered the Ollama container had been silently empty since deploy — production was returning hardcoded random words instead of model output, and the user-facing failure mode was invisible. Just wrote up the post-mortem of swapping in Gemma 4 e4b that finally got the features working. + +Stack: docker-compose, .NET 10, Postgres 16, Ollama (Gemma 4 e4b for distractors/hints/explanations + book metadata enrichment), OpenAI gpt-4.1-nano for translation. Everything runs on a single-CPU 30 GB VPS (no GPU). Deploy is a `git pull` and `docker compose up`. + +Post: [POST URL] +Repo: https://github.com/mrviduus/textstack +Live deploy: https://textstack.app — sample chapters open without signup + +The post goes into specifics on what broke when I actually flipped local LLM on (floating image tags, cgroup limits, cold-load timeouts, parser quirks). Hopefully useful to anyone planning to go local-LLM in their self-hosted stack. + +Star helps if you'd use a tool like this — repo's open to PRs and the AGPL is real. +``` + +--- + +## 4.
Reddit — r/dotnet + +**Title:** + +``` +ASP.NET Core 10 + Ollama (Gemma 4 e4b) for fire-and-forget LLM jobs — production lessons +``` + +**Body:** + +``` +Wrote up the integration story of plugging Ollama into an ASP.NET Core 10 worker for vocabulary-related LLM jobs (distractor generation, hint generation, book metadata enrichment). + +Architecture is fire-and-forget via `IServiceScopeFactory` — the API endpoint returns immediately and the LLM call happens in the background, with a fallback to a hardcoded random-word picker if Gemma fails or times out. Discovered after two months that the fallback path had been the only path running in production — silent fallback is the worst kind of bug. + +Specific .NET-relevant bits in the post: +- Why I use `IServiceScopeFactory` for the fire-and-forget pattern (avoid disposed scope bugs) +- Bumping `Ollama:TimeoutSeconds` config from 10s → 30s after seeing 60s cold-load times +- The C# parser snippet that silently dropped half Gemma's output because of a `!Contains(' ')` filter that worked for qwen3 but not Gemma 4 + +Post: [POST URL] +Repo (AGPL-3.0): https://github.com/mrviduus/textstack +PRs: #232 (model swap), #233 (parser fix), #234 (timeouts) + +Project is a self-hostable open-source reader for technical books (textstack.app). Stack: ASP.NET Core 10 / Postgres 16 / React 19 / docker-compose / Cloudflare Tunnel. + +Stars on the repo help — it's not a SaaS, just AGPL code I run for myself and anyone else who wants it. +``` + +--- + +## 5. HackerNews — Show HN + +**Title:** + +``` +Show HN: I rebuilt Kindle Word Wise on local Gemma 4 – production was empty for 2 months +``` + +**Body:** + +``` +TextStack is an open-source (AGPL-3.0) reader for developers who want to finish dense English technical books in their native language. Tap any term to get a context-aware translation that knows the book's domain ("attention" in an ML chapter gets the ML meaning, not the everyday one). Capped weekly SRS for terms you save. 
+ +Local Gemma 4 e4b runs the vocabulary-related LLM jobs (distractors, hints, explanations, book metadata) on a single-CPU 30 GB VPS with no GPU. OpenAI gpt-4.1-nano stays for multilingual translation where local models are weak. + +Wrote up the swap to Gemma 4 e4b after discovering the Ollama container had been silently empty in production for 60+ days — the fallback path was a hardcoded random-word picker, indistinguishable to the user. Four production-only bugs surfaced when I flipped it on; the post has the diff for each. + +Live: https://textstack.app +Code: https://github.com/mrviduus/textstack +Post: [POST URL] + +Happy to answer questions on the .NET + Ollama stack, the model selection trade-off (E2B vs E4B vs 31B vs 26B MoE), or the SRS design. +``` + +**HN etiquette:** + +- Submit between 7–9 AM ET on a weekday (max chance of front-page traction window) +- Title must start with `Show HN:` and use a hyphen-dash, not em-dash +- No emoji, no marketing language, no star asks +- Reply to every comment within 30 min for the first 2 hours — HN's algorithm rewards engagement velocity +- If someone calls it ad-bait, the substance of the post-mortem story carries the rebuttal + +--- + +## 6. LinkedIn — single post + +Less casual than Twitter, more "professional retrospective" tone. Star CTA is appropriate here — LinkedIn devs respond well to "support open source". + +**Body:** + +``` +Two months of silent production failures, and what swapping to Gemma 4 surfaced about local LLM ops. + +I shipped local-LLM features in TextStack (an open-source reader for technical books) two months ago. Last week I noticed the production server was using 3 GB of RAM out of 30. The model that powers all those features should be 13. + +I SSH'd in. Ollama container: no models installed. The container had been running for 60+ days, every LLM call had been quietly hitting a hardcoded random-word fallback, and I had no signal because the failure mode was indistinguishable to users. 
+ +The post-mortem covers the swap to Gemma 4 e4b that finally got the features running, plus the four production-only bugs that surfaced along the way: + +→ Floating Docker image tags lie about being "latest" +→ cgroup memory limits never re-evaluated when the model changed +→ Cold-load takes 60s, but my API timeout was 10s +→ The parser silently dropped half of Gemma 4's output because qwen3's behavior had hidden a constraint + +Real numbers from a $20/month consumer VPS (no GPU): 9.6 GB on disk, 13 GiB RAM resident, 2.8 s warm inference. + +Full write-up: [POST URL] + +TextStack is open-source (AGPL-3.0) at https://github.com/mrviduus/textstack — if you've ever shipped local LLM features in production, a star helps the next person discover this story before they hit the same bugs. + +#opensource #selfhosted #localllm #gemma4 #dotnet #llmops +``` + +--- + +## 7. Comment on the Gemma 4 Challenge launch post + +Drop on https://dev.to/devteam/join-the-gemma-4-challenge-3000-prize-pool-for-ten-winners-23in within 30 min of publishing the article. Jess Lee actively reads that thread. + +``` +Submitted my entry today: a post-mortem of swapping qwen3 → Gemma 4 e4b in production, after discovering the Ollama container had been silently empty for two months. Honest numbers from a $20 VPS (no GPU), and the four production bugs that surfaced when I actually flipped local LLM on. + +Build category — TextStack is the project: https://textstack.app + +Post: [POST URL] + +Thanks for organizing this challenge. The "intentional model selection" judging criterion was actually a useful prompt to write down why I picked E4B specifically vs. the other Gemma 4 variants — that's the kind of decision I usually don't document. +``` + +This works because: (a) it's substantive, not just "check out my post", (b) it credits the challenge for surfacing useful thinking, (c) Jess sees it. + +--- + +## 8. 
Personal-network ask (DM template) + +For sending to 5–15 friends/colleagues you actually know who care about local LLM, .NET, or open-source. Don't spray-paste — adjust to each person. + +``` +Hey [Name], + +Published a post-mortem on Dev.to about silently shipping local LLM features that hadn't worked for two months in production — the swap to Gemma 4 e4b that finally got them running, and four bugs that surfaced. + +Submitted to the Gemma 4 Challenge ($500 prize, judged on tie-break by reactions) so a 👍 + 🦄 on the post helps real money: +[POST URL] + +If you'd star the repo too, that's the higher-value signal for me long-term: +https://github.com/mrviduus/textstack + +No worries if you don't have time. Cheers. +``` + +Personal asks convert at 5-10× cold reach. Send within 60 min of publishing while the boost window is open. diff --git a/tests/TextStack.Extraction.Tests/FrontMatterFilterTests.cs b/tests/TextStack.Extraction.Tests/FrontMatterFilterTests.cs index 22cc9481..61031e2c 100644 --- a/tests/TextStack.Extraction.Tests/FrontMatterFilterTests.cs +++ b/tests/TextStack.Extraction.Tests/FrontMatterFilterTests.cs @@ -38,4 +38,122 @@ public void IsTableOfContents_DoesNotMatch_OtherTitles(string? title) { Assert.False(FrontMatterFilter.IsTableOfContents(title)); } + + // --- LooksLikeTableOfContentsBody --- + + [Fact] + public void LooksLikeTableOfContentsBody_LeaderDottedEntries_MatchEvenWithoutTitle() + { + var paragraphs = new[] + { + "Preface ............ xi", + "Chapter 1 Introduction .......... 1", + "Chapter 2 Foundation Models ..... 49", + "Chapter 3 Evaluation ............ 111", + "Chapter 4 Inference ............. 145", + "Chapter 5 Production ............ 193", + "Index ........................... 
271", + }; + + Assert.True(FrontMatterFilter.LooksLikeTableOfContentsBody(paragraphs)); + } + + [Fact] + public void LooksLikeTableOfContentsBody_EllipsisLeader_IsDetected() + { + var paragraphs = new[] + { + "Preface … xi", + "Chapter 1 Introduction … 1", + "Chapter 2 Foundation Models … 49", + "Chapter 3 Evaluation … 111", + "Chapter 4 Inference … 145", + }; + + Assert.True(FrontMatterFilter.LooksLikeTableOfContentsBody(paragraphs)); + } + + [Fact] + public void LooksLikeTableOfContentsBody_PlainProse_DoesNotMatch() + { + var paragraphs = new[] + { + "This book is geared toward technical roles.", + "It is for AI engineers, ML engineers, data scientists, and others.", + "You can also benefit if you work in tool development.", + "We will cover use cases, evaluation, and production deployment.", + "Reading this front matter gives you the lay of the land.", + "Each chapter ends with summaries and references for further study.", + }; + + Assert.False(FrontMatterFilter.LooksLikeTableOfContentsBody(paragraphs)); + } + + [Fact] + public void LooksLikeTableOfContentsBody_TooShort_DoesNotMatch() + { + // Conservative: under 5 substantive paragraphs we abstain rather than + // risk dropping a real short chapter that happens to end with a page-number. + var tooShort = new[] { "Preface ............ xi", "Chapter 1 .......... 
1" }; + Assert.False(FrontMatterFilter.LooksLikeTableOfContentsBody(tooShort)); + } + + [Fact] + public void LooksLikeTableOfContentsBody_NullOrEmpty_DoesNotMatch() + { + Assert.False(FrontMatterFilter.LooksLikeTableOfContentsBody(null)); + Assert.False(FrontMatterFilter.LooksLikeTableOfContentsBody(Array.Empty<string>())); + Assert.False(FrontMatterFilter.LooksLikeTableOfContentsBody(new[] { "", " " })); + } + + // --- IsKnownBackMatter --- + + [Theory] + // en + [InlineData("Index")] + [InlineData("INDEX")] + [InlineData("Glossary")] + [InlineData("Bibliography")] + [InlineData("References")] + [InlineData("Notes")] + [InlineData("Appendix")] + // ru + [InlineData("Индекс")] + [InlineData("Глоссарий")] + [InlineData("Приложение")] + // uk + [InlineData("Бібліографія")] + [InlineData("Додаток")] + // de + [InlineData("Glossar")] + [InlineData("Literaturverzeichnis")] + [InlineData("Anhang")] + // fr + [InlineData("Glossaire")] + [InlineData("Références")] + [InlineData("Annexe")] + // es + [InlineData("Bibliografía")] + [InlineData("Apéndice")] + // it + [InlineData("Glossario")] + [InlineData("Appendice")] + // pt + [InlineData("Glossário")] + [InlineData("Apêndice")] + public void IsKnownBackMatter_Matches_BackMatterTitles(string title) + { + Assert.True(FrontMatterFilter.IsKnownBackMatter(title)); + } + + [Theory] + [InlineData(null)] + [InlineData("")] + [InlineData("Chapter 1")] + [InlineData("Preface")] + [InlineData("Index of Refraction")] // not the back-matter sense + public void IsKnownBackMatter_DoesNotMatch_OtherTitles(string? 
title) + { + Assert.False(FrontMatterFilter.IsKnownBackMatter(title)); + } } diff --git a/tests/TextStack.Extraction.Tests/PdfPageTextExtractorTests.cs b/tests/TextStack.Extraction.Tests/PdfPageTextExtractorTests.cs index bddb8e1e..68a3bc62 100644 --- a/tests/TextStack.Extraction.Tests/PdfPageTextExtractorTests.cs +++ b/tests/TextStack.Extraction.Tests/PdfPageTextExtractorTests.cs @@ -7,13 +7,25 @@ namespace TextStack.Extraction.Tests; public class PdfPageTextExtractorTests { [Theory] + // Body bullets [InlineData("•")] [InlineData("●")] [InlineData("▪")] [InlineData("◦")] [InlineData("○")] + [InlineData("◆")] + [InlineData("◇")] + [InlineData("❖")] + // Triangles / pointers [InlineData("‣")] [InlineData("⁃")] + [InlineData("►")] + [InlineData("▶")] + [InlineData("➤")] + // Checkmarks & stars (modern textbook list markers) + [InlineData("★")] + [InlineData("✓")] + [InlineData("✗")] [InlineData("•You're")] // bullet glued to first word — still a list item public void IsBulletPrefix_RecognizesBulletGlyphs(string firstWord) { @@ -32,6 +44,36 @@ public void IsBulletPrefix_RejectsNonBulletStarts(string? firstWord) Assert.False(PdfPageTextExtractor.IsBulletPrefix(firstWord)); } + [Theory] + // Unicode "Symbol, Other" glyphs that AREN'T in our hardcoded BulletGlyphs + // set but should still be treated as list markers when they're the sole + // first "word" of a line. This covers custom dingbat-font bullets in + // modern textbooks without us having to hardcode every shape. 
+ [InlineData("☑")] // U+2611 BALLOT BOX WITH CHECK + [InlineData("☐")] // U+2610 BALLOT BOX + [InlineData("✦")] // U+2726 BLACK FOUR POINTED STAR + [InlineData("✺")] // U+273A SIXTEEN POINTED ASTERISK + [InlineData("♦")] // U+2666 BLACK DIAMOND SUIT + [InlineData("☑Item")] // glued — symmetric with the "•You're" whitelist case + public void IsBulletPrefix_RecognizesUnicodeSymbolOther(string firstWord) + { + Assert.True(PdfPageTextExtractor.IsBulletPrefix(firstWord)); + } + + [Theory] + // Punctuation, Other (Po) — NOT bullets. Daggers / section signs / pilcrows + // are footnote markers, not paragraph starts. Verifies the deliberate + // narrower category check. + [InlineData("†")] // U+2020 DAGGER + [InlineData("‡")] // U+2021 DOUBLE DAGGER + [InlineData("§")] // U+00A7 SECTION SIGN + [InlineData("¶")] // U+00B6 PILCROW SIGN + [InlineData("※")] // U+203B REFERENCE MARK + public void IsBulletPrefix_RejectsFootnoteMarkers(string firstWord) + { + Assert.False(PdfPageTextExtractor.IsBulletPrefix(firstWord)); + } + [Fact] public void StartsWithIndent_EmptyLine_ReturnsFalse() { diff --git a/textstack-gemma4-submission-package.docx b/textstack-gemma4-submission-package.docx new file mode 100644 index 00000000..d35cdef2 Binary files /dev/null and b/textstack-gemma4-submission-package.docx differ diff --git a/textstack-gemma4-submission-package.pdf b/textstack-gemma4-submission-package.pdf new file mode 100644 index 00000000..6de3b59f Binary files /dev/null and b/textstack-gemma4-submission-package.pdf differ diff --git a/textstack-v1.0.0-build6-adi-nolf.apk b/textstack-v1.0.0-build6-adi-nolf.apk new file mode 100644 index 00000000..990596c9 Binary files /dev/null and b/textstack-v1.0.0-build6-adi-nolf.apk differ diff --git a/tweet-drafts.md b/tweet-drafts.md new file mode 100644 index 00000000..2e0622a2 --- /dev/null +++ b/tweet-drafts.md @@ -0,0 +1,127 @@ +# Twitter/X drafts for @Rexetdeus + +Three options, in order of recommendation. Pick one or post the thread. 
+ +--- + +## Option A — single tweet (recommended for first announcement) + +``` +TextStack v0.1.0 is out 🚀 + +A reader for technical books with LLM-powered context-aware term explanations and a capped weekly SRS queue. Built it after I quit DDIA three times. + +Now AGPL-3.0, self-hostable. + +https://github.com/mrviduus/textstack/releases/tag/v0.1.0 +``` + +Char count: 251 / 280 — fits. + +Why it works: +- Specific, not vague ("a reader for technical books" not "a learning tool") +- Personal hook ("after I quit DDIA three times") — devs relate +- Clear CTA (GitHub release link) +- Mentions AGPL-3.0 — signals real open source + +--- + +## Option B — license-focused (use 1-2 days after Option A) + +``` +Just relicensed @textstack from BUSL-1.1 to AGPL-3.0. + +The "AWS forks my project" scenario felt 1% likely vs. real costs of being source-available: locked out of awesome-selfhosted, contributor friction, brand confusion. + +Wrote about why: [link to vasyl.blog post] +``` + +Char count: ~270. + +This one drives traffic to the blog post. Post after the blog post is published. + +--- + +## Option C — Show HN style thread (for a second wave) + +Tweet 1/4: +``` +1/ I quit Designing Data-Intensive Applications three times. + +Not because it's hard. Because of the unfamiliar terms — eventual consistency, attention mechanism, B-tree — that broke my flow until I lost the thread. + +So I built a reader that fixes that. v0.1.0 ships today. +``` + +Tweet 2/4: +``` +2/ Tap an unfamiliar term, get a 2-3 sentence LLM-powered explanation tied to the book's domain. + +"Attention" in an ML book → ML meaning. +"Attention" in a psych book → cognitive meaning. + +Same word, different domain, different answer. +``` + +Tweet 3/4: +``` +3/ Surfaced terms enter a *capped weekly* spaced repetition queue. + +No infinite Anki backlog. No guilt spiral. The cap forces curation. + +5 stages: New → Recognition → Recall → Context cloze → Mastered. 
+``` + +Tweet 4/4: +``` +4/ TextStack v0.1.0: +✓ Self-hosted, AGPL-3.0 +✓ Web + Android (iOS in TestFlight) +✓ 1500+ books in the public library +✓ Your own EPUB / PDF / FB2 uploads +✓ ASP.NET Core + React + Expo + +Try without signup: https://textstack.app +Source: https://github.com/mrviduus/textstack +``` + +This thread is HN-bait; it could be cross-posted later as screenshots. + +--- + +## Hashtag strategy + +Don't add hashtags to Option A; they now dilute reach in X's algorithm. For Options B and C, use at most 1-2 hashtags AT THE END: + +`#opensource #buildinpublic` + +(Not in the middle of the tweet, not more than 2.) + +--- + +## When to post + +- **Option A**: post within 24h of the v0.1.0 release while it's fresh +- **Option B**: 1-2 days after the blog post is published on vasyl.blog +- **Option C**: 3-5 days after Option A, on a Tuesday/Wednesday morning ET when dev Twitter is most active + +--- + +## Mention strategy + +People to consider tagging in replies (not in the main tweet — looks spammy): + +- @plausiblehq — if they engage with Option B (you mentioned them in the blog post about AGPL) +- @PostHog, @cal_com — same reasoning +- Indie hackers you know + +Don't tag celebrities or big accounts you don't know — looks like begging. + +--- + +## After posting Option A + +- Pin the tweet to your profile +- Reply with the link to the blog post when it's published +- Reply with a short demo GIF if you can record one (use the `gif_creator` tool from Claude in Chrome later) +- DM 5-10 indie devs you know personally with a "would love your feedback" note linking the tweet