feat: receipt scan → chat summary + semantic emoji matching #75
Open
alex-mextner wants to merge 8 commits into
When a user scans and confirms a receipt through the Mini App, a brief summary is now sent to the group chat showing categories, amounts, and total — matching the behavior of the bot photo handler flow. https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik
…m list
- Extract shared buildReceiptSummaryMessage helper used by both receipt paths (bot photo handler and Mini App confirm endpoint), eliminating divergent notification code between the two flows
- Add expandable <blockquote> with full item list (name, qty×price=total) to the chat message
- Replace fire-and-forget notification with an awaited call wrapped in try/catch so errors surface synchronously without failing confirm
- Add pluralize for "позиция/позиции/позиций" (the Russian plural forms of "item")
- Remove duplicate getCategoryEmoji from receipt-summarizer.ts; the universal function in config/category-emojis.ts is now used everywhere
- Expand universal CATEGORY_EMOJIS map with ~100 additional emojis covering food, transport, health, finance, work, and more
- Add 16 unit tests covering pluralize, aggregation, truncation, HTML escaping, and emoji fallback
https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik
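The pluralize helper mentioned above is not shown in this PR text. A minimal sketch of Russian plural-form selection, assuming a hypothetical signature `pluralize(n, one, few, many)` (the real helper may differ), could look like this:

```typescript
// Hypothetical signature; the helper in the PR may take its arguments differently.
// Russian plural selection:
//   one  -> n ends in 1, except numbers ending in 11
//   few  -> n ends in 2-4, except numbers ending in 12-14
//   many -> everything else (0, 5-20, 25-30, ...)
function pluralize(n: number, one: string, few: string, many: string): string {
  const abs = Math.abs(n);
  const mod10 = abs % 10;
  const mod100 = abs % 100;
  if (mod10 === 1 && mod100 !== 11) return one;
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return few;
  return many;
}

// Example: label for the number of receipt items.
function itemCountLabel(count: number): string {
  return `${count} ${pluralize(count, "позиция", "позиции", "позиций")}`;
}
```

For instance, `itemCountLabel(21)` selects the "one" form and `itemCountLabel(12)` the "many" form, which is why a plain `n === 1` check is not enough.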
formatSummaryMessage was a second implementation of the same summary format that buildReceiptSummaryMessage already produces. Flatten the ReceiptSummary into ReceiptSummaryItem[] and delegate to the shared helper so the AI-correction flow produces the same output as the bot photo handler and the Mini App confirm endpoint. ReceiptSummary items only carry name+total (qty/price are lost when the LLM round-trips the JSON during correction), so ReceiptSummaryItem now has qty/price as optional fields and the item line omits the "qty × price =" prefix when they are absent. Drops the now-unused itemCount parameter at all three call sites.
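To illustrate the optional qty/price behaviour described above, here is a minimal sketch. The field names follow the commit message, but the exact shape of ReceiptSummaryItem, the `escapeHtml` and `formatItemLine` helpers, and the line layout are assumptions, not the PR's actual code:

```typescript
// Assumed shape: only name and total are always present; qty and price are
// lost when the LLM round-trips the JSON during correction.
interface ReceiptSummaryItem {
  name: string;
  total: number;
  qty?: number;
  price?: number;
}

// Escape the three characters that matter inside HTML-formatted chat messages.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical item line: includes the "qty × price =" prefix only when both
// values are available, otherwise prints just the name and the total.
function formatItemLine(item: ReceiptSummaryItem): string {
  const name = escapeHtml(item.name);
  if (item.qty !== undefined && item.price !== undefined) {
    return `${name}: ${item.qty} × ${item.price} = ${item.total}`;
  }
  return `${name}: ${item.total}`;
}
```

Keeping the fallback inside a single formatter is what lets all flows share one helper regardless of how much item detail survived.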
User-defined expense category names rarely land on the exact keys in CATEGORY_EMOJIS, so receipts fell back to the default 💰 for most categories. Add a two-step resolver: exact match first, then a Hugging Face (HF) multilingual sentence-similarity lookup against the known keys with a 0.5 cosine threshold. Results persist in a new category_emoji_cache table so each unique category hits HF at most once. Also expand the category emoji map to cover realistic budget lines such as Каршеринг (car sharing), Страховка (insurance), Мебель (furniture), and Ветеринар (vet), and drop the item-level entries Фрукты (fruit), Завтрак (breakfast), and Экскурсия (excursion), which don't belong in a category list. buildReceiptSummaryMessage and formatSummaryMessage become async since resolution may hit HF; all five call sites now await them. https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik
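The two-step resolution described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's resolver: the `embed` function stands in for the HF call and is injected as a synchronous parameter, the key embeddings are assumed to be precomputed, and an in-memory Map stands in for the category_emoji_cache table.

```typescript
type Embedding = number[];

const DEFAULT_CATEGORY_EMOJI = "💰";
const SIMILARITY_THRESHOLD = 0.5;

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical resolver: all parameters besides `category` are illustrative.
function resolveCategoryEmoji(
  category: string,
  emojis: Map<string, string>,
  keyEmbeddings: Map<string, Embedding>,
  embed: (text: string) => Embedding,
  cache: Map<string, string>
): string {
  // Step 1: exact match against the known keys.
  const exact = emojis.get(category);
  if (exact !== undefined) return exact;

  // Previously resolved categories skip the embedding call entirely.
  const cached = cache.get(category);
  if (cached !== undefined) return cached;

  // Step 2: closest known key by cosine similarity, accepted only at or above the threshold.
  const query = embed(category);
  const best = { emoji: DEFAULT_CATEGORY_EMOJI, score: SIMILARITY_THRESHOLD };
  keyEmbeddings.forEach((vector, key) => {
    const score = cosineSimilarity(query, vector);
    const emoji = emojis.get(key);
    if (emoji !== undefined && score >= best.score) {
      best.emoji = emoji;
      best.score = score;
    }
  });

  // Cache the result, including the default, so the lookup runs at most once per category.
  cache.set(category, best.emoji);
  return best.emoji;
}
```

Caching the fallback as well as real matches is a design choice in this sketch; it matches the "at most once" wording but means a poor match stays cached until the entry is removed.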
Script runs against real prod DB categories, resolves each via exact match or HF semantic similarity, and outputs a full table comparing actual results against an expectations map. Useful for tuning the CATEGORY_EMOJIS list and the similarity threshold. Usage: bun run scripts/test-emoji-resolution.ts [--dry-run] https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik
- Separate emoji per pet species (🐈 cats, 🐕🦺 dogs, 🐇 rabbits, etc.)
- Аренда (rent) → Квартира (apartment), Быт (household), Apartment
- Экскурсии/Экскурсия (excursions) → 🧭, Билеты (tickets) → 🎫
- Кредитка (credit card) → 💳, Резерв (reserve) → 📈, Личные (personal) → 👤
- Другое/Прочее/Разное/Затраты (other/misc/various/expenses) → 💸 (money-with-wings, less boxy)

Coverage on real prod data: 20/36 → 31/36 exact categories. Remaining 5 (people names, custom terms) are LLM resolver territory.
…ache
- Replaces HF sentence-similarity (quality was poor on real prod data) with Qwen/Qwen3-32B via Cerebras chatCompletion. The model picks one of ~250 allowed keys (CATEGORY_EMOJIS + virtual keys for people & fallback).
- Resolver now reads the group's custom_prompt so mappings like "Ку → коммуналка" (utilities), "Алекс → __person_man__", "Кис → кот" (cat) work.
- Cache keyed on (group_id, category): the same category name in different groups can resolve to different emojis. Migration 045 rewritten in place (composite PK, not deployed yet). No invalidation on /prompt change: stale emoji are accepted for the edge case of mid-life prompt edits.
- Virtual keys: __person_man__/woman/boy/girl/baby for names, __fallback__ → 💰 when nothing fits.
- Thread groupId through buildReceiptSummaryMessage and formatSummaryMessage, pass from all 5 callsites (photo-processor, message.handler, callback.handler, expense-saver, miniapp-api).
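A rough sketch of the per-group cache key and the virtual-key fallback described above. Everything here is illustrative: the class and function names are hypothetical, an in-memory Map stands in for the category_emoji_cache table, and apart from `__person_man__` and `__fallback__ → 💰` the virtual key spellings and emoji choices are assumptions.

```typescript
// Only __person_man__ and __fallback__ -> 💰 appear verbatim in the commit message;
// the other spelling and the person emojis are assumed for illustration.
const VIRTUAL_KEY_EMOJIS = new Map<string, string>([
  ["__person_man__", "👨"],
  ["__person_woman__", "👩"],
  ["__fallback__", "💰"],
]);

// In-memory stand-in for a table with a composite (group_id, category) primary key.
class GroupEmojiCache {
  private readonly store = new Map<string, string>();

  private static key(groupId: number, category: string): string {
    // Serializing the pair as JSON keeps the two parts unambiguous.
    return JSON.stringify([groupId, category]);
  }

  get(groupId: number, category: string): string | undefined {
    return this.store.get(GroupEmojiCache.key(groupId, category));
  }

  set(groupId: number, category: string, emoji: string): void {
    this.store.set(GroupEmojiCache.key(groupId, category), emoji);
  }
}

// Map the key chosen by the model to an emoji; anything outside the allowed set falls back to 💰.
function emojiForChosenKey(chosenKey: string, categoryEmojis: Map<string, string>): string {
  const key = chosenKey.trim();
  return categoryEmojis.get(key) ?? VIRTUAL_KEY_EMOJIS.get(key) ?? "💰";
}
```

Because the key includes the group id, the same category name (for example a nickname defined in one group's custom_prompt) can resolve differently in another group without the two entries colliding.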
Two pre-existing issues that combined to freeze the machine during the full test suite:
1. @huggingface/inference v4 retries 503 responses with unbounded async recursion: `return innerRequest(...)` in utils/request.js has no attempt counter. The 503 test mocked fetch to always return 503, so the SDK recursed until the process was killed by OOM. Fix: make the mock return 503 once, then throw, so the retry path terminates.
2. startTempImageCleanup spawns a 5-minute setInterval with no .unref(), keeping bun's event loop alive after tests ran. Added .unref() so test files terminate cleanly and prod still works (process exit ends the timer anyway).
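The first fix can be illustrated with a small self-contained sketch. `requestWithRetry` is a hypothetical stand-in for a client that retries every 503 without an attempt counter (it is not the SDK's code), and `make503OnceMock` is an assumed name for a mock that answers 503 once and then rejects:

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node typings.
interface MockResponse {
  status: number;
}
type FetchLike = () => Promise<MockResponse>;

// Hypothetical stand-in for a client that retries every 503 with no attempt counter.
async function requestWithRetry(fetchImpl: FetchLike): Promise<MockResponse> {
  const res = await fetchImpl();
  if (res.status === 503) {
    // Unbounded: keeps recursing for as long as the server (or mock) answers 503.
    return requestWithRetry(fetchImpl);
  }
  return res;
}

// Mock that answers 503 exactly once, then rejects, so the retry path terminates.
function make503OnceMock(): { fetchImpl: FetchLike; callCount: () => number } {
  let calls = 0;
  const fetchImpl: FetchLike = async () => {
    calls += 1;
    if (calls === 1) return { status: 503 };
    throw new Error("mock fetch: no further responses");
  };
  return { fetchImpl, callCount: () => calls };
}
```

With this mock the retry is exercised exactly once: the first call returns 503, the retry calls the mock again, and the rejection propagates to the test instead of looping.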
Summary
- `buildReceiptSummaryMessage` in `summary-message.ts`: single source of truth for both flows (bot photo handler + Mini App confirm). Expandable blockquote with item details, per-category totals, grand total.
- When a category has no exact match in `CATEGORY_EMOJIS`, the resolver calls HuggingFace `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` to find the closest known category by cosine similarity (threshold 0.5). Results are cached in a new `category_emoji_cache` SQLite table, so each unique category hits HF at most once.
- `scripts/test-emoji-resolution.ts`: runs against real prod DB, resolves every category, and outputs a full table comparing actual results vs an expectations map. Usage: `bun run scripts/test-emoji-resolution.ts [--dry-run]`

Key files
- `src/services/receipt/summary-message.ts`
- `src/services/receipt/category-emoji-resolver.ts`
- `src/config/category-emojis.ts`: `DEFAULT_CATEGORY_EMOJI`
- `src/database/schema.ts`: `category_emoji_cache` table
- `src/database/repositories/category-emoji-cache.repository.ts`
- `src/web/miniapp-api.ts`: `/api/receipt/confirm` sends summary to chat
- `src/bot/services/expense-saver.ts`
- `scripts/test-emoji-resolution.ts`

Test plan
- `bun run type-check`: clean
- `bun run lint`: clean
- `bun run test`: 1816/1816 passing
- `bun run scripts/test-emoji-resolution.ts` on prod to verify HF matching against real categories

https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik