feat: receipt scan → chat summary + semantic emoji matching#75

Open
alex-mextner wants to merge 8 commits into main from claude/receipt-scan-chat-results-F6kbL

Conversation

@alex-mextner (Owner) commented Apr 10, 2026

Summary

  • Receipt scan → chat: When a receipt is confirmed via the Mini App, the bot now sends a formatted summary message to the group chat (same format as the bot photo handler flow).
  • Shared summary builder: buildReceiptSummaryMessage in summary-message.ts — single source of truth for both flows (bot photo handler + Mini App confirm). Expandable blockquote with item details, per-category totals, grand total.
  • Semantic category emoji matching: When a user-defined category name doesn't match any key in CATEGORY_EMOJIS, the resolver calls HuggingFace sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 to find the closest known category by cosine similarity (threshold 0.5). Results are cached in a new category_emoji_cache SQLite table — each unique category hits HF at most once.
  • Expanded emoji map: Replaced the old map, which had item-level entries like Фрукты (fruit), Завтрак (breakfast), and Экскурсия (excursion), with ~85 realistic budget-line categories such as Каршеринг (car sharing), Страховка (insurance), Мебель (furniture), and Ветеринар (veterinarian), with both Russian and English keys.
  • Diagnostic script: scripts/test-emoji-resolution.ts — runs against real prod DB, resolves every category, and outputs a full table comparing actual results vs an expectations map. Usage: bun run scripts/test-emoji-resolution.ts [--dry-run]
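The similarity lookup in the third bullet can be sketched roughly as follows. This is a minimal illustration, assuming embeddings are already available as plain number arrays; `cosineSimilarity`, `closestKey`, and the toy vectors are hypothetical names and data rather than the PR's code. Only the 0.5 threshold comes from the description.

```typescript
// Hypothetical sketch: pick the known category closest to a query embedding.
type Embedding = number[];

function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the best-matching key, or null when no key reaches the threshold.
function closestKey(
  query: Embedding,
  known: Record<string, Embedding>,
  threshold: number = 0.5, // value taken from the PR description
): string | null {
  let bestKey: string | null = null;
  let bestScore = -Infinity;
  for (const key of Object.keys(known)) {
    const score = cosineSimilarity(query, known[key]);
    if (score > bestScore) {
      bestScore = score;
      bestKey = key;
    }
  }
  return bestScore >= threshold ? bestKey : null;
}
```

A caller would fall back to the default emoji when `closestKey` returns `null`.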

Key files

| File | What changed |
| --- | --- |
| src/services/receipt/summary-message.ts | New shared async builder (was sync) |
| src/services/receipt/category-emoji-resolver.ts | New: exact → cache → HF → default pipeline |
| src/config/category-emojis.ts | Expanded map, exported DEFAULT_CATEGORY_EMOJI |
| src/database/schema.ts | Migration 045: category_emoji_cache table |
| src/database/repositories/category-emoji-cache.repository.ts | New: get/set with upsert |
| src/web/miniapp-api.ts | /api/receipt/confirm sends summary to chat |
| src/bot/services/expense-saver.ts | Scan button after receipt save |
| scripts/test-emoji-resolution.ts | Diagnostic: expectations table + full report |

Test plan

  • bun run type-check — clean
  • bun run lint — clean
  • bun run test — 1816/1816 passing
  • Run bun run scripts/test-emoji-resolution.ts on prod to verify HF matching against real categories
  • Confirm a receipt via Mini App → verify summary message appears in group chat
  • Check that unknown category (not in map) gets sensible emoji via HF on first use, then cached on subsequent uses

https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik

claude added 5 commits April 3, 2026 18:33
When a user scans and confirms a receipt through the Mini App, a brief
summary is now sent to the group chat showing categories, amounts, and
total — matching the behavior of the bot photo handler flow.

https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik
…m list

- Extract shared buildReceiptSummaryMessage helper used by both receipt
  paths (bot photo handler and Mini App confirm endpoint) — eliminates
  divergent notification code between the two flows
- Add expandable <blockquote> with full item list (name, qty×price=total)
  to the chat message
- Replace fire-and-forget notification with awaited call wrapped in
  try/catch so errors surface synchronously without failing confirm
- Add pluralize for "позиция/позиции/позиций" (the Russian plural forms of "item")
- Remove duplicate getCategoryEmoji from receipt-summarizer.ts; the
  universal function in config/category-emojis.ts is now used everywhere
- Expand universal CATEGORY_EMOJIS map with ~100 additional emojis
  covering food, transport, health, finance, work, and more
- Add 16 unit tests covering pluralize, aggregation, truncation,
  HTML escaping, and emoji fallback
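The pluralization rule mentioned in the list above can be illustrated with a short sketch. The signature here (`pluralize(count, [one, few, many])`) is an assumption made for illustration; the PR's actual helper may differ. The rule itself is the standard Russian one: numbers ending in 1 take the first form, those ending in 2 to 4 take the second, and everything else, including 11 to 14, takes the third.

```typescript
// Hypothetical helper: choose the Russian plural form for a count.
// forms = [one, few, many], e.g. ["позиция", "позиции", "позиций"].
function pluralize(count: number, forms: [string, string, string]): string {
  const n = Math.floor(Math.abs(count));
  const lastTwo = n % 100;
  const last = n % 10;
  if (lastTwo >= 11 && lastTwo <= 14) return forms[2]; // 11-14 always take "many"
  if (last === 1) return forms[0]; // 1, 21, 31, ...
  if (last >= 2 && last <= 4) return forms[1]; // 2-4, 22-24, ...
  return forms[2]; // 0, 5-9, and everything else
}
```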

https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik

formatSummaryMessage was a second implementation of the same summary
format that buildReceiptSummaryMessage already produces. Flatten the
ReceiptSummary into ReceiptSummaryItem[] and delegate to the shared
helper so the AI-correction flow produces the same output as the bot
photo handler and the Mini App confirm endpoint.

ReceiptSummary items only carry name+total (qty/price are lost when the
LLM round-trips the JSON during correction), so ReceiptSummaryItem now
has qty/price as optional fields and the item line omits the
"qty × price =" prefix when they are absent.

Drops the now-unused itemCount parameter at all three call sites.
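
The optional qty/price handling described in this commit might look roughly like the sketch below. The field names `name`, `total`, `qty`, and `price` follow the commit message; the exact line layout, the two-decimal formatting, and the `escapeHtml` and `formatItemLine` helpers are assumptions for illustration only.

```typescript
// Shape follows the commit message: qty and price are optional.
interface ReceiptSummaryItem {
  name: string;
  total: number;
  qty?: number;
  price?: number;
}

// Minimal HTML escaping for text placed inside a Telegram HTML message.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical formatter: omits the "qty × price =" prefix when either is missing.
function formatItemLine(item: ReceiptSummaryItem): string {
  const name = escapeHtml(item.name);
  const total = item.total.toFixed(2);
  if (item.qty === undefined || item.price === undefined) {
    return `${name}: ${total}`;
  }
  return `${name}: ${item.qty} × ${item.price.toFixed(2)} = ${total}`;
}
```

Items that went through the LLM correction round-trip would take the first branch, while items from a fresh scan would take the second.
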
User-defined expense category names rarely land on the exact keys in
CATEGORY_EMOJIS, so receipts fell back to the default 💰 for most
categories. Add a two-step resolver: exact match first, then a HF
multilingual sentence-similarity lookup against the known keys with a
0.5 cosine threshold. Results persist in a new category_emoji_cache
table so each unique category hits HF at most once.
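
A rough sketch of the resolution order described above (exact match, then cache, then remote lookup, then default). Everything here is illustrative: the plain-object cache stands in for the category_emoji_cache table, `remoteLookup` stands in for the similarity call, and caching the default on a miss while not caching on an error are assumptions rather than confirmed behaviour.

```typescript
// Hypothetical remote step: returns a known key, or null when nothing is close enough.
type RemoteLookup = (category: string) => Promise<string | null>;

const DEFAULT_EMOJI = "💰";

// Own-property read, so names like "constructor" do not hit Object.prototype.
function own(map: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

async function resolveCategoryEmoji(
  category: string,
  exactMap: Record<string, string>,
  cache: Record<string, string>, // stand-in for the persistent cache table
  remoteLookup: RemoteLookup,
): Promise<string> {
  const exact = own(exactMap, category);
  if (exact !== undefined) return exact;

  const cached = own(cache, category);
  if (cached !== undefined) return cached;

  let emoji = DEFAULT_EMOJI;
  try {
    const matchedKey = await remoteLookup(category);
    const matched = matchedKey === null ? undefined : own(exactMap, matchedKey);
    if (matched !== undefined) emoji = matched;
  } catch {
    return DEFAULT_EMOJI; // assumption: failures are not cached, so a later call can retry
  }
  cache[category] = emoji; // later calls for this category skip the remote step
  return emoji;
}
```
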

Also expand the category emoji map to cover realistic budget lines
(Каршеринг, Страховка, Мебель, Ветеринар) and drop the weird item-level
entries (Фрукты, Завтрак, Экскурсия) that don't belong in a category list.

buildReceiptSummaryMessage and formatSummaryMessage become async since
resolution may hit HF; all five call sites now await them.

https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik

Script runs against real prod DB categories, resolves each via exact
match or HF semantic similarity, and outputs a full table comparing
actual results against an expectations map. Useful for tuning the
CATEGORY_EMOJIS list and the similarity threshold.

Usage: bun run scripts/test-emoji-resolution.ts [--dry-run]
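
The comparison against an expectations map could be structured along these lines. This is a sketch only: `ReportRow` and `buildReport` are hypothetical names, and treating a category with no expectation as passing is an assumption about how such a report might work.

```typescript
interface ReportRow {
  category: string;
  expected: string | null; // null when the expectations map has no entry
  actual: string;
  ok: boolean;
}

// Hypothetical report builder: one row per resolved category, sorted by name.
function buildReport(
  actual: Record<string, string>,
  expectations: Record<string, string>,
): ReportRow[] {
  return Object.keys(actual)
    .sort()
    .map((category) => {
      const expected = Object.prototype.hasOwnProperty.call(expectations, category)
        ? expectations[category]
        : null;
      const got = actual[category];
      // Assumption: categories without an expectation are not counted as failures.
      return { category, expected, actual: got, ok: expected === null || expected === got };
    });
}
```

Counting rows where `ok` is false gives a quick signal when tuning the map or the threshold.
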

https://claude.ai/code/session_014kP9qmxnhAa8HJ8P93Krik

@alex-mextner changed the title from "feat: send receipt scan summary to chat when confirmed via Mini App" to "feat: receipt scan → chat summary + semantic emoji matching" on Apr 12, 2026

- Separate emoji per pet species (🐈 cats, 🐕‍🦺 dogs, 🐇 rabbits, etc.)
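- Аренда (rent) → Квартира (apartment), Быт (household), Apartment
- Экскурсии/Экскурсия (excursions/excursion) → 🧭, Билеты (tickets) → 🎫
- Кредитка (credit card) → 💳, Резерв (reserve) → 📈, Личные (personal) → 👤
- Другое/Прочее/Разное/Затраты (other/miscellaneous/various/expenses) → 💸 (money-with-wings, less boxy)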

Coverage on real prod data: 20/36 → 31/36 exact categories.
Remaining 5 (people names, custom terms) are LLM resolver territory.

…ache

- Replaces HF sentence-similarity (quality was poor on real prod data) with
  Qwen/Qwen3-32B via Cerebras chatCompletion. Model picks one of ~250
  allowed keys (CATEGORY_EMOJIS + virtual keys for people & fallback).
- Resolver now reads the group's custom_prompt so mappings like
  "Ку → коммуналка" (utilities), "Алекс → __person_man__", "Кис → кот" (cat) work.
- Cache keyed on (group_id, category) — same category name in different
  groups can resolve to different emojis. Migration 045 rewritten in place
  (composite PK, not deployed yet). No invalidation on /prompt change:
  accepting stale emoji for the edge case of mid-life prompt edits.
- Virtual keys: __person_man__/woman/boy/girl/baby for names,
  __fallback__ → 💰 when nothing fits.
- Thread groupId through buildReceiptSummaryMessage and formatSummaryMessage,
  pass from all 5 callsites (photo-processor, message.handler, callback.handler,
  expense-saver, miniapp-api).
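
Two ideas from the list above, constraining the model's answer to the allowed keys and keying the cache on (group_id, category), can be sketched as follows. The helper names and the string-based cache key are hypothetical; the PR uses a composite primary key in SQLite, and only the `__fallback__` virtual key is taken from the commit message.

```typescript
const FALLBACK_KEY = "__fallback__";

// Hypothetical guard: accept the model's reply only if it is one of the allowed keys.
function normalizeModelChoice(raw: string, allowedKeys: string[]): string {
  // Models sometimes wrap the answer in quotes or add whitespace.
  const cleaned = raw.trim().replace(/^["']+|["']+$/g, "");
  return allowedKeys.indexOf(cleaned) !== -1 ? cleaned : FALLBACK_KEY;
}

// In-memory analogue of the composite (group_id, category) key.
function cacheKey(groupId: number, category: string): string {
  return `${groupId}:${category}`;
}
```

Because the group id is part of the key, the same category name can map to different emojis in different groups.
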
Two pre-existing issues that combined to freeze the machine during the
full test suite:

1. @huggingface/inference v4 retries 503 responses with unbounded async
   recursion — `return innerRequest(...)` in utils/request.js has no
   attempt counter. The 503-test mocked fetch to always return 503, so
   the SDK recursed until the process was killed by OOM. Fix: make the
   mock return 503 once, then throw, so the retry path terminates.

2. startTempImageCleanup spawns a 5-minute setInterval with no .unref(),
   keeping bun's event loop alive after tests ran. Added .unref() so
   test files terminate cleanly and prod still works (process exit ends
   the timer anyway).
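
The fix in item 1 can be illustrated with a small sketch. `requestRetrying503` is a hypothetical stand-in for a client that retries 503 responses with no attempt limit (it is not the SDK's code), and `make503OnceFetch` is an assumed shape for the test double: it answers 503 once and then throws, so the retry path ends after the second call.

```typescript
// Minimal fetch-like type, so the sketch does not depend on a global Response.
type FetchLike = () => Promise<{ status: number }>;

// Test double: the first call resolves with 503, every later call rejects.
function make503OnceFetch(): { fetchImpl: FetchLike; calls: () => number } {
  let count = 0;
  const fetchImpl: FetchLike = async () => {
    count++;
    if (count === 1) return { status: 503 };
    throw new Error("mock: no further responses");
  };
  return { fetchImpl, calls: () => count };
}

// Hypothetical client that retries on 503 without an attempt counter.
async function requestRetrying503(fetchImpl: FetchLike): Promise<number> {
  const res = await fetchImpl();
  if (res.status === 503) return requestRetrying503(fetchImpl);
  return res.status;
}
```

With a mock that always returns 503, `requestRetrying503` would never settle; with this double it rejects after exactly two calls.
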