feat: enable OpenAI prompt caching for conversation processing #4664
Conversation
Add _build_conversation_context() shared helper and restructure get_transcript_structure() + extract_action_items() to use two system messages. The first (context) message is byte-identical across both calls, enabling OpenAI's automatic prompt caching for up to 50% input token savings. Also unifies calendar context to always include meeting_link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
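For illustration, a minimal sketch of what such a shared, deterministic context builder could look like; the parameter names and section formatting below are assumptions, not the actual code in conversation_processing.py:

```python
from typing import List, Optional


def _build_conversation_context(
    transcript: str,
    photo_descriptions: Optional[List[str]] = None,
    calendar_title: Optional[str] = None,
    meeting_link: Optional[str] = None,
) -> str:
    """Build the shared context block. The same inputs must yield the exact
    same string, otherwise OpenAI's prefix cache never gets a hit."""
    parts = [f"Transcript:\n{transcript.strip()}"]
    if photo_descriptions:
        # Preserve caller order; never depend on set/dict iteration order here.
        parts.append("Photos:\n" + "\n".join(photo_descriptions))
    if calendar_title or meeting_link:
        # Fixed field order with meeting_link always present, so both prompts
        # share one byte-identical prefix.
        parts.append(
            "Calendar event:\n"
            f"title: {calendar_title or ''}\n"
            f"meeting_link: {meeting_link or ''}"
        )
    return "\n\n".join(parts)
```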
13 tests covering _build_conversation_context() determinism, calendar field inclusion (meeting_link, notes, participants), ordering guarantees, and edge cases (empty inputs, missing fields).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
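As a flavor of what those tests check, here is a hypothetical determinism case; the import path and keyword arguments mirror the sketch above and are assumptions, not the actual test file:

```python
from utils.llm.conversation_processing import _build_conversation_context


def test_context_is_byte_identical_for_same_inputs():
    kwargs = dict(
        transcript="Alice: hi\nBob: hello",
        photo_descriptions=["whiteboard sketch"],
        calendar_title="Weekly sync",
        meeting_link="https://meet.example.com/abc",
    )
    # Byte-for-byte equality is exactly what OpenAI's prefix cache keys on.
    assert _build_conversation_context(**kwargs) == _build_conversation_context(**kwargs)
```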
Code Review
This pull request introduces a smart optimization for OpenAI prompt caching by refactoring how conversation context is built. The new _build_conversation_context helper and the two-system-message approach are excellent changes that should improve performance and reduce costs. The added unit tests are thorough and cover many edge cases. I found one critical issue with how the calendar context string is constructed, which could lead to non-deterministic output and defeat the purpose of caching. My review includes a suggested fix for this.
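The review's suggested fix is not reproduced here; purely as an illustration, one common way to make such a string deterministic is to render fields in a fixed order and sort any unordered collections. The field names below are assumptions:

```python
def _render_calendar_context(event: dict) -> str:
    # Emit fields in one fixed order rather than iterating over the event dict,
    # and sort participants so repeated calls always produce the same bytes.
    lines = [
        f"title: {event.get('title', '')}",
        f"meeting_link: {event.get('meeting_link', '')}",
        f"notes: {event.get('notes', '')}",
    ]
    participants = sorted(event.get("participants", []))
    if participants:
        lines.append("participants: " + ", ".join(participants))
    return "\n".join(lines)
```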
Can you run integration tests before and after to make sure the new changes help with prompt caching?
Integration test results — prompt caching verified:

What this means: Both calls produce correct outputs — structure gives a summary, action items gives a bullet list.

by AI for @beastoin
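The collapsed details are not reproduced above; for anyone rerunning the check, a rough sketch of how caching can be verified with the openai Python SDK is shown below. It assumes a model and SDK version that report usage.prompt_tokens_details.cached_tokens, and the prompt strings are placeholders rather than the repo's actual ones:

```python
from openai import OpenAI

client = OpenAI()

# Long shared prefix (OpenAI prompt caching only applies above ~1024 prompt tokens).
shared_context = "Transcript:\n" + "Speaker 0: let's review the roadmap.\n" * 300
task_prompts = {
    "structure": "Summarize the conversation into a structured overview.",
    "action_items": "Extract the action items as a bullet list.",
}

for name, task_prompt in task_prompts.items():
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": shared_context},  # identical across both calls
            {"role": "system", "content": task_prompt},     # task-specific suffix
        ],
    )
    details = resp.usage.prompt_tokens_details
    cached = (details.cached_tokens or 0) if details else 0
    print(f"{name}: prompt_tokens={resp.usage.prompt_tokens} cached_tokens={cached}")
```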
@beastoin Reviewed the diff; no issues found, and the shared context builder/prompt split looks consistent between the structure and action-item paths. I ran

by AI for @beastoin
Prompt cache baseline (Feb 1-7, pre-deploy):

- Overall cache hit rate: 22.0%
- Daily trend: 19.9-26.1% (Feb 7 highest at 26.1%, could be natural variance)
- By model:

Will re-check 48h post-deploy to measure the delta against this baseline.

by AI for @beastoin
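For reference, the hit rate quoted here is presumably aggregate cached prompt tokens divided by total prompt tokens; a tiny sketch of that aggregation, assuming usage records that expose those two counters:

```python
def cache_hit_rate(usage_records):
    # Field names are an assumption about the usage export, not a known schema.
    prompt = sum(r["prompt_tokens"] for r in usage_records)
    cached = sum(r["cached_tokens"] for r in usage_records)
    return cached / prompt if prompt else 0.0


# e.g. 22,000 cached out of 100,000 prompt tokens -> 0.22, i.e. the 22.0% baseline
print(cache_hit_rate([{"prompt_tokens": 100_000, "cached_tokens": 22_000}]))
```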
lgtm
Deployed to prod — Monitoring plan:

by AI for @beastoin
Please deploy Pusher too; these changes are also used by Pusher.
Pusher deploy triggered — run by AI for @beastoin
Post-deploy monitoring results (PR #4664 + #4670)

Hour-by-hour comparison (14:00–22:00 UTC, today vs yesterday):

Every single hour is cheaper than yesterday's equivalent.

🤖 Generated with Claude Code
Summary
- New `_build_conversation_context()` helper that produces byte-identical context strings for the same inputs
- `get_transcript_structure()` and `extract_action_items()` restructured to use two system messages: shared context prefix + task-specific instructions
- Calendar context now always includes `meeting_link` (was previously missing in `extract_action_items`)

Closes #4654
How it works
OpenAI automatically caches the KV computation for message prefixes that are byte-identical across API calls. By moving the conversation content (transcript + photos + calendar) into a dedicated first system message, both `get_transcript_structure` and `extract_action_items` share the same prefix. The second call gets a cache hit on the expensive transcript tokens.
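A sketch of the prompt layout this describes; the function name and prompt strings are illustrative, not the exact ones in conversation_processing.py:

```python
def _messages_for(task_instructions: str, context: str) -> list[dict]:
    return [
        # First system message: conversation content. Byte-identical across both
        # calls, so OpenAI can reuse the cached prefix computation.
        {"role": "system", "content": context},
        # Second system message: the only part that differs per task.
        {"role": "system", "content": task_instructions},
    ]


context = "Transcript:\nSpeaker 0: let's ship the caching change\n..."
structure_messages = _messages_for("Produce a structured summary.", context)
action_item_messages = _messages_for("List the action items as bullets.", context)
assert structure_messages[0] == action_item_messages[0]  # shared, cacheable prefix
```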
Changes

- `backend/utils/llm/conversation_processing.py` — new `_build_conversation_context()` helper, refactored both prompt functions
- `backend/tests/unit/test_prompt_caching.py` — 13 unit tests for determinism, calendar field coverage, ordering
- `backend/test.sh` — registered new test file

Test plan
- `bash backend/test.sh` — all tests pass (13 new + existing suite)

by AI for @beastoin