
docs: add prompt caching FAQ (RTK does not break cache) #560

Open
FlorianBruniaux wants to merge 2 commits into develop from docs/prompt-caching-faq

Conversation

@FlorianBruniaux
Collaborator

Summary

Answers a recurring question (raised publicly by Mathieu GRENIER): does RTK break Claude's prompt cache and make input 10x more expensive?

  • Add docs/PROMPT_CACHING.md: canonical reference with a component table (what RTK touches vs what it doesn't), a turn-by-turn cache example, cost comparison based on the README's 30-min session numbers, and rtk gain as self-serve proof
  • Add section at top of docs/TROUBLESHOOTING.md: Problem/Short Answer/Details pattern, links to full doc
  • Add FAQ blockquote in README.md after "How It Works": one-liner answer with link for scanners
  • Fix ARCHITECTURE.md module count (60 → 64) to pass pre-push validation

Test plan

  • docs/PROMPT_CACHING.md renders correctly on GitHub (tables, code blocks)
  • docs/TROUBLESHOOTING.md new section appears before the first existing problem
  • README.md blockquote renders as expected
  • All links between the three files resolve correctly

🤖 Generated with Claude Code

FlorianBruniaux and others added 2 commits March 13, 2026 10:47
Users (e.g., Mathieu GRENIER) asked whether RTK breaks Claude's prompt
cache and makes input 10x more expensive. Short answer: no. This commit
documents why, with a table, a turn-by-turn example, and cost math.

- Add docs/PROMPT_CACHING.md: canonical reference with component table,
  cache turn-by-turn walkthrough, cost comparison (118k vs 23.9k tokens),
  and rtk gain as self-serve proof point
- Add TROUBLESHOOTING.md section at top: Problem/Short Answer/Details
  following existing pattern, links to full doc
- Add README.md FAQ blockquote after "How It Works": one-liner answer
  with link for anyone scanning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Matches main.rs, which already had 64 modules. Fixes pre-push validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Copilot AI review requested due to automatic review settings March 13, 2026 09:48

Copilot AI left a comment


Pull request overview

Adds a dedicated FAQ-style explanation to address whether RTK interferes with Claude prompt caching, and cross-links it from the main docs and README so the answer is easy to find.

Changes:

  • Added a new canonical doc (docs/PROMPT_CACHING.md) explaining what RTK changes, how caching works turn-by-turn, and how to verify via rtk gain.
  • Added a new top troubleshooting entry linking to the full prompt-caching explanation.
  • Added a one-line README FAQ blurb near “How It Works”, plus corrected the architecture module total.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/TROUBLESHOOTING.md | Adds a new top “Problem/Short Answer/Details” entry and links to the full prompt caching doc. |
| docs/PROMPT_CACHING.md | New standalone explanation of prompt caching behavior and RTK’s impact, with examples and a cost comparison. |
| README.md | Adds a prominent FAQ blockquote linking to the new prompt caching doc. |
| ARCHITECTURE.md | Updates the module count total line. |
Comments suppressed due to low confidence (1)

ARCHITECTURE.md:301

  • The updated total (64 = 38 command + 26 infrastructure) conflicts with the breakdown immediately below (Command Modules: 34, Infrastructure Modules: 20). Please reconcile these counts so the section is internally consistent.
**Total: 64 modules** (38 command modules + 26 infrastructure modules)

### Module Count Breakdown

- **Command Modules**: 34 (directly exposed to users)
- **Infrastructure Modules**: 20 (utils, filter, tracking, tee, config, init, gain, toml_filter, verify_cmd, etc.)
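The mismatch Copilot flags is plain arithmetic; a quick check with the numbers quoted above:

```python
# Numbers copied from the quoted ARCHITECTURE.md section.
total_claimed = 64           # "Total: 64 modules (38 command + 26 infrastructure)"
command_modules = 34         # from the "Module Count Breakdown" list
infrastructure_modules = 20  # from the "Module Count Breakdown" list

breakdown_sum = command_modules + infrastructure_modules
print(breakdown_sum, total_claimed)  # → 54 64: the two counts disagree
```

Either the total line or the breakdown list needs to change so the section adds up.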


→ Cache WRITE on turn2 result (1.25x, one time)
```

The key point: RTK filters `cargo test` output from 25,000 tokens to 2,500 tokens. That result goes into history at 2,500 tokens, gets written to cache once (1.25x on 2,500), and read from cache every subsequent turn (0.1x on 2,500). Without RTK, the same slot holds 10x the tokens, so every cache write and every cache read costs 10x more.
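A minimal sketch of that cost math. The 1.25x cache-write and 0.1x cache-read multipliers are the ones stated in the walkthrough; the 10-turn session length is an assumption for illustration:

```python
# Per-token multipliers on the base input price, as stated in the doc.
CACHE_WRITE = 1.25  # paid once, when the slot is first written to cache
CACHE_READ = 0.10   # paid on every later turn that reads the slot back

def slot_cost(tokens: int, later_turns: int) -> float:
    """Cost (in base-input-price units) of one history slot:
    written to cache once, then read from cache on each later turn."""
    return tokens * CACHE_WRITE + later_turns * tokens * CACHE_READ

raw = slot_cost(25_000, later_turns=10)      # unfiltered `cargo test` output
filtered = slot_cost(2_500, later_turns=10)  # RTK-filtered output

print(f"cost ratio: {raw / filtered:.1f}x")  # → cost ratio: 10.0x
```

Because both multipliers scale linearly with token count, the ratio is 10x regardless of how many turns follow.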
