docs(blog): Anthropic /v1/messages streaming performance improvements#245

Closed
oss-agent-shin wants to merge 1 commit into
BerriAI:mainfrom
oss-agent-shin:shin/lit-3333-anthropic-streaming-perf-blog
@oss-agent-shin oss-agent-shin commented May 28, 2026

Contributor

What this PR does

Adds a performance blog post for litellm-docs covering the Anthropic /v1/messages streaming hot-path optimizations that shipped in BerriAI/litellm#28289. The post walks through the four buckets of overhead the optimization removed (no-op hooks, double-work, O(tokens) end-of-stream reconstruction, hot-path debug logging), the parity guarantees / tests, the headline benchmark numbers, and how to reproduce the benchmark with the new scripts/benchmark_anthropic_messages_perf.py harness.

  • Location: blog/anthropic_messages_streaming_perf/index.md
  • Slug: anthropic-messages-streaming-perf
  • Author: yassin (matches the source-PR author)
  • Date: 2026-05-28
  • Tags: performance, anthropic, streaming, proxy
  • Mirrors the structure of the existing componentized_deployment post (frontmatter shape, {/* truncate */} cut, key-takeaways + conclusion sections) per the Linear spec
  • Image placeholders included as HTML comments referencing /img/blog/anthropic_messages_streaming_perf/... paths so the assets can be dropped in later without churning the prose

Linear ticket

Resolves LIT-3333

Why pushed via Contents API

The current agent GITHUB_TOKEN lacks repo + workflow scopes, so git push is not available. The single new file was uploaded through PUT /repos/{owner}/{repo}/contents/{path} (one commit, one file). The merge-base diff is exactly the blog directory addition.
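
As a rough illustration of the upload path described above, the following sketch builds (but does not send) a request for GitHub's "create or update file contents" endpoint. The owner, repo, path, branch, commit message, and token are placeholders chosen for illustration, not the values used for this PR, and the helper name `build_contents_put` is hypothetical.

```python
import base64
import json
import urllib.request


def build_contents_put(owner, repo, path, text, message, branch, token):
    """Build (but do not send) a Contents API request that creates one file."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    body = {
        "message": message,
        # The Contents API expects the file body base64-encoded.
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values for illustration only; nothing goes over the network.
req = build_contents_put(
    owner="example-owner",
    repo="example-repo",
    path="blog/example_post/index.md",
    text="# Example post\n",
    message="docs(blog): add example post",
    branch="example-branch",
    token="<token>",
)
print(req.get_method(), req.full_url)
```

Each such call produces exactly one commit touching one file, which is consistent with the "one commit, one file" shape noted above.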

Evidence

This change is pure documentation — one new markdown file under blog/, no executable code touched, no JS/TS, no Python, no config, no schema. There is no runtime surface to capture before/after on. Stating that explicitly per the Step 3 rule (no silent omissions).

Source content verified against the source PR body:

PR #28289 — "perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths"
  base: litellm_internal_staging
  head: litellm_fix/optimize-v1-messages-streaming
  merged: 2026-05-23
  merge sha: 2eab9ee2c0caf66b6ed51c3f3cb9b41d59cd1001

Benchmark numbers in the post are quoted verbatim from PR #28289's benchmark table:
  TTFT p50:   241.88 -> 90.89 ms  (-62.4%)
  TTFT p95:   463.86 -> 148.23 ms (-68.0%)
  TTFT p99:  1313.46 -> 155.77 ms (-88.1%)
  Full p50:   242.26 -> 91.32 ms  (-62.3%)
  Tokens/s:  4394.5  -> 12504.4   (+184.6%)
  Req/s:     68.66   -> 195.38    (+184.6%)

Reproduce command in post matches scripts/benchmark_anthropic_messages_perf.py's flags
in the source PR diff (--label / --proxy-port / --provider-port / --requests / --concurrency
/ --warmup / --repeats).
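
For readers who want to sanity-check the quoted deltas, here is a small sketch that recomputes each percentage from the before/after values listed in the benchmark table above. It works from the rounded table entries only, so the last digit can differ slightly from the quoted figure: Tokens/s comes out at +184.5% rather than +184.6%, presumably because the source table was derived from unrounded measurements. The helper name `percent_change` is an assumption for illustration.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the benchmark table quoted above.
rows = {
    "TTFT p50 (ms)": (241.88, 90.89),
    "TTFT p95 (ms)": (463.86, 148.23),
    "TTFT p99 (ms)": (1313.46, 155.77),
    "Full p50 (ms)": (242.26, 91.32),
    "Tokens/s": (4394.5, 12504.4),
    "Req/s": (68.66, 195.38),
}

for name, (before, after) in rows.items():
    delta = percent_change(before, after)
    print(f"{name:14s} {before:>9.2f} -> {after:>9.2f}  ({delta:+.1f}%)")
```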

The four optimization sections in the post each map 1:1 to a bullet under "What this PR does" in source PR #28289:

  • "Awaiting hooks that have nothing to do" -> "Skip work that's a no-op in the default config"
  • "Doing the same work twice per request" -> "Stop doing the same work twice per request"
  • "End-of-stream reconstruction was O(output_tokens)" -> "Cheaper end-of-stream response reconstruction"
  • "Logging on the hot path" -> "Cheaper logging on the hot path"

Pre-Submission checklist

  • Mirrors an existing accepted blog format (componentized_deployment) — no new template needed
  • All technical claims sourced from PR #28289 description
  • Author entry (yassin) already exists in blog/authors.yml, so no authors.yml change is needed
  • No executable code in scope; ship-pr Evidence rule satisfied via explicit no-runtime-surface statement
  • No customer name referenced
  • Greptile review requested (post-file)

Session: https://litellm-agent-platform.onrender.com/sessions/eeb578f7-75b8-4877-a237-33efdf1b158c


Verification (ship-pr)

Manual run because BerriAI/litellm-docs has neither Greptile nor Veria installed (verified by inspecting reviews on 5 prior merged PRs, including shin PR #234; vercel[bot] is the only bot author on the repo's recent PRs).

  • No customer name in PR title / body / file content. grep -i -E "(rocket money|tempus|barracuda|cornell|verizon|nvidia|netapp|adobe|playtika|kraken)" returns no matches. PASS.
  • No secrets in PR content. Pure docs; no env vars / tokens / keys referenced. PASS.
  • Targets main (the correct base for litellm-docs). The litellm_oss_agent_shin_daily_branch rule applies to BerriAI/litellm, not litellm-docs: the docs repo has no .github/workflows/ and no "Verify PR source branch" check, and the recently merged shin PR #234 (docs(blog): incident report for Prisma reconnect freezing event loop (LIT-2614)) also used base=main. PASS.
  • Pushed via Contents API. Current GITHUB_TOKEN lacks repo+workflow scopes, so a single PUT /contents/{path} was used to commit the blog post. One file, one commit. Noted up-thread under "Why pushed via Contents API". PASS.
  • Docusaurus build succeeded. Vercel preview deploy 4847349364 reports state=success for commit 78133e27. That confirms the new blog file passed the docusaurus blog plugin (frontmatter + MDX-in-md), the YAML frontmatter parses, {/* truncate */} is recognized, and the author handle resolves against blog/authors.yml. PASS.
  • Vercel-only CI is the historical baseline. Of the 5 most recent merged PRs (#236 "docs(a2a): add LangGraph agent card registration guide" through #239 "blog: three calls we'd have gotten wrong building our background agent", plus shin PR #234 "docs(blog): incident report for Prisma reconnect freezing event loop (LIT-2614)"), only Vercel Preview Comments is the recorded CI check. This PR matches that baseline. PASS.
  • Frontmatter parses as YAML and matches the componentized_deployment template shape (slug / title / date / authors / description / tags / hide_table_of_contents). PASS.
  • No live image references. Image placeholders are HTML comments (<!-- TODO(yassin): replace with ... -->), so docusaurus does not try to resolve any missing PNGs and there are no broken-image warnings. Real assets can be dropped into static/img/blog/anthropic_messages_streaming_perf/ later by uncommenting the placeholders. PASS.
  • All technical claims traced to source PR #28289. The four optimization sections map 1:1 to the four bullets of the source PR description; the benchmark table is quoted verbatim; the reproduce-command flags match scripts/benchmark_anthropic_messages_perf.py in the source diff. PASS.
  • Linear ticket linked. Resolves LIT-3333 in PR body; PR link posted back to the Linear ticket via commentCreate. PASS.
  • Author already exists in blog/authors.yml as yassin, so no authors.yml change is needed. PASS.
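
The customer-name check in the list above is a case-insensitive alternation match. A minimal Python sketch of the same idea follows; the blocked names here are hypothetical placeholders rather than the real list, and the function name `find_blocked_names` is an assumption for illustration.

```python
import re

# Hypothetical placeholder names; substitute the real blocklist when running the check.
BLOCKED_NAMES = ["acme corp", "globex", "initech"]


def find_blocked_names(text, names=BLOCKED_NAMES):
    """Return the sorted set of blocked names found in text, ignoring case."""
    pattern = re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


clean = "Adds a performance blog post covering streaming hot-path optimizations."
dirty = "Benchmarked on the ACME Corp staging proxy."

print(find_blocked_names(clean))
print(find_blocked_names(dirty))
```

An empty result corresponds to the "returns no matches" outcome recorded in the checklist; any non-empty result should block the PR until the name is removed.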

Greptile / Veria gate

Greptile and Veria are not installed on BerriAI/litellm-docs. @greptileai please review was posted as a no-op for parity with the BerriAI/litellm flow; no Greptile review will arrive. Past doc PRs (#234, #239, etc.) shipped through direct human approval — same path here. Filed for reviewer transparency, not as a gate-bypass.

Slack post

Slack MCP (mcp__lap-slack__*) is not exposed to this agent session — known platform constraint, tracked across many previously-filed issues. The #eng-pr-reviews Step-5 post cannot be made from this session; reviewer is being requested via this PR's GitHub review-request mechanism instead.

vercel Bot commented May 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | May 28, 2026 10:43am

@oss-agent-shin

Contributor Author

@greptileai please review

@oss-agent-shin

Contributor Author

Manual review requested from @yassin-kortam (ticket owner / source-PR author #28289) and @mubashir1osmani (approver on prior shin docs PR #234). Greptile & Veria are not installed on this repo, so the standard automated review gate cannot run — manual approval is the established pattern for litellm-docs. Full ship-pr verification checklist is appended at the bottom of the PR body.

@mubashir1osmani

Collaborator

duplicate of #223
