docs(blog): Anthropic /v1/messages streaming performance improvements by oss-agent-shin · Pull Request #245 · BerriAI/litellm-docs

oss-agent-shin · 2026-05-28T10:41:43Z

What this PR does

Adds a performance blog post for litellm-docs covering the Anthropic /v1/messages streaming hot-path optimizations that shipped in BerriAI/litellm#28289. The post walks through the four buckets of overhead the optimization removed (no-op hooks, double-work, O(tokens) end-of-stream reconstruction, hot-path debug logging), the parity guarantees / tests, the headline benchmark numbers, and how to reproduce the benchmark with the new scripts/benchmark_anthropic_messages_perf.py harness.

Location: blog/anthropic_messages_streaming_perf/index.md
Slug: anthropic-messages-streaming-perf
Author: yassin (matches the source-PR author)
Date: 2026-05-28
Tags: performance, anthropic, streaming, proxy
Mirrors the structure of the existing componentized_deployment post (frontmatter shape, {/* truncate */} cut, key-takeaways + conclusion sections) per the Linear spec
Image placeholders included as HTML comments referencing /img/blog/anthropic_messages_streaming_perf/... paths so the assets can be dropped in later without churning the prose

Linear ticket

Resolves LIT-3333

Why pushed via Contents API

The current agent GITHUB_TOKEN lacks repo + workflow scopes, so git push is not available. The single new file was uploaded through PUT /repos/{owner}/{repo}/contents/{path} (one commit, one file). The merge-base diff is exactly the blog directory addition.
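For reviewers unfamiliar with this path, here is a minimal sketch of what a single-file commit through the Contents API looks like. It only builds the request and does not send it. The fork repository name (`oss-agent-shin/litellm-docs`), the placeholder post body, and the `<token>` value are assumptions for illustration; the endpoint shape and the `message` / `content` / `branch` fields follow GitHub's documented "create or update file contents" call.

```python
import base64
import json
import urllib.request


def build_contents_put(owner, repo, path, text, message, branch, token):
    """Build (but do not send) a PUT request for the GitHub Contents API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    payload = {
        "message": message,  # commit message for the single commit
        # The API expects the file body base64-encoded.
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "branch": branch,  # branch that receives the commit
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values: the fork name, body text, and token are placeholders.
req = build_contents_put(
    owner="oss-agent-shin",
    repo="litellm-docs",
    path="blog/anthropic_messages_streaming_perf/index.md",
    text="placeholder post body\n",
    message="docs(blog): Anthropic /v1/messages streaming perf post (LIT-3333)",
    branch="shin/lit-3333-anthropic-streaming-perf-blog",
    token="<token>",
)
# urllib.request.urlopen(req) would send it; deliberately omitted here.
```

Creating a new file needs no `sha` field; updating an existing file would additionally require the current blob `sha` in the payload.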

Evidence

This change is pure documentation: one new markdown file under blog/, with no executable code, JS/TS, Python, config, or schema touched. There is no runtime surface on which to capture before/after evidence. This is stated explicitly per the Step 3 rule (no silent omissions).

Source content verified against the source PR body:

PR #28289 — "perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths"
  base: litellm_internal_staging
  head: litellm_fix/optimize-v1-messages-streaming
  merged: 2026-05-23
  merge sha: 2eab9ee2c0caf66b6ed51c3f3cb9b41d59cd1001

Benchmark numbers in the post are quoted verbatim from PR #28289's benchmark table:
  TTFT p50:   241.88 -> 90.89 ms  (-62.4%)
  TTFT p95:   463.86 -> 148.23 ms (-68.0%)
  TTFT p99:  1313.46 -> 155.77 ms (-88.1%)
  Full p50:   242.26 -> 91.32 ms  (-62.3%)
  Tokens/s:  4394.5  -> 12504.4   (+184.6%)
  Req/s:     68.66   -> 195.38    (+184.6%)

The reproduce command in the post matches the flags of scripts/benchmark_anthropic_messages_perf.py in the source PR diff (--label / --proxy-port / --provider-port / --requests / --concurrency / --warmup / --repeats).
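As a quick sanity check on the quoted table, the percentage column can be recomputed from the before/after values. This is a small sketch, not part of the benchmark harness; the helper names are made up for illustration. Because the inputs here are the already-rounded table values, a recomputed figure can differ from the quoted one in the last digit, so the check uses a tolerance of 0.1 percentage points rather than exact equality.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (metric, before, after, quoted change in %) as listed in the table above.
ROWS = [
    ("TTFT p50 (ms)", 241.88, 90.89, -62.4),
    ("TTFT p95 (ms)", 463.86, 148.23, -68.0),
    ("TTFT p99 (ms)", 1313.46, 155.77, -88.1),
    ("Full p50 (ms)", 242.26, 91.32, -62.3),
    ("Tokens/s", 4394.5, 12504.4, 184.6),
    ("Req/s", 68.66, 195.38, 184.6),
]


def max_deviation(rows):
    """Largest gap between a recomputed and a quoted percentage."""
    return max(abs(pct_change(b, a) - quoted) for _, b, a, quoted in rows)


for name, before, after, quoted in ROWS:
    print(f"{name:14s} {pct_change(before, after):+8.2f}%  (quoted {quoted:+.1f}%)")
```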

The four optimization sections in the post each map 1:1 to a bullet under "What this PR does" in source PR #28289:

"Awaiting hooks that have nothing to do" -> "Skip work that's a no-op in the default config"
"Doing the same work twice per request" -> "Stop doing the same work twice per request"
"End-of-stream reconstruction was O(output_tokens)" -> "Cheaper end-of-stream response reconstruction"
"Logging on the hot path" -> "Cheaper logging on the hot path"
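To make the last item concrete, here is a generic sketch of one common way to keep debug logging cheap on a per-chunk path: check the logger level before doing any expensive formatting. This illustrates the general technique only and is an assumption, not a description of what PR #28289 actually changed; `handle_chunk` and `summarize` are hypothetical names.

```python
import logging

logger = logging.getLogger("streaming_sketch")
logger.addHandler(logging.NullHandler())


def summarize(chunk):
    """Stand-in for an expensive formatting step (hypothetical)."""
    summarize.calls += 1
    return repr(chunk)


summarize.calls = 0


def handle_chunk(chunk):
    # Only pay for formatting when DEBUG output would actually be emitted.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("chunk: %s", summarize(chunk))
    return chunk


# With the logger at INFO, the per-chunk path never calls summarize().
logger.setLevel(logging.INFO)
for i in range(1000):
    handle_chunk({"index": i})
```

The guard matters because arguments to `logger.debug(...)` are evaluated before the call, so an unguarded `summarize(chunk)` would run on every chunk even when the message is discarded.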

Pre-Submission checklist

Mirrors an existing accepted blog format (componentized_deployment) — no new template needed
All technical claims sourced from PR #28289 description
Author entry (yassin) already exists in blog/authors.yml — no auth change needed
No executable code in scope; ship-pr Evidence rule satisfied via explicit no-runtime-surface statement
No customer name referenced
Greptile review requested (post-file)

Session: https://litellm-agent-platform.onrender.com/sessions/eeb578f7-75b8-4877-a237-33efdf1b158c

Verification (ship-pr)

Manual run because BerriAI/litellm-docs has neither Greptile nor Veria installed (verified by inspecting reviews on 5 prior merged PRs incl. shin PR #234 — vercel[bot] is the only bot author on the repo's recent PRs).

No customer name in PR title / body / file content. grep -i -E "(rocket money|tempus|barracuda|cornell|verizon|nvidia|netapp|adobe|playtika|kraken)" returns no matches. PASS.
No secrets in PR content. Pure docs; no env vars / tokens / keys referenced. PASS.
Targets main (the correct base for litellm-docs). The litellm_oss_agent_shin_daily_branch rule applies to BerriAI/litellm, not litellm-docs: the docs repo has no .github/workflows/ and no "Verify PR source branch" check, and the recently merged shin PR #234 (docs(blog): incident report for Prisma reconnect freezing event loop (LIT-2614)) also used base=main. PASS.
Pushed via Contents API. Current GITHUB_TOKEN lacks repo+workflow scopes, so a single PUT /contents/{path} was used to commit the blog post. One file, one commit. Noted up-thread under "Why pushed via Contents API". PASS.
Docusaurus build succeeded. Vercel preview deploy 4847349364 reports state=success for commit 78133e27. That confirms the new blog file passed the docusaurus blog plugin (frontmatter + MDX-in-md), the YAML frontmatter parses, {/* truncate */} is recognized, and the author handle resolves against blog/authors.yml. PASS.
Vercel-only CI is the historical baseline. Of the 5 most recent merged PRs (#236 "docs(a2a): add LangGraph agent card registration guide" through #239 "blog: three calls we'd have gotten wrong building our background agent", plus shin PR #234 "docs(blog): incident report for Prisma reconnect freezing event loop (LIT-2614)"), only Vercel Preview Comments is the recorded CI check. This PR matches that baseline. PASS.
Frontmatter parses as YAML and matches the componentized_deployment template shape (slug / title / date / authors / description / tags / hide_table_of_contents). PASS.
No live image references. Image placeholders are HTML comments, so docusaurus does not try to resolve any missing PNGs and there are no broken-image warnings. Real assets can be dropped into static/img/blog/anthropic_messages_streaming_perf/ later by uncommenting the placeholders. PASS.
All technical claims traced to source PR #28289. The four optimization sections map 1:1 to the four bullets of the source PR description; the benchmark table is quoted verbatim; the reproduce-command flags match scripts/benchmark_anthropic_messages_perf.py in the source diff. PASS.
Linear ticket linked. Resolves LIT-3333 in PR body; PR link posted back to the Linear ticket via commentCreate. PASS.
Author already exists in blog/authors.yml as yassin — no auth-config change needed. PASS.
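The customer-name check in the list above can also be run without a shell. The following is a rough Python equivalent of the `grep -i -E` call, using the same alternation; the function name and the sample strings are assumptions for illustration.

```python
import re

# Same alternation as the grep pattern above; IGNORECASE mirrors grep -i.
CUSTOMER_PATTERN = re.compile(
    r"(rocket money|tempus|barracuda|cornell|verizon|nvidia|netapp|adobe|playtika|kraken)",
    re.IGNORECASE,
)


def find_customer_names(text):
    """Return every match of the customer pattern, in order of appearance."""
    return [m.group(0) for m in CUSTOMER_PATTERN.finditer(text)]


# An empty list corresponds to grep returning no matches (PASS).
hits = find_customer_names("docs(blog): Anthropic /v1/messages streaming perf post")
print("PASS" if not hits else f"FAIL: {hits}")
```

Like the grep call, this matches substrings rather than whole words, so it errs on the side of flagging too much.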

Greptile / Veria gate

Greptile and Veria are not installed on BerriAI/litellm-docs. The comment "@greptileai please review" was posted as a no-op for parity with the BerriAI/litellm flow; no Greptile review will arrive. Past doc PRs (#234, #239, etc.) shipped through direct human approval, and this PR follows the same path. This is filed for reviewer transparency, not as a gate bypass.

Slack post

Slack MCP (mcp__lap-slack__*) is not exposed to this agent session — known platform constraint, tracked across many previously-filed issues. The #eng-pr-reviews Step-5 post cannot be made from this session; reviewer is being requested via this PR's GitHub review-request mechanism instead.

vercel · 2026-05-28T10:41:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	May 28, 2026 10:43am

oss-agent-shin · 2026-05-28T10:42:02Z

@greptileai please review

oss-agent-shin · 2026-05-28T10:49:40Z

Manual review requested from @yassin-kortam (ticket owner / source-PR author #28289) and @mubashir1osmani (approver on prior shin docs PR #234). Greptile & Veria are not installed on this repo, so the standard automated review gate cannot run — manual approval is the established pattern for litellm-docs. Full ship-pr verification checklist is appended at the bottom of the PR body.

mubashir1osmani · 2026-05-28T21:39:10Z

duplicate of #223

docs(blog): Anthropic /v1/messages streaming perf post (LIT-3333)

78133e2

vercel Bot deployed to Preview May 28, 2026 10:43 View deployment

mubashir1osmani closed this May 28, 2026

oss-agent-shin wants to merge 1 commit into BerriAI:main from oss-agent-shin:shin/lit-3333-anthropic-streaming-perf-blog
