feat: inject backend quota rate limit filter for QuotaPolicy by yuzisun · Pull Request #1869 · envoyproxy/ai-gateway

yuzisun · 2026-02-16T14:39:38Z

Description
This is the first step toward quota-aware routing: it injects the backend quota rate limit filter when a QuotaPolicy is attached to the AIServiceBackend specified in the upstream cluster.

End-to-End Data Flow

User creates QuotaPolicy CR
↓
QuotaPolicy Controller reconciles
↓ translator.BuildRateLimitConfigs()
RateLimitConfig protobuf built (descriptor tree)
↓ runner.UpdateConfigs()
xDS snapshot pushed to rate limit service via gRPC
↓
Extension Server injects filter + actions into Envoy xDS

Request arrives at Envoy
↓ (request-time rate limit check)
If no available quota → 429; else continue
↓
ext_proc processes request/response, extracts token usage
↓ (stream-done rate limit action)
Rate limit filter sends HitsAddend to service
↓
Rate limit service increments Redis counter by token count
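
The request-time half of the flow above can be sketched as follows. This is a minimal illustration, not code from this PR: `QuotaCounter`, `Allow`, `AddHits`, and `HandleRequest` are hypothetical names, and an in-memory map stands in for the Redis counter. It assumes a single fixed window and that the token cost is only known once the response has finished, so a request is admitted whenever any quota remains and the counter can overshoot the limit.

```go
package main

import (
	"fmt"
	"sync"
)

// QuotaCounter is a hypothetical in-memory stand-in for the Redis-backed
// counter kept by the rate limit service. It is not part of the PR.
type QuotaCounter struct {
	mu    sync.Mutex
	limit uint64            // tokens allowed per window (assumed fixed window)
	used  map[string]uint64 // descriptor key -> tokens consumed so far
}

func NewQuotaCounter(limit uint64) *QuotaCounter {
	return &QuotaCounter{limit: limit, used: make(map[string]uint64)}
}

// Allow models the request-time check: it consumes nothing and only
// reports whether any quota remains for the descriptor key.
func (q *QuotaCounter) Allow(key string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.used[key] < q.limit
}

// AddHits models the stream-done action: the token usage extracted from
// the response is added to the counter after the stream completes.
func (q *QuotaCounter) AddHits(key string, tokens uint64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.used[key] += tokens
}

// HandleRequest returns the HTTP status a client would observe.
// tokensUsed is the usage reported for the completed response.
func HandleRequest(q *QuotaCounter, key string, tokensUsed uint64) int {
	if !q.Allow(key) {
		return 429 // no available quota
	}
	q.AddHits(key, tokensUsed)
	return 200
}

func main() {
	q := NewQuotaCounter(100)
	key := "backend-a/model-x" // hypothetical descriptor key
	for i := 0; i < 3; i++ {
		fmt.Println(HandleRequest(q, key, 60))
	}
}
```

With a limit of 100 tokens and 60 tokens per response, the first two requests are admitted (usage reaches 120) and the third is rejected with 429.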

Related Issues/PRs (if applicable)
Related PR: #1709

codecov-commenter · 2026-02-17T16:33:45Z

Codecov Report

❌ Patch coverage is 87.58357% with 130 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.58%. Comparing base (edef4c5) to head (c2986f0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
internal/controller/gateway.go	21.56%	38 Missing and 2 partials ⚠️
internal/extensionserver/quota_ratelimit.go	93.21%	20 Missing and 17 partials ⚠️
internal/controller/quota_policy.go	82.08%	18 Missing and 6 partials ⚠️
internal/ratelimit/translator/translator.go	92.57%	11 Missing and 4 partials ⚠️
internal/ratelimit/runner/runner.go	84.00%	4 Missing and 4 partials ⚠️
internal/extensionserver/post_translate_modify.go	33.33%	1 Missing and 1 partial ⚠️
internal/extproc/processor_impl.go	80.00%	1 Missing and 1 partial ⚠️
internal/ratelimit/translator/merge.go	95.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1869      +/-   ##
==========================================
+ Coverage   84.42%   84.58%   +0.15%     
==========================================
  Files         134      139       +5     
  Lines       19162    20203    +1041     
==========================================
+ Hits        16177    17088     +911     
- Misses       1998     2092      +94     
- Partials      987     1023      +36

yuzisun · 2026-02-23T19:03:56Z

@yanavlasov @johnugeorge @nacx

johnugeorge · 2026-03-17T19:58:12Z

Correct me if I am wrong

Currently, it doesn't handle multiple QuotaPolicies on the same backend. We will have to merge descriptors under the same domain. Can you create an issue to track it?
We should keep the new rate limit service optional under a quota Helm flag.

Question: How much latency is added if the request passes through the second rate limit filter (if enabled)?
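
To make the first point concrete, here is a rough sketch of what merging descriptors under the same domain could look like. The `Descriptor` and `DomainConfig` types and the `MergeByDomain` function are simplified, hypothetical stand-ins, not the rate limit service's protobuf types or this PR's code. The first-wins rule for duplicate (key, value) pairs is an assumption of the sketch; the real behavior is what the tracking issue would need to decide.

```go
package main

import "fmt"

// Descriptor is a simplified, hypothetical rate limit descriptor.
type Descriptor struct {
	Key   string
	Value string
	Limit uint64
}

// DomainConfig groups descriptors under one rate limit domain.
type DomainConfig struct {
	Domain      string
	Descriptors []Descriptor
}

// MergeByDomain combines configs that share a domain into a single config,
// preserving first-seen domain order. On a duplicate (key, value) pair the
// first descriptor wins; that tie-break is an assumption of this sketch.
func MergeByDomain(configs []DomainConfig) []DomainConfig {
	index := make(map[string]int)               // domain -> position in merged
	seen := make(map[string]map[[2]string]bool) // domain -> (key, value) set
	var merged []DomainConfig
	for _, c := range configs {
		i, ok := index[c.Domain]
		if !ok {
			i = len(merged)
			index[c.Domain] = i
			merged = append(merged, DomainConfig{Domain: c.Domain})
			seen[c.Domain] = make(map[[2]string]bool)
		}
		for _, d := range c.Descriptors {
			id := [2]string{d.Key, d.Value}
			if seen[c.Domain][id] {
				continue // duplicate entry: keep the first one
			}
			seen[c.Domain][id] = true
			merged[i].Descriptors = append(merged[i].Descriptors, d)
		}
	}
	return merged
}

func main() {
	merged := MergeByDomain([]DomainConfig{
		{Domain: "backend-a", Descriptors: []Descriptor{{"model", "x", 100}}},
		{Domain: "backend-a", Descriptors: []Descriptor{{"model", "y", 50}}},
	})
	fmt.Println(len(merged), len(merged[0].Descriptors))
}
```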

yuzisun · 2026-03-18T14:12:58Z

> Correct me if I am wrong
>
> Currently, it doesn't handle multiple QuotaPolicies on the same backend. We will have to merge descriptors under the same domain. Can you create an issue to track it?

Created #1971.

> We should keep the new rate limit service optional under a quota Helm flag.

Will address.

> Question: How much latency is added if the request passes through the second rate limit filter (if enabled)?

The second rate limit filter adds 5 ms at p99 and 3 ms at p50 to the total latency.

nacx

Did another review. Here are some additional notes to the comments:

The CostExpression seems not yet to be implemented. Should we add a comment in the API saying the field will be implemented in the future, so users know the field is not taken into account yet?
The mode seems not to be used, but it is required in the API. Is this a leftover or something that needs to be taken into account?

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

**Commit Message**

* Adding an e2e test making sure that we can use a tool and return the proper response from the LLM.

This PR also contains a few fixes that allow this test to work:

* Added validateToolCallID, as it is needed for the Bedrock message; without it the request fails.
* Validate and cast the OpenAI content value into a Bedrock content block. Before, we assumed it was always a string, but it can also be an array.
* Merge the content blocks if one has text and the other has a tool call but no text. bedrockResp.Output.Message.Content is not 1:1 with OpenAI choice.Message.Content: we get the result in chunks that should be merged. For example, the text and its tool config were in separate elements, but OpenAI expects them to be in the same one.
* In openAIMessageToBedrockMessageRoleAssistant we assume that there is either a refusal or a content text, but we don't actually pass in a text. This was causing an error because the length of the array was set to 1, so the first element was empty, and a key must be specified in each content element.

Note: I think this is another bug that I'm trying to look into and create a unit test for, but since there is already a lot in this PR, it may be best to follow up in the next one. Basically, the assistant OpenAI param message's content doesn't seem to be translated by this method, and we have no content. This doesn't prevent anything else from working, though; we are just missing a text field like `{"content":[{"text":"Certainly! I can help you get the weather information for New York City. To do that, I'll use the available weather tool. Let me fetch that information for you right away."}`

The rest are tests.

---------

Signed-off-by: Alexa Griffith <agriffith96@gmail.com>
Signed-off-by: Alexa Griffith <agriffith96@gmail.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>

aabchoo · 2026-05-29T15:44:56Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces backend-level quota rate limiting using the new QuotaPolicy CRD, implementing a controller, an xDS config runner, and translation logic to configure Envoy's rate limit service. Key feedback includes addressing non-deterministic map iteration in the controller's config merging to prevent random xDS snapshot flips, performing deep copies of descriptors during merging to avoid cache corruption, and adding validation to ensure parsed rule indices are non-negative to prevent potential out-of-bounds panics.
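
The three review points can be illustrated with a small sketch. Everything here is hypothetical and simplified: `Node`, `SortedKeys`, and `ParseRuleIndex` are illustrative names rather than functions from this PR, and the rule index is assumed to arrive as a plain decimal string. The sketch shows sorting map keys before iterating so the output order is stable, deep-copying a descriptor tree so a merge cannot mutate a cached original, and rejecting negative or out-of-range indices before they are used to index a slice.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Node is a hypothetical descriptor tree node.
type Node struct {
	Key      string
	Children []*Node
}

// DeepCopy returns a copy that shares no pointers with the original,
// so mutating the copy during a merge leaves the cached tree untouched.
func (n *Node) DeepCopy() *Node {
	if n == nil {
		return nil
	}
	out := &Node{Key: n.Key}
	for _, c := range n.Children {
		out.Children = append(out.Children, c.DeepCopy())
	}
	return out
}

// SortedKeys returns the map keys in sorted order. Go map iteration order
// is unspecified, so sorting first keeps generated config stable.
func SortedKeys(m map[string]*Node) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ParseRuleIndex parses a rule index and checks it against the number of
// rules, so a negative or too-large value cannot cause an out-of-bounds panic.
func ParseRuleIndex(s string, numRules int) (int, error) {
	i, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid rule index %q: %w", s, err)
	}
	if i < 0 || i >= numRules {
		return 0, fmt.Errorf("rule index %d out of range [0, %d)", i, numRules)
	}
	return i, nil
}

func main() {
	cache := map[string]*Node{
		"b": {Key: "b"},
		"a": {Key: "a", Children: []*Node{{Key: "child"}}},
	}
	for _, k := range SortedKeys(cache) {
		cp := cache[k].DeepCopy()
		cp.Key = "merged-" + cp.Key // does not affect the cached node
		fmt.Println(k, cache[k].Key, cp.Key)
	}
	_, err := ParseRuleIndex("-1", 3)
	fmt.Println(err != nil)
}
```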

aabchoo · 2026-06-01T23:07:27Z

The QuotaPolicy model name is now based on the model name override. Updated the documentation to outline that.

RateLimitPerRoute is based on the HTTPRoute rule and cross-references the backend name + model name override. When a rule has multiple backends, the route sends multiple descriptors, one for each potential "match". The response path only updates the descriptor for the backend + override that the request was actually routed to.
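
A rough sketch of that asymmetry between the request path and the response path is shown below. It is illustrative only: `BackendRef`, `quotaKey`, `RequestDescriptors`, and `ResponseDescriptor` are hypothetical names, the `backend/model` key format is an assumption, and a plain map stands in for the lookup of which backend + override pairs have a QuotaPolicy.

```go
package main

import "fmt"

// BackendRef is a hypothetical view of one backend on an HTTPRoute rule.
type BackendRef struct {
	Name              string
	ModelNameOverride string
}

// quotaKey builds the lookup key from backend name + model name override.
// The "backend/model" format is an assumption of this sketch.
func quotaKey(b BackendRef) string {
	return b.Name + "/" + b.ModelNameOverride
}

// RequestDescriptors returns one descriptor key for every backend on the
// rule that has a quota policy, since any of them could be the match.
func RequestDescriptors(rule []BackendRef, policies map[string]bool) []string {
	var keys []string
	for _, b := range rule {
		if k := quotaKey(b); policies[k] {
			keys = append(keys, k)
		}
	}
	return keys
}

// ResponseDescriptor returns the key for the backend that was actually
// routed to, and whether a quota policy applies to it.
func ResponseDescriptor(routed BackendRef, policies map[string]bool) (string, bool) {
	k := quotaKey(routed)
	return k, policies[k]
}

func main() {
	rule := []BackendRef{
		{Name: "backend-a", ModelNameOverride: "model-x"},
		{Name: "backend-b", ModelNameOverride: "model-y"},
	}
	policies := map[string]bool{
		"backend-a/model-x": true,
		"backend-b/model-y": true,
	}
	fmt.Println(RequestDescriptors(rule, policies))
	fmt.Println(ResponseDescriptor(rule[1], policies))
}
```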

yuzisun requested a review from a team as a code owner February 16, 2026 14:39

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 16, 2026

yuzisun force-pushed the backend_quota_ratelimit branch 2 times, most recently from 8a9cfb3 to 4841c05 Compare February 17, 2026 00:24

xiaolin593 reviewed Feb 17, 2026

Comment thread internal/extensionserver/quota_ratelimit.go Outdated

Comment thread internal/extensionserver/quota_ratelimit.go Outdated

yuzisun force-pushed the backend_quota_ratelimit branch from 3580b41 to 2e78aec Compare February 28, 2026 23:32

yuzisun mentioned this pull request Mar 2, 2026

fix: support per-route LLMRequestCosts for different cost formulas #1780

Closed

nacx reviewed Mar 13, 2026

Comment thread cmd/controller/main.go

Comment thread internal/extensionserver/quota_ratelimit.go

johnugeorge reviewed Mar 17, 2026

Comment thread internal/controller/quota_policy.go Outdated

yuzisun mentioned this pull request Mar 18, 2026

support merging QuotaPolicy to the same AIServiceBackend #1971

Open

yuzisun force-pushed the backend_quota_ratelimit branch from daf7b79 to 9fcbcad Compare March 26, 2026 10:59

dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. size:XS This PR changes 0-9 lines, ignoring generated files. labels Mar 26, 2026

johnugeorge reviewed Mar 30, 2026

Comment thread internal/controller/quota_policy.go

nacx reviewed Mar 31, 2026

yuzisun force-pushed the backend_quota_ratelimit branch from 33c823e to 2ee6e5a Compare March 31, 2026 20:53

johnugeorge added this to the l milestone Apr 1, 2026

yuzisun and others added 7 commits April 6, 2026 08:20

extproc: fix tool role message

81609df

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.36.3 to 1.36.4 (e…

33a95af

…nvoyproxy#720)

inject rate limit filter in upstream filter chain

03c41a9

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

add ratelimit dependency

d1d602e

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

add backend dynamic metadata

01d137f

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

add quota limit test

bb21dd4

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

aabchoo added 2 commits May 28, 2026 03:05

dont block xDS if QP fails

6c857bf

Signed-off-by: achoo30 <achoo30@bloomberg.net>

logging and testing

29ffe6b

Signed-off-by: achoo30 <achoo30@bloomberg.net>

aabchoo force-pushed the backend_quota_ratelimit branch from 32907aa to 29ffe6b Compare May 28, 2026 19:17

aabchoo added 5 commits May 28, 2026 15:22

Merge branch 'main' into backend_quota_ratelimit

a6fde7d

fix typo

431929d

Signed-off-by: achoo30 <achoo30@bloomberg.net>

test

552c036

Signed-off-by: achoo30 <achoo30@bloomberg.net>

add/skip change

9da7afa

Signed-off-by: achoo30 <achoo30@bloomberg.net>

remove comments

715bc02

Signed-off-by: achoo30 <achoo30@bloomberg.net>

gemini-code-assist Bot reviewed May 29, 2026

aabchoo and others added 17 commits May 29, 2026 16:43

Update internal/extensionserver/quota_ratelimit.go

94273f4

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

check if negative

34eb4c5

Signed-off-by: achoo30 <achoo30@bloomberg.net>

Update internal/ratelimit/translator/merge.go

104e9f3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

Update internal/ratelimit/translator/merge.go

dec8c0f

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

address comments

f3ac13e

Signed-off-by: achoo30 <achoo30@bloomberg.net>

Merge branch 'main' into backend_quota_ratelimit

98d8763

prevent hard block to extension servers

12cde12

Signed-off-by: achoo30 <achoo30@bloomberg.net>

return nil if quota policy is not found

6bed557

Signed-off-by: achoo30 <achoo30@bloomberg.net>

missing duration validation

b587ff1

Signed-off-by: achoo30 <achoo30@bloomberg.net>

clean up rate limit svc

0ffeea4

Signed-off-by: achoo30 <achoo30@bloomberg.net>

update the quota policy model name to use override

f4df815

Signed-off-by: achoo30 <achoo30@bloomberg.net>

test name to use override

ecc74d0

Signed-off-by: achoo30 <achoo30@bloomberg.net>

fix

24a1a36

Signed-off-by: achoo30 <achoo30@bloomberg.net>

prevent cost from leaking into other routes

af2f625

Signed-off-by: achoo30 <achoo30@bloomberg.net>

Merge branch 'main' into backend_quota_ratelimit

a430ada

bump count by one

f4b8d89

Signed-off-by: achoo30 <achoo30@bloomberg.net>

fix test case

a59e616

Signed-off-by: achoo30 <achoo30@bloomberg.net>

precommit

c2986f0

Signed-off-by: achoo30 <achoo30@bloomberg.net>

aabchoo requested a review from a team June 2, 2026 01:05
