Skip to content

feat: inject backend quota rate limit filter for QuotaPolicy#1869

Open
yuzisun wants to merge 83 commits into
envoyproxy:mainfrom
yuzisun:backend_quota_ratelimit
Open

feat: inject backend quota rate limit filter for QuotaPolicy#1869
yuzisun wants to merge 83 commits into
envoyproxy:mainfrom
yuzisun:backend_quota_ratelimit

Conversation

@yuzisun
Copy link
Copy Markdown
Contributor

@yuzisun yuzisun commented Feb 16, 2026

Description
This is the first step for quota aware routing by injecting the backend quota rate limit filter when there is QuotaPolicy attached to the AIServiceBackend specified in the upstream cluster.

End-to-End Data Flow

User creates QuotaPolicy CR

QuotaPolicy Controller reconciles
↓ translator.BuildRateLimitConfigs()
RateLimitConfig protobuf built (descriptor tree)
↓ runner.UpdateConfigs()
xDS snapshot pushed to rate limit service via gRPC

Extension Server injects filter + actions into Envoy xDS

Request arrives at Envoy
↓ (request-time rate limit check)
If no available quota → 429; else continue

ext_proc processes request/response, extracts token usage
↓ (stream-done rate limit action)
Rate limit filter sends HitsAddend to service

Rate limit service increments Redis counter by token count

Related Issues/PRs (if applicable)
Related PR: #1709

@yuzisun yuzisun requested a review from a team as a code owner February 16, 2026 14:39
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 16, 2026
@yuzisun yuzisun force-pushed the backend_quota_ratelimit branch 2 times, most recently from 8a9cfb3 to 4841c05 Compare February 17, 2026 00:24
Comment thread internal/extensionserver/quota_ratelimit.go Outdated
Comment thread internal/extensionserver/quota_ratelimit.go Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 87.58357% with 130 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.58%. Comparing base (edef4c5) to head (c2986f0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/controller/gateway.go 21.56% 38 Missing and 2 partials ⚠️
internal/extensionserver/quota_ratelimit.go 93.21% 20 Missing and 17 partials ⚠️
internal/controller/quota_policy.go 82.08% 18 Missing and 6 partials ⚠️
internal/ratelimit/translator/translator.go 92.57% 11 Missing and 4 partials ⚠️
internal/ratelimit/runner/runner.go 84.00% 4 Missing and 4 partials ⚠️
internal/extensionserver/post_translate_modify.go 33.33% 1 Missing and 1 partial ⚠️
internal/extproc/processor_impl.go 80.00% 1 Missing and 1 partial ⚠️
internal/ratelimit/translator/merge.go 95.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1869      +/-   ##
==========================================
+ Coverage   84.42%   84.58%   +0.15%     
==========================================
  Files         134      139       +5     
  Lines       19162    20203    +1041     
==========================================
+ Hits        16177    17088     +911     
- Misses       1998     2092      +94     
- Partials      987     1023      +36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yuzisun
Copy link
Copy Markdown
Contributor Author

yuzisun commented Feb 23, 2026

Comment thread cmd/controller/main.go
Comment thread internal/extensionserver/quota_ratelimit.go
@johnugeorge
Copy link
Copy Markdown
Contributor

Correct me if I am wrong

  1. Currently, it doesn't handle multiple quotaPolicies on the same backend. We will have to merge descriptors under the same domain. Can you create an issue to track it?
  2. We should keep new rate limit service optional under a quota helm flag.

Question: How much latency is added if the request passes through the second ratelimit filter(if enabled)?

Comment thread internal/controller/quota_policy.go Outdated
@yuzisun
Copy link
Copy Markdown
Contributor Author

yuzisun commented Mar 18, 2026

Correct me if I am wrong

  1. Currently, it doesn't handle multiple quotaPolicies on the same backend. We will have to merge descriptors under the same domain. Can you create an issue to track it?

created #1971

  1. We should keep new rate limit service optional under a quota helm flag.

will address

Question: How much latency is added if the request passes through the second ratelimit filter(if enabled)?

The second ratelimit filter adds 5ms 99% and 3ms 50% to the total latency.

@yuzisun yuzisun force-pushed the backend_quota_ratelimit branch from daf7b79 to 9fcbcad Compare March 26, 2026 10:59
@dosubot dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. size:XS This PR changes 0-9 lines, ignoring generated files. labels Mar 26, 2026
Comment thread internal/controller/quota_policy.go
Copy link
Copy Markdown
Member

@nacx nacx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another review. Here are some additional notes to the comments:

  • The CostExpression seems not yet to be implemented. Should we add a comment in the API saying the field will be implemented in the future, so users know the field is not taken into account yet?
  • The mode seems not to be used, but it is required in the API. Is this a leftover or something that needs to be taken into account?

Comment thread internal/extensionserver/quota_ratelimit.go
Comment thread internal/controller/controller.go Outdated
Comment thread internal/ratelimit/translator/translator.go Outdated
Comment thread internal/extensionserver/quota_ratelimit.go Outdated
Comment thread internal/ratelimit/translator/translator_test.go Outdated
Comment thread internal/ratelimit/translator/translator.go
Comment thread internal/ratelimit/translator/translator.go
Comment thread internal/extensionserver/quota_ratelimit.go
Comment thread cmd/controller/main.go
@yuzisun yuzisun force-pushed the backend_quota_ratelimit branch from 33c823e to 2ee6e5a Compare March 31, 2026 20:53
@johnugeorge johnugeorge added this to the l milestone Apr 1, 2026
yuzisun and others added 7 commits April 6, 2026 08:20
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
**Commit Message**

* adding e2e test making sure that we can use a tool and return the
proper response from the LLM

This PR also contains a few fixes that allow this test to work
* added validateToolCallID as its needed for the bedrock message or it
fails
* Validate and cast the openai content value into bedrock content block
- before we assumed it was always a string but it can also be an array
* Merge the content blocks if one has text and the other has toolcall
but no text - bedrockResp.Output.Message.Content is not 1:1 with openai
choice.Message.Content.. we get the result in chunks that should be
merged, for example the text and it's toolconfig were in seperate
elements but openai expects them to be in the same
* in openAIMessageToBedrockMessageRoleAssistant we assume that either
there is a refusal or a content text, but we actually dont pass in a
text. this was causing an error as the length of the array was set to 1
so the first was empty and there must be a key specified in each content
element.

note: i think this is another bug that im trying to look into/create
unit test for, but since there is already a lot in this PR, maybe its
best to follow up with that one the next one. basically, the assistant
openai param message's content doesn't seem to be translating to this
method, and we have no content. this doesn't prevent anything else from
working though, we are just missing a text field like
`{"content":[{"text":"Certainly! I can help you get the weather
information for New York City. To do that, I'll use the available
weather tool. Let me fetch that information for you right away."}`

the rests are tests

---------

Signed-off-by: Alexa Griffith <agriffith96@gmail.com>
Signed-off-by: Alexa Griffith  <agriffith96@gmail.com>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
aabchoo added 2 commits May 28, 2026 03:05
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
@aabchoo aabchoo force-pushed the backend_quota_ratelimit branch from 32907aa to 29ffe6b Compare May 28, 2026 19:17
aabchoo added 5 commits May 28, 2026 15:22
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
@aabchoo
Copy link
Copy Markdown
Contributor

aabchoo commented May 29, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces backend-level quota rate limiting using the new QuotaPolicy CRD, implementing a controller, an xDS config runner, and translation logic to configure Envoy's rate limit service. Key feedback includes addressing non-deterministic map iteration in the controller's config merging to prevent random xDS snapshot flips, performing deep copies of descriptors during merging to avoid cache corruption, and adding validation to ensure parsed rule indices are non-negative to prevent potential out-of-bounds panics.

Comment thread internal/controller/quota_policy.go
Comment thread internal/controller/quota_policy.go
Comment thread internal/ratelimit/translator/merge.go
Comment thread internal/ratelimit/translator/merge.go
Comment thread internal/extensionserver/quota_ratelimit.go
Comment thread internal/extensionserver/quota_ratelimit.go
Comment thread internal/extensionserver/quota_ratelimit.go
aabchoo and others added 17 commits May 29, 2026 16:43
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
Signed-off-by: achoo30 <achoo30@bloomberg.net>
@aabchoo
Copy link
Copy Markdown
Contributor

aabchoo commented Jun 1, 2026

Quota Policy's model name is now based on the model name override. Updated documentation to outline that.

RateLimitPerRoute is based on the httproute rule and cross-references the backend name + model name override. Multiple backends for a rule means the route will send multiple descriptors for the potential "match". Response path will only update the override+backend routed.

Signed-off-by: achoo30 <achoo30@bloomberg.net>
@aabchoo aabchoo requested a review from a team June 2, 2026 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants