fix: support per-route LLMRequestCosts for different cost formulas#1780
fix: support per-route LLMRequestCosts for different cost formulas#1780jonasHanhan wants to merge 7 commits into
Conversation
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (71.24%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #1780 +/- ##
==========================================
- Coverage 84.38% 84.25% -0.13%
==========================================
Files 130 130
Lines 17985 18086 +101
==========================================
+ Hits 15176 15238 +62
- Misses 1867 1896 +29
- Partials 942 952 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // Assign the route's LLMRequestCosts to this backend. | ||
| // This ensures each backend has its own cost calculation configuration, | ||
| // allowing different routes to use different CEL expressions for the same metadataKey. | ||
| b.LLMRequestCosts, err = llmRequestCostsToFilterAPI(aiGatewayRoute.Spec.LLMRequestCosts) |
There was a problem hiding this comment.
let's construct this once instead of repeating it inside the for loop here
There was a problem hiding this comment.
Done! Moved the llmRequestCostsToFilterAPI call outside the backendRef loop. Thanks for the review!
| // Use backend-level request costs if available, otherwise fall back to global config. | ||
| requestCosts := u.backendRequestCosts | ||
| if len(requestCosts) == 0 { | ||
| requestCosts = u.parent.config.RequestCosts | ||
| } |
There was a problem hiding this comment.
when this happens? We can delete the global level ones if we merge this after v0.5 which comes with the proper config version checking which allows us to avoid breaking changes
There was a problem hiding this comment.
The global fallback happens when the backend doesn't have any LLMRequestCosts configured (for backward compatibility with existing configurations that only use the global LLMRequestCosts).
If you'd like me to remove the global level fallback since this will be merged after v0.5 with the config version checking, I'm happy to do that. Just let me know!
nacx
left a comment
There was a problem hiding this comment.
Can you add a section on the website page that explains cost-based rate limiting, with a small example on how to achieve different costs per backend? This could be a useful feature but it's not easy to spot that the way of doing that with the current API is creating N AIGatewayRoutes.
| BodyMutation *HTTPBodyMutation `json:"httpBodyMutation,omitempty"` | ||
| // LLMRequestCosts configures the cost calculation for this backend. Optional. | ||
| // This allows different routes/backends to have different cost calculation formulas. | ||
| LLMRequestCosts []LLMRequestCost `json:"llmRequestCosts,omitempty"` |
There was a problem hiding this comment.
If you want to track per backend then it should use the new Quota API. The BackendTrafficPolicy is applied at the route level.
There was a problem hiding this comment.
Thanks for the feedback! I see there's a TODO in ai_service_backend.go:78-79 suggesting backend-level LLMRequestCost configuration:
// TODO: maybe add backend-level LLMRequestCost configuration that overrides the AIGatewayRoute-level LLMRequestCost.My current implementation copies the route's LLMRequestCosts to each backend to match @aabchoo's per-backendRef proposal mentioned in #1688.
Would you prefer I implement this at the AIServiceBackend level instead (as the TODO suggests)? I'm happy to refactor if that's the preferred direction. Could you also point me to the Quota API you mentioned?
There was a problem hiding this comment.
Thanks for pointing to PR #1709! The QuotaPolicy approach looks comprehensive. I noticed it's been in progress for a while now — looking forward to seeing it land.
|
@jonasHanhan @nacx i do not think this is the right way to implement per backend quota which should use the new Quota API designed for this purpose. The current token rate limit is at the route level so we can’t copy it from route to backend. |
|
|
||
| - Each `AIGatewayRoute` has its own `llmRequestCosts` configuration | ||
| - Different routes can use the same `metadataKey` (e.g., `billing_charges`) with different CEL expressions | ||
| - The cost calculation is automatically applied per-backend, allowing fine-grained cost control |
There was a problem hiding this comment.
If the quota needs to be applied at backend level then it should attach the policy to AIServiceBackend instead of AIGatewayRoute.
5cb084b to
03b6a0b
Compare
|
@jonasHanhan i think we still need this PR but the fix is to aggregate the cost expressions at the route level instead of backend in filter config. |
|
Thanks for the clarification - makes sense to keep this at route level. I will rework this PR to remove backend-level LLMRequestCosts in filter config and instead aggregate cost expressions at route level. Proposed behavior:
Before I implement, can you confirm this is the expected aggregation model? If this matches your expectation, I will push the refactor and update docs accordingly. |
fc61e06 to
767f6c7
Compare
Hi @jonasHanhan, per backend cost tracking is handled in my PR #1869 with QuotaPolicy API(applies to AIServiceBackend). This one here is for tracking quota at the route level defined with BackendTrafficPolicy (only applies to HTTPRoute). We can aggregate per-route costs across all the routes and maintain a map, we still need a request attribute that tells which route is matched and currently we only have the information of the backend. A few items we need to do here:
|
9b12588 to
c0db777
Compare
| expr := "uint(0)" | ||
| // Keep "last definition wins" semantics for duplicate metadata keys by | ||
| // layering conditions in declaration order. | ||
| for i := 0; i < len(costs); i++ { |
There was a problem hiding this comment.
could use the simpler for _, cost := range costs since it doesn't need the index
| return false | ||
| } | ||
|
|
||
| func routeNameFromRouteConfigName(routeConfigName string) string { |
There was a problem hiding this comment.
Consider documenting the route name format assumption, the assumption that route names follow httproute/<namespace>/<name>/... should be documented.
| if !hasRouteBackend || len(routeCosts) == 0 { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
hasRouteBackend can be checked outside of the function instead of passing as function parameters.
c9a584b to
bcec26d
Compare
44ea243 to
d4b2545
Compare
| return false | ||
| } | ||
|
|
||
| func routeNameFromRouteConfigName(routeConfigName string) string { |
d4b2545 to
9109ff9
Compare
Signed-off-by: jonasHanhan <zqhmusic121@gmail.com>
53e31f2 to
b01ab63
Compare
|
@jonasHanhan could you please not force push each time you make a change? It's very difficult to track what has been modified. |
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
|
Hey @jonasHanhan, I'm going to take over this PR as this change is urgent. |
| } | ||
| if len(routeBackendNames) > 0 { | ||
| if err = aggregateRouteLLMRequestCosts(llmCostsByMetadata, &llmCostMetadataOrder, routeName, aiGatewayRoute.Spec.LLMRequestCosts); err != nil { | ||
| return false, fmt.Errorf("failed to aggregate LLMRequestCosts for route %s: %w", aiGatewayRoute.Name, err) |
There was a problem hiding this comment.
once bad llm request cost can prevent the entire filter config from updating
thoughts cc @yuzisun
There was a problem hiding this comment.
extproc also can not come up if there is bad cel expression
https://github.com/envoyproxy/ai-gateway/blob/main/internal/extproc/server.go#L82. Either we need to validate the cel expression upfront or user need to fix the cel expression to unblock the reconcilation.
|
hey, @jonasHanhan thanks for working on this! I've been looking at the per-backend LLM cost implementation and noticed a potential issue with the current approach of aggregating costs into a single CEL expression per metadataKey. workshopped my suggestion in jonasHanhan#1 The issueWhen multiple routes define costs for the same metadataKey, the controller builds one nested ternary CEL expression like: This grows linearly with the number of routes. In clusters with many routes sharing the same metadata key, this can hit CEL-Go compile limits or make expressions unwieldy to debug. Alternative approachI've pushed an alternative implementation to my fork and a PR to your branch (See jonasHanhan#1) that stores per-route costs instead of aggregating them into a single expression. ChangesFilter API (
Controller (
Extproc (
API docs (
Benefits
Trade-offs
Backward compatibilityThe new TestsUpdated all controller and extproc tests to reflect the new structure:
Happy to discuss if this direction makes sense or if there are aspects of the aggregated approach I'm missing. Let me know if you'd like me to open a separate PR or if you'd prefer to incorporate these changes here. |
|
Notes from Community meeting on 6th April
|
**Description** This PR is an alternative to #1780 's implementation. It fixes pre-route LLMRequestCosts configuration, allowing different routes to use different CEL calculation formulas for the same metadataKey. Previously even if a user configures different LLMRequestsCosts for each route with same metadataKey, the implementation would pick one of the LLMRequestsCosts and apply it globally to all the routes. this PR fixes that implementation #1780 also implemented the same thing but by combining CEL expressions from all routes into a single CEL expression. this ran into CEL length limitations. This PR instead stores individual route-specific CELs in extproc's filterconfig and allows extrpoc to choose among them based on the route used by the current requests. This PR also introduces GlobalLLMRequestCosts in GatewayConfig allowing to configure LLMRequestCosts that apply to all routes. during conflicts route-specific costs are prioritized over global costs. **Breaking Change:** This PR fixes the existing issue where if one ai-gateway route CR had LLMRequestCosts configured they will be applied to all route (even if they are not configured with it). After this PR is merged, costs will apply only to the route on which they are configured. Users can instead configure GatewayConfig.GlobalLLMRequestCosts to configure costs that apply to all routes. **Related Issues/PRs (if applicable)** Fixes: #1688 Related PR: #1780 --------- Signed-off-by: jonasHanhan <zqhmusic121@gmail.com> Signed-off-by: Aaron Choo <achoo30@bloomberg.net> Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net> Co-authored-by: jonasHanhan <zqhmusic121@gmail.com> Co-authored-by: Aaron Choo <achoo30@bloomberg.net>
|
we can close this PR as above PR was merged |
Description
This commit enables per-route LLMRequestCosts configuration, allowing different routes to use different CEL calculation formulas for the same metadataKey.
Previously, LLMRequestCosts were gateway-scoped and deduplicated by metadataKey, causing incorrect cost calculations when multiple routes needed different formulas (e.g., free model with
cost=0vs paid model withcost=input_tokens*30+output_tokens*150).Changes:
LLMRequestCostsfield toBackendstructLLMRequestCoststo each backend during reconciliationNewRuntimeConfigProcessResponseBodywith fallback to global configRelated Issues/PRs (if applicable)
Fixes #1688
Special notes for reviewers (if applicable)
make precommitpasses)