
fix: support per-route LLMRequestCosts for different cost formulas #1780

Closed
jonasHanhan wants to merge 7 commits into envoyproxy:main from jonasHanhan:fix/per-backend-llm-request-costs

Conversation

@jonasHanhan
Contributor

@jonasHanhan jonasHanhan commented Jan 16, 2026

Description

This commit enables per-route LLMRequestCosts configuration, allowing different routes to use different CEL calculation formulas for the same metadataKey.

Previously, LLMRequestCosts were gateway-scoped and deduplicated by metadataKey, causing incorrect cost calculations when multiple routes needed different formulas (e.g., free model with cost=0 vs paid model with cost=input_tokens*30+output_tokens*150).
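To make the failure mode concrete, here is a minimal Go sketch. It does not use the project's types: `costRule`, `dedupeByKey`, and `byRouteAndKey` are hypothetical names, plain Go functions stand in for compiled CEL programs, and "first rule wins" is an assumption chosen only to show that a single formula survives per key.

```go
package main

import "fmt"

// costRule is a hypothetical stand-in for one llmRequestCosts entry.
// Formula plays the role of a compiled CEL program.
type costRule struct {
	Route       string
	MetadataKey string
	Formula     func(inputTokens, outputTokens uint64) uint64
}

// dedupeByKey mimics gateway-scoped behavior: only one rule survives per
// metadataKey (assumption: the first one seen).
func dedupeByKey(rules []costRule) map[string]costRule {
	out := make(map[string]costRule)
	for _, r := range rules {
		if _, seen := out[r.MetadataKey]; !seen {
			out[r.MetadataKey] = r
		}
	}
	return out
}

// byRouteAndKey keeps one rule per (route, metadataKey) pair.
func byRouteAndKey(rules []costRule) map[[2]string]costRule {
	out := make(map[[2]string]costRule)
	for _, r := range rules {
		out[[2]string{r.Route, r.MetadataKey}] = r
	}
	return out
}

// exampleRules returns a free route and a paid route sharing one metadataKey.
func exampleRules() []costRule {
	return []costRule{
		{Route: "ns/free", MetadataKey: "billing_charges",
			Formula: func(_, _ uint64) uint64 { return 0 }},
		{Route: "ns/paid", MetadataKey: "billing_charges",
			Formula: func(in, out uint64) uint64 { return in*30 + out*150 }},
	}
}

func main() {
	rules := exampleRules()
	// A request on the paid route with 10 input and 2 output tokens.
	global := dedupeByKey(rules)["billing_charges"]
	scoped := byRouteAndKey(rules)[[2]string{"ns/paid", "billing_charges"}]
	fmt.Println("gateway-scoped:", global.Formula(10, 2))
	fmt.Println("route-scoped:  ", scoped.Formula(10, 2))
}
```

With gateway-scoped deduplication the paid route is billed with the free route's formula; keying on the route as well keeps both formulas available.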

Changes:

  • Add LLMRequestCosts field to Backend struct
  • Copy route's LLMRequestCosts to each backend during reconciliation
  • Compile backend-level CEL programs in NewRuntimeConfig
  • Use backend-level costs in ProcessResponseBody with fallback to global config

Related Issues/PRs (if applicable)

Fixes #1688

Special notes for reviewers (if applicable)

  • AI tools were used to assist with test generation and code review
  • All unit tests pass (make precommit passes)
  • Backward compatible: if backend has no LLMRequestCosts, falls back to global config

@jonasHanhan jonasHanhan requested a review from a team as a code owner January 16, 2026 06:26
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 16, 2026
@codecov-commenter

codecov-commenter commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 71.24183% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.25%. Comparing base (7610c0e) to head (9e10044).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/extensionserver/post_translate_modify.go | 41.30% | 23 Missing and 4 partials ⚠️ |
| internal/controller/gateway.go | 83.33% | 7 Missing and 5 partials ⚠️ |
| internal/extproc/server.go | 62.50% | 2 Missing and 1 partial ⚠️ |
| internal/extproc/processor.go | 0.00% | 1 Missing ⚠️ |
| internal/extproc/processor_impl.go | 94.73% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (71.24%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1780      +/-   ##
==========================================
- Coverage   84.38%   84.25%   -0.13%     
==========================================
  Files         130      130              
  Lines       17985    18086     +101     
==========================================
+ Hits        15176    15238      +62     
- Misses       1867     1896      +29     
- Partials      942      952      +10     


Comment thread internal/controller/gateway.go Outdated
// Assign the route's LLMRequestCosts to this backend.
// This ensures each backend has its own cost calculation configuration,
// allowing different routes to use different CEL expressions for the same metadataKey.
b.LLMRequestCosts, err = llmRequestCostsToFilterAPI(aiGatewayRoute.Spec.LLMRequestCosts)
Member

let's construct this once instead of repeating it inside the for loop here

Contributor Author

Done! Moved the llmRequestCostsToFilterAPI call outside the backendRef loop. Thanks for the review!

Comment thread internal/extproc/processor_impl.go Outdated
Comment on lines +436 to +440
// Use backend-level request costs if available, otherwise fall back to global config.
requestCosts := u.backendRequestCosts
if len(requestCosts) == 0 {
	requestCosts = u.parent.config.RequestCosts
}
Member

When does this happen? We can delete the global-level ones if we merge this after v0.5, which comes with proper config version checking that lets us avoid breaking changes.

Contributor Author

The global fallback happens when the backend doesn't have any LLMRequestCosts configured (for backward compatibility with existing configurations that only use the global LLMRequestCosts).

If you'd like me to remove the global level fallback since this will be merged after v0.5 with the config version checking, I'm happy to do that. Just let me know!
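The fallback discussed in this thread can be sketched in isolation as follows. This is an illustration only: `requestCost` and `selectRequestCosts` are hypothetical names, and the real processor works on compiled CEL programs rather than expression strings.

```go
package main

import "fmt"

// requestCost is a hypothetical, simplified cost entry.
type requestCost struct {
	MetadataKey string
	Expr        string
}

// selectRequestCosts returns the backend-level costs when any are configured
// and otherwise falls back to the global list.
func selectRequestCosts(backend, global []requestCost) []requestCost {
	if len(backend) > 0 {
		return backend
	}
	return global
}

func main() {
	global := []requestCost{{MetadataKey: "billing_charges", Expr: "input_tokens"}}
	backend := []requestCost{{MetadataKey: "billing_charges", Expr: "uint(0)"}}

	fmt.Println(selectRequestCosts(nil, global))     // no backend costs: global applies
	fmt.Println(selectRequestCosts(backend, global)) // backend costs replace global
}
```

Note that the choice is all-or-nothing in this form: a backend that defines even one cost hides every global entry, including those for other metadata keys.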

Member

@nacx nacx left a comment

Can you add a section on the website page that explains cost-based rate limiting, with a small example on how to achieve different costs per backend? This could be a useful feature but it's not easy to spot that the way of doing that with the current API is creating N AIGatewayRoutes.

Comment thread internal/filterapi/filterconfig.go Outdated
BodyMutation *HTTPBodyMutation `json:"httpBodyMutation,omitempty"`
// LLMRequestCosts configures the cost calculation for this backend. Optional.
// This allows different routes/backends to have different cost calculation formulas.
LLMRequestCosts []LLMRequestCost `json:"llmRequestCosts,omitempty"`
Contributor

If you want to track per backend then it should use the new Quota API. The BackendTrafficPolicy is applied at the route level.

Contributor Author

Thanks for the feedback! I see there's a TODO in ai_service_backend.go:78-79 suggesting backend-level LLMRequestCost configuration:

// TODO: maybe add backend-level LLMRequestCost configuration that overrides the AIGatewayRoute-level LLMRequestCost.

My current implementation copies the route's LLMRequestCosts to each backend to match @aabchoo's per-backendRef proposal mentioned in #1688.

Would you prefer I implement this at the AIServiceBackend level instead (as the TODO suggests)? I'm happy to refactor if that's the preferred direction. Could you also point me to the Quota API you mentioned?

Contributor

Take a look at this PR #1709

Contributor Author

Thanks for pointing to PR #1709! The QuotaPolicy approach looks comprehensive. I noticed it's been in progress for a while now — looking forward to seeing it land.

@yuzisun
Contributor

yuzisun commented Jan 19, 2026

@jonasHanhan @nacx I do not think this is the right way to implement per-backend quota; that should use the new Quota API designed for this purpose. The current token rate limit is at the route level, so we can't copy it from route to backend.


- Each `AIGatewayRoute` has its own `llmRequestCosts` configuration
- Different routes can use the same `metadataKey` (e.g., `billing_charges`) with different CEL expressions
- The cost calculation is automatically applied per-backend, allowing fine-grained cost control
Contributor

If the quota needs to be applied at the backend level, then the policy should be attached to AIServiceBackend instead of AIGatewayRoute.

@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from 5cb084b to 03b6a0b Compare January 19, 2026 00:33
@yuzisun
Contributor

yuzisun commented Mar 1, 2026

@jonasHanhan I think we still need this PR, but the fix is to aggregate the cost expressions at the route level instead of the backend level in the filter config.

@jonasHanhan
Contributor Author

jonasHanhan commented Mar 2, 2026

Thanks for the clarification - makes sense to keep this at route level.

I will rework this PR to remove backend-level LLMRequestCosts in filter config and instead aggregate cost expressions at route level.

Proposed behavior:

  • Keep llmRequestCosts defined on AIGatewayRoute.
  • During filter-config generation, aggregate per-route costs by metadataKey into route-level expressions.
  • Use the selected backend context only inside the aggregated expression (instead of storing costs on each backend entry).

Before I implement, can you confirm this is the expected aggregation model?
Also, should we keep or remove the global fallback path after v0.5 config version checks?

If this matches your expectation, I will push the refactor and update docs accordingly.

@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch 2 times, most recently from fc61e06 to 767f6c7 Compare March 2, 2026 08:10
@yuzisun
Contributor

yuzisun commented Mar 2, 2026

> Thanks for the clarification - makes sense to keep this at route level.
>
> I will rework this PR to remove backend-level LLMRequestCosts in filter config and instead aggregate cost expressions at route level.
>
> Proposed behavior:
>
>   • Keep llmRequestCosts defined on AIGatewayRoute.
>   • During filter-config generation, aggregate per-route costs by metadataKey into route-level expressions.
>   • Use the selected backend context only inside the aggregated expression (instead of storing costs on each backend entry).
>
> Before I implement, can you confirm this is the expected aggregation model? Also, should we keep or remove the global fallback path after v0.5 config version checks?
>
> If this matches your expectation, I will push the refactor and update docs accordingly.

Hi @jonasHanhan, per-backend cost tracking is handled in my PR #1869 with the QuotaPolicy API (which applies to AIServiceBackend). This PR is for tracking quota at the route level, defined with BackendTrafficPolicy (which only applies to HTTPRoute). We can aggregate per-route costs across all the routes and maintain a map, but we still need a request attribute that tells us which route was matched; currently we only have the backend information.

A few items we need to do here:

  1. internal/internalapi/internalapi.go: Add route name constants
     • Add InternalMetadataRouteNameKey = "route_name" constant
     • Add XDSRouteMetadataRouteNamePath = "xds.route_metadata.filter_metadata['aigateway.envoy.io']['route_name']" constant
  2. internal/llmcostcel/cel.go: Add route_name CEL variable
  3. internal/controller/gateway.go: Route-level cost aggregation
  4. internal/extensionserver/post_translate_modify.go: Add route metadata to xDS routes
     • In enableRouterLevelAIGatewayExtProcOnRoute: for each AI-Gateway-generated route, add route name to route.Metadata.FilterMetadata['aigateway.envoy.io']['route_name']
     • Parse the AIGatewayRoute namespace/name from route.Name (format: httproute///rule/...)
     • Store as namespace/name in the metadata
     • In maybeModifyCluster (upstream ext_proc config, ~line 295): add internalapi.XDSRouteMetadataRouteNamePath to extProcConfig.RequestAttributes
  5. internal/extproc/server.go: Resolve route name from attributes
  6. internal/extproc/processor_impl.go: Store and use route name
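The "parse the namespace/name from route.Name" step can be sketched as below. This is not the PR's code: `routeNameFromXDSRouteName` is a hypothetical helper, and the `httproute/<namespace>/<name>/...` shape is an assumption taken from a later review comment in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// routeNameFromXDSRouteName extracts "namespace/name" from an xDS route name
// that is assumed to look like "httproute/<namespace>/<name>/...".
// It returns "" when the input does not match that shape.
func routeNameFromXDSRouteName(xdsName string) string {
	parts := strings.Split(xdsName, "/")
	if len(parts) < 3 || parts[0] != "httproute" {
		return ""
	}
	if parts[1] == "" || parts[2] == "" {
		return ""
	}
	return parts[1] + "/" + parts[2]
}

func main() {
	fmt.Println(routeNameFromXDSRouteName("httproute/default/my-route/rule/0/match/0/*"))
	fmt.Println(routeNameFromXDSRouteName("some-other-route") == "")
}
```

Returning an empty string for unrecognized names lets the caller skip routes that were not generated in the assumed format instead of attaching wrong metadata.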

@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Mar 3, 2026
@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch 3 times, most recently from 9b12588 to c0db777 Compare March 3, 2026 03:39
Comment thread internal/controller/gateway.go Outdated
expr := "uint(0)"
// Keep "last definition wins" semantics for duplicate metadata keys by
// layering conditions in declaration order.
for i := 0; i < len(costs); i++ {
Contributor

could use the simpler for _, cost := range costs since it doesn't need the index

return false
}

func routeNameFromRouteConfigName(routeConfigName string) string {
Contributor

Consider documenting the route name format assumption: this function assumes that route names follow httproute/<namespace>/<name>/...

Contributor

@jonasHanhan could you please address this?

Comment thread internal/controller/gateway.go Outdated
Comment on lines +252 to +254
if !hasRouteBackend || len(routeCosts) == 0 {
	return nil
}
Contributor

hasRouteBackend can be checked outside of the function instead of being passed as a function parameter.

@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch 5 times, most recently from c9a584b to bcec26d Compare March 9, 2026 04:49
@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from 44ea243 to d4b2545 Compare March 17, 2026 00:47
Comment thread internal/extensionserver/post_translate_modify.go Outdated
return false
}

func routeNameFromRouteConfigName(routeConfigName string) string {
Contributor

@jonasHanhan could you please address this?

@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from d4b2545 to 9109ff9 Compare March 19, 2026 00:07
Signed-off-by: jonasHanhan <zqhmusic121@gmail.com>
@jonasHanhan jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from 53e31f2 to b01ab63 Compare March 19, 2026 01:00
@aabchoo
Contributor

aabchoo commented Mar 20, 2026

@jonasHanhan could you please not force push each time you make a change? It's very difficult to track what has been modified.

aabchoo added 5 commits March 30, 2026 16:38
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
@aabchoo
Contributor

aabchoo commented Apr 1, 2026

Hey @jonasHanhan, I'm going to take over this PR as this change is urgent.

}
if len(routeBackendNames) > 0 {
if err = aggregateRouteLLMRequestCosts(llmCostsByMetadata, &llmCostMetadataOrder, routeName, aiGatewayRoute.Spec.LLMRequestCosts); err != nil {
return false, fmt.Errorf("failed to aggregate LLMRequestCosts for route %s: %w", aiGatewayRoute.Name, err)
Contributor

One bad LLM request cost can prevent the entire filter config from updating.

Thoughts? cc @yuzisun

Contributor

extproc also cannot come up if there is a bad CEL expression:
https://github.com/envoyproxy/ai-gateway/blob/main/internal/extproc/server.go#L82. Either we need to validate the CEL expression upfront, or the user needs to fix the CEL expression to unblock the reconciliation.

@sukumargaonkar
Contributor

Hey @jonasHanhan, thanks for working on this! I've been looking at the per-backend LLM cost implementation and noticed a potential issue with the current approach of aggregating costs into a single CEL expression per metadataKey. I workshopped my suggestion in jonasHanhan#1.

The issue

When multiple routes define costs for the same metadataKey, the controller builds one nested ternary CEL expression like:

route_name == 'ns/route1' ? (uint(input_tokens)) : (route_name == 'ns/route2' ? (uint(input_tokens)) : (uint(0)))

This grows linearly with the number of routes. In clusters with many routes sharing the same metadata key, this can hit CEL-Go compile limits or make expressions unwieldy to debug.
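The growth is easy to see in a small sketch of the aggregation. `routeCost` and `aggregateExpr` are hypothetical names, and the layering (start from `uint(0)` and wrap one condition per route in declaration order) is modeled on the review snippet earlier in this thread, not copied from the PR.

```go
package main

import "fmt"

// routeCost is a hypothetical pair of a route and its user-authored CEL expression.
type routeCost struct {
	RouteName string // "namespace/name"
	Expr      string
}

// aggregateExpr folds all route costs for one metadataKey into a single
// nested ternary. Each route adds one more nesting level.
func aggregateExpr(costs []routeCost) string {
	expr := "uint(0)"
	for _, c := range costs {
		expr = fmt.Sprintf("route_name == '%s' ? (%s) : (%s)", c.RouteName, c.Expr, expr)
	}
	return expr
}

func main() {
	costs := []routeCost{
		{RouteName: "ns/route1", Expr: "uint(input_tokens)"},
		{RouteName: "ns/route2", Expr: "uint(output_tokens)"},
	}
	expr := aggregateExpr(costs)
	fmt.Println(expr)
	fmt.Println("length:", len(expr))
}
```

Because every additional route wraps the whole previous expression, both the text length and the nesting depth scale with the number of routes that share the key.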

Alternative approach

I've pushed an alternative implementation to my fork and a PR to your branch (See jonasHanhan#1) that stores per-route costs instead of aggregating them into a single expression.

Changes

Filter API (internal/filterapi/filterconfig.go)

  • Added RouteName field to LLMRequestCost (empty = wildcard). The controller sets this to namespace/name for each route's cost rules.

Controller (internal/controller/gateway.go)

  • Removed the aggregation helpers (aggregateRouteLLMRequestCosts, aggregatedLLMRequestCosts, routeLLMRequestCost, routeNameMatchCondition, llmRequestCostToCELExpression).
  • Replaced with aigwLLMRequestCostToFilterAPI that converts each CRD cost to a filter-config row with its RouteName, preserving the original type (InputToken, OutputToken, etc.) instead of converting everything to synthetic CEL.
  • Same-route duplicate metadataKeys are deduped at config-build time (last definition wins), so the extproc never needs to resolve duplicates at request time.
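A minimal sketch of that config-build-time deduplication, assuming simplified types: `costEntry`, `costRow`, and `rowsForRoute` are hypothetical names, not the helper described above.

```go
package main

import "fmt"

// costEntry is a hypothetical, simplified cost as declared on a route.
type costEntry struct {
	MetadataKey string
	Expr        string
}

// costRow is a hypothetical filter-config row scoped to one route.
type costRow struct {
	RouteName   string
	MetadataKey string
	Expr        string
}

// rowsForRoute converts one route's costs into rows, keeping a single row per
// metadataKey. A later definition replaces an earlier one in place, so the
// output order follows the first occurrence of each key.
func rowsForRoute(routeName string, costs []costEntry) []costRow {
	index := make(map[string]int, len(costs))
	rows := make([]costRow, 0, len(costs))
	for _, c := range costs {
		row := costRow{RouteName: routeName, MetadataKey: c.MetadataKey, Expr: c.Expr}
		if i, ok := index[c.MetadataKey]; ok {
			rows[i] = row // last definition wins
			continue
		}
		index[c.MetadataKey] = len(rows)
		rows = append(rows, row)
	}
	return rows
}

func main() {
	rows := rowsForRoute("ns/route1", []costEntry{
		{MetadataKey: "cost", Expr: "input_tokens"},
		{MetadataKey: "total", Expr: "total_tokens"},
		{MetadataKey: "cost", Expr: "input_tokens * 2"},
	})
	for _, r := range rows {
		fmt.Printf("%s %s = %s\n", r.RouteName, r.MetadataKey, r.Expr)
	}
}
```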

Extproc (internal/extproc/processor_impl.go)

  • buildDynamicMetadata is now a single forward scan: for each cost row, if the route matches, evaluate and write to metadata. Rows scoped to a different route are skipped entirely (no zero default emitted for non-matching routes).
  • Extracted evalRuntimeRequestCost and routeMatchesCost helpers for clarity.
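The forward scan can be illustrated with the sketch below. The names `scopedCost`, `routeMatches`, and `buildMetadata` are hypothetical and deliberately differ from the helpers named above; Go functions stand in for compiled CEL programs, and the behavior when a wildcard row and a route-scoped row share a key (here the later row overwrites) is an assumption of this sketch.

```go
package main

import "fmt"

// scopedCost is a hypothetical cost row. An empty RouteName is treated as a
// wildcard that applies to every route.
type scopedCost struct {
	RouteName   string
	MetadataKey string
	Eval        func(inputTokens, outputTokens uint64) uint64
}

// routeMatches reports whether the row applies to the matched route.
func routeMatches(row scopedCost, route string) bool {
	return row.RouteName == "" || row.RouteName == route
}

// buildMetadata scans the rows once and evaluates only those that apply to
// the matched route. Rows for other routes are skipped, so no zero-valued
// placeholder is written for them.
func buildMetadata(rows []scopedCost, route string, in, out uint64) map[string]uint64 {
	md := make(map[string]uint64)
	for _, row := range rows {
		if !routeMatches(row, route) {
			continue
		}
		md[row.MetadataKey] = row.Eval(in, out)
	}
	return md
}

func exampleRows() []scopedCost {
	return []scopedCost{
		{RouteName: "ns/a", MetadataKey: "cost", Eval: func(in, _ uint64) uint64 { return in }},
		{RouteName: "ns/b", MetadataKey: "cost", Eval: func(in, _ uint64) uint64 { return in * 2 }},
		{RouteName: "", MetadataKey: "total", Eval: func(in, out uint64) uint64 { return in + out }},
	}
}

func main() {
	fmt.Println(buildMetadata(exampleRows(), "ns/b", 3, 4))
	fmt.Println(buildMetadata(exampleRows(), "ns/c", 3, 4))
}
```

A request on a route without its own rows only receives the wildcard keys, which mirrors the "no spurious zero-valued metadata" benefit listed further down.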

API docs (api/v1alpha1/ai_gateway_route.go, api/v1beta1/ai_gateway_route.go)

  • Updated the LLMRequestCosts field comment to describe per-route scoping instead of the old "pick one and ignore the rest" language.

Benefits

  • No expression size limits regardless of route count
  • Simpler per-route CEL expressions (user-authored only, no synthetic routing logic)
  • Easier validation errors (points at a specific route's rule)
  • Route dispatch in Go is straightforward and testable
  • Non-matching routes don't emit spurious zero-valued metadata keys

Trade-offs

  • More rows in filter config (one per route-cost pair vs. one per unique metadataKey)

Backward compatibility

The new routeName field doesn't require coordinated deployment. The existing Config.Version guard in the extproc config watcher rejects any config where the version doesn't match its own binary version, so during a rolling upgrade the extproc simply keeps its previous config until both components are at the same version.
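As a rough illustration of that guard (not the project's watcher; `filterConfig`, `configWatcher`, and `apply` are hypothetical names), a version mismatch simply leaves the previously loaded config in place:

```go
package main

import "fmt"

// filterConfig is a hypothetical, simplified filter config.
type filterConfig struct {
	Version string
	Rows    int
}

// configWatcher holds the config currently in use by one extproc binary.
type configWatcher struct {
	binaryVersion string
	current       *filterConfig
}

// apply installs next only when its version matches the binary's version.
// On a mismatch the previous config stays active and false is returned.
func (w *configWatcher) apply(next *filterConfig) bool {
	if next.Version != w.binaryVersion {
		return false
	}
	w.current = next
	return true
}

func main() {
	w := &configWatcher{binaryVersion: "v1", current: &filterConfig{Version: "v1", Rows: 1}}

	// The controller was upgraded first and writes a v2 config: it is ignored.
	fmt.Println(w.apply(&filterConfig{Version: "v2", Rows: 5}), w.current.Rows)

	// A config written for the same version is accepted.
	fmt.Println(w.apply(&filterConfig{Version: "v1", Rows: 3}), w.current.Rows)
}
```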

Tests

Updated all controller and extproc tests to reflect the new structure:

  • Controller tests assert flat per-route rows with routeName instead of aggregated CEL strings
  • Duplicate metadataKey test verifies controller-side dedup (last wins)
  • New extproc tests for route-scoped matching, wildcard routes, and cross-route key isolation

Happy to discuss if this direction makes sense or if there are aspects of the aggregated approach I'm missing. Let me know if you'd like me to open a separate PR or if you'd prefer to incorporate these changes here.

@aabchoo aabchoo changed the title fix: support per-backend LLMRequestCosts for different cost formulas fix: support per-route LLMRequestCosts for different cost formulas Apr 6, 2026
@sukumargaonkar
Contributor

Notes from Community meeting on 6th April

yuzisun pushed a commit that referenced this pull request Apr 20, 2026
**Description**
This PR is an alternative to the implementation in #1780. It
fixes per-route LLMRequestCosts configuration, allowing different routes
to use different CEL calculation formulas for the same metadataKey.

Previously, even if a user configured different LLMRequestCosts for each
route with the same metadataKey, the implementation would pick one of
them and apply it globally to all routes. This PR fixes that behavior.

#1780 implemented the same thing by combining the CEL expressions from
all routes into a single CEL expression, which ran into CEL length
limitations. This PR instead stores the individual route-specific CEL
expressions in extproc's filterconfig and lets extproc choose among them
based on the route used by the current request.

This PR also introduces GlobalLLMRequestCosts in GatewayConfig, which
configures LLMRequestCosts that apply to all routes. On conflict,
route-specific costs take priority over global costs.

**Breaking Change:**
This PR fixes the existing issue where LLMRequestCosts configured on one
AIGatewayRoute CR were applied to all routes (even routes not configured
with them). After this PR is merged, costs apply only to the route on
which they are configured. Users can instead set
GatewayConfig.GlobalLLMRequestCosts to configure costs that apply to all
routes.


**Related Issues/PRs (if applicable)**
Fixes: #1688
Related PR: #1780

---------

Signed-off-by: jonasHanhan <zqhmusic121@gmail.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Co-authored-by: jonasHanhan <zqhmusic121@gmail.com>
Co-authored-by: Aaron Choo <achoo30@bloomberg.net>
@sukumargaonkar
Contributor

We can close this PR, as the PR above was merged.

@aabchoo aabchoo closed this Apr 27, 2026

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Different CEL calculation formulas are used across multiple models.

7 participants