fix: support per-route LLMRequestCosts for different cost formulas by jonasHanhan · Pull Request #1780 · envoyproxy/ai-gateway

jonasHanhan · 2026-01-16T06:26:37Z

Description

This commit enables per-route LLMRequestCosts configuration, allowing different routes to use different CEL calculation formulas for the same metadataKey.

Previously, LLMRequestCosts were gateway-scoped and deduplicated by metadataKey, causing incorrect cost calculations when multiple routes needed different formulas (e.g., free model with cost=0 vs paid model with cost=input_tokens*30+output_tokens*150).
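A minimal sketch of that failure mode, assuming a simplified cost record. The `cost` type and `dedupByMetadataKey` helper are hypothetical, not the gateway's code, and the CEL strings assume the token variables are `uint`. Once every route's costs are collapsed into one map keyed only by metadataKey, a single formula survives for all routes:

```go
package main

import "fmt"

// cost is a hypothetical, simplified stand-in for an LLMRequestCost entry.
type cost struct {
	Route       string // namespace/name of the AIGatewayRoute that declared it
	MetadataKey string
	CEL         string // assumes input_tokens/output_tokens are uint in CEL
}

// dedupByMetadataKey mimics gateway-scoped deduplication: only one formula
// survives per metadataKey, regardless of which route declared it.
func dedupByMetadataKey(costs []cost) map[string]cost {
	out := make(map[string]cost, len(costs))
	for _, c := range costs {
		out[c.MetadataKey] = c // later entries silently overwrite earlier ones
	}
	return out
}

func main() {
	costs := []cost{
		{Route: "default/free-model", MetadataKey: "billing_charges", CEL: "uint(0)"},
		{Route: "default/paid-model", MetadataKey: "billing_charges", CEL: "input_tokens * uint(30) + output_tokens * uint(150)"},
	}
	merged := dedupByMetadataKey(costs)
	// Both routes now share the paid formula, so free-model requests get billed.
	fmt.Println(len(merged), merged["billing_charges"].Route)
}
```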

Changes:

Add LLMRequestCosts field to Backend struct
Copy route's LLMRequestCosts to each backend during reconciliation
Compile backend-level CEL programs in NewRuntimeConfig
Use backend-level costs in ProcessResponseBody with fallback to global config

Related Issues/PRs (if applicable)

Fixes #1688

Special notes for reviewers (if applicable)

AI tools were used to assist with test generation and code review
All unit tests pass (make precommit passes)
Backward compatible: if backend has no LLMRequestCosts, falls back to global config

codecov-commenter · 2026-01-16T06:29:24Z

Codecov Report

❌ Patch coverage is 71.24183% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.25%. Comparing base (7610c0e) to head (9e10044).

Files with missing lines	Patch %	Lines
internal/extensionserver/post_translate_modify.go	41.30%	23 Missing and 4 partials ⚠️
internal/controller/gateway.go	83.33%	7 Missing and 5 partials ⚠️
internal/extproc/server.go	62.50%	2 Missing and 1 partial ⚠️
internal/extproc/processor.go	0.00%	1 Missing ⚠️
internal/extproc/processor_impl.go	94.73%	1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (71.24%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1780      +/-   ##
==========================================
- Coverage   84.38%   84.25%   -0.13%     
==========================================
  Files         130      130              
  Lines       17985    18086     +101     
==========================================
+ Hits        15176    15238      +62     
- Misses       1867     1896      +29     
- Partials      942      952      +10


mathetake · 2026-01-16T16:36:51Z

+				// Assign the route's LLMRequestCosts to this backend.
+				// This ensures each backend has its own cost calculation configuration,
+				// allowing different routes to use different CEL expressions for the same metadataKey.
+				b.LLMRequestCosts, err = llmRequestCostsToFilterAPI(aiGatewayRoute.Spec.LLMRequestCosts)


let's construct this once instead of repeating it inside the for loop here

Done! Moved the llmRequestCostsToFilterAPI call outside the backendRef loop. Thanks for the review!
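A sketch of the suggested refactor, under simplified hypothetical types (`routeCost`, `filterCost`, `backend`, and `convertCosts`, which stands in for `llmRequestCostsToFilterAPI`): the route's costs are converted once, before the backendRef loop, and the result is shared by every backend.

```go
package main

import (
	"errors"
	"fmt"
)

type routeCost struct{ MetadataKey, Expr string }  // hypothetical CRD-side cost
type filterCost struct{ MetadataKey, Expr string } // hypothetical filter-config cost

type backend struct {
	Name            string
	LLMRequestCosts []filterCost
}

// convertCosts is a hypothetical stand-in for llmRequestCostsToFilterAPI.
func convertCosts(in []routeCost) ([]filterCost, error) {
	out := make([]filterCost, 0, len(in))
	for _, c := range in {
		if c.MetadataKey == "" {
			return nil, errors.New("metadataKey must not be empty")
		}
		out = append(out, filterCost(c)) // identical underlying struct types
	}
	return out, nil
}

func buildBackends(names []string, costs []routeCost) ([]backend, error) {
	// Convert once per route instead of once per backendRef.
	converted, err := convertCosts(costs)
	if err != nil {
		return nil, fmt.Errorf("failed to convert LLMRequestCosts: %w", err)
	}
	backends := make([]backend, 0, len(names))
	for _, n := range names {
		// All backends of this route share the same read-only slice.
		backends = append(backends, backend{Name: n, LLMRequestCosts: converted})
	}
	return backends, nil
}

func main() {
	bs, err := buildBackends([]string{"b1", "b2"}, []routeCost{{MetadataKey: "billing_charges", Expr: "uint(0)"}})
	fmt.Println(len(bs), err)
}
```

Sharing one slice is safe only as long as nothing mutates a backend's costs afterwards; otherwise each backend would need its own copy.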

mathetake · 2026-01-16T16:39:07Z

+	// Use backend-level request costs if available, otherwise fall back to global config.
+	requestCosts := u.backendRequestCosts
+	if len(requestCosts) == 0 {
+		requestCosts = u.parent.config.RequestCosts
+	}


When does this happen? We can delete the global-level ones if we merge this after v0.5, which comes with proper config version checking that lets us avoid breaking changes.

The global fallback happens when the backend doesn't have any LLMRequestCosts configured (for backward compatibility with existing configurations that only use the global LLMRequestCosts).

If you'd like me to remove the global level fallback since this will be merged after v0.5 with the config version checking, I'm happy to do that. Just let me know!

nacx

Can you add a section on the website page that explains cost-based rate limiting, with a small example on how to achieve different costs per backend? This could be a useful feature but it's not easy to spot that the way of doing that with the current API is creating N AIGatewayRoutes.

yuzisun · 2026-01-19T00:11:57Z

 	BodyMutation *HTTPBodyMutation `json:"httpBodyMutation,omitempty"`
+	// LLMRequestCosts configures the cost calculation for this backend. Optional.
+	// This allows different routes/backends to have different cost calculation formulas.
+	LLMRequestCosts []LLMRequestCost `json:"llmRequestCosts,omitempty"`


If you want to track per backend then it should use the new Quota API. The BackendTrafficPolicy is applied at the route level.

Thanks for the feedback! I see there's a TODO in ai_service_backend.go:78-79 suggesting backend-level LLMRequestCost configuration:

// TODO: maybe add backend-level LLMRequestCost configuration that overrides the AIGatewayRoute-level LLMRequestCost.

My current implementation copies the route's LLMRequestCosts to each backend to match @aabchoo's per-backendRef proposal mentioned in #1688.

Would you prefer I implement this at the AIServiceBackend level instead (as the TODO suggests)? I'm happy to refactor if that's the preferred direction. Could you also point me to the Quota API you mentioned?

Take a look at this PR #1709

Thanks for pointing to PR #1709! The QuotaPolicy approach looks comprehensive. I noticed it's been in progress for a while now — looking forward to seeing it land.

yuzisun · 2026-01-19T00:15:45Z

@jonasHanhan @nacx I do not think this is the right way to implement per-backend quota, which should use the new Quota API designed for this purpose. The current token rate limit is at the route level, so we can't copy it from the route to the backend.

yuzisun · 2026-01-19T00:26:04Z

+
+- Each `AIGatewayRoute` has its own `llmRequestCosts` configuration
+- Different routes can use the same `metadataKey` (e.g., `billing_charges`) with different CEL expressions
+- The cost calculation is automatically applied per-backend, allowing fine-grained cost control


If the quota needs to be applied at backend level then it should attach the policy to AIServiceBackend instead of AIGatewayRoute.

yuzisun · 2026-03-01T17:04:00Z

@jonasHanhan I think we still need this PR, but the fix is to aggregate the cost expressions at the route level instead of at the backend level in the filter config.

jonasHanhan · 2026-03-02T02:17:51Z

Thanks for the clarification - makes sense to keep this at route level.

I will rework this PR to remove backend-level LLMRequestCosts in filter config and instead aggregate cost expressions at route level.

Proposed behavior:

Keep llmRequestCosts defined on AIGatewayRoute.
During filter-config generation, aggregate per-route costs by metadataKey into route-level expressions.
Use the selected backend context only inside the aggregated expression (instead of storing costs on each backend entry).

Before I implement, can you confirm this is the expected aggregation model?
Also, should we keep or remove the global fallback path after v0.5 config version checks?

If this matches your expectation, I will push the refactor and update docs accordingly.

yuzisun · 2026-03-02T12:49:00Z


Hi @jonasHanhan, per-backend cost tracking is handled in my PR #1869 with the QuotaPolicy API (which applies to AIServiceBackend). This PR is for tracking quota at the route level as defined with BackendTrafficPolicy (which only applies to HTTPRoute). We can aggregate per-route costs across all the routes and maintain a map, but we still need a request attribute that tells us which route was matched; currently we only have information about the backend.

A few items we need to do here:

internal/internalapi/internalapi.go — Add route name constants

Add InternalMetadataRouteNameKey = "route_name" constant
Add XDSRouteMetadataRouteNamePath = "xds.route_metadata.filter_metadata['aigateway.envoy.io']['route_name']" constant

internal/llmcostcel/cel.go — Add route_name CEL variable
internal/controller/gateway.go — Route-level cost aggregation
internal/extensionserver/post_translate_modify.go — Add route metadata to xDS routes

In enableRouterLevelAIGatewayExtProcOnRoute: for each AI-Gateway-generated route, add route name to route.Metadata.FilterMetadata['aigateway.envoy.io']['route_name']
Parse the AIGatewayRoute namespace/name from route.Name (format: httproute/<namespace>/<name>/rule/...)
Store as namespace/name in the metadata
In maybeModifyCluster (upstream ext_proc config, ~line 295): add internalapi.XDSRouteMetadataRouteNamePath to extProcConfig.RequestAttributes

internal/extproc/server.go — Resolve route name from attributes
internal/extproc/processor_impl.go — Store and use route name
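A rough sketch of the round trip these items describe, using plain nested maps in place of Envoy's protobuf `FilterMetadata` (a deliberate simplification). The two exported constants are the ones proposed above; `setRouteNameMetadata` and `lookupRouteName` are hypothetical helpers, and the lookup only mimics the selection that Envoy performs when it evaluates the attribute path.

```go
package main

import "fmt"

// Constants as proposed in the review comment above.
const (
	InternalMetadataRouteNameKey  = "route_name"
	XDSRouteMetadataRouteNamePath = "xds.route_metadata.filter_metadata['aigateway.envoy.io']['route_name']"
)

// aigwFilterNamespace is the filter-metadata namespace named in the review.
const aigwFilterNamespace = "aigateway.envoy.io"

// setRouteNameMetadata stores "namespace/name" under
// FilterMetadata['aigateway.envoy.io']['route_name'] (plain-map version).
func setRouteNameMetadata(fm map[string]map[string]string, routeName string) {
	ns, ok := fm[aigwFilterNamespace]
	if !ok {
		ns = make(map[string]string)
		fm[aigwFilterNamespace] = ns
	}
	ns[InternalMetadataRouteNameKey] = routeName
}

// lookupRouteName returns what XDSRouteMetadataRouteNamePath would select,
// or "" when the route carries no AI Gateway metadata.
func lookupRouteName(fm map[string]map[string]string) string {
	return fm[aigwFilterNamespace][InternalMetadataRouteNameKey]
}

func main() {
	fm := make(map[string]map[string]string)
	setRouteNameMetadata(fm, "default/paid-model")
	fmt.Println(lookupRouteName(fm))
	fmt.Println(XDSRouteMetadataRouteNamePath)
}
```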

yuzisun · 2026-03-03T12:32:36Z

+		expr := "uint(0)"
+		// Keep "last definition wins" semantics for duplicate metadata keys by
+		// layering conditions in declaration order.
+		for i := 0; i < len(costs); i++ {


could use the simpler for _, cost := range costs since it doesn't need the index
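For reference, a self-contained sketch of that loop written with `for _, cost := range costs`. The `routeCost` type, the `aggregate` and `celQuote` helpers, and the `route_name == '...'` condition format are illustrative, modeled on the snippet above rather than taken from the PR. Each iteration wraps the previous expression, so the last declared cost ends up outermost and is checked first, which preserves "last definition wins":

```go
package main

import (
	"fmt"
	"strings"
)

type routeCost struct {
	RouteName string // "namespace/name" of the AIGatewayRoute
	Expr      string // user-authored CEL expression
}

// celQuote wraps s in single quotes, escaping backslashes and single quotes.
func celQuote(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return "'" + r.Replace(s) + "'"
}

// aggregate builds one nested-ternary CEL expression for a single metadataKey.
func aggregate(costs []routeCost) string {
	expr := "uint(0)" // default when no route matches
	for _, cost := range costs {
		expr = fmt.Sprintf("route_name == %s ? (%s) : (%s)", celQuote(cost.RouteName), cost.Expr, expr)
	}
	return expr
}

func main() {
	fmt.Println(aggregate([]routeCost{
		{RouteName: "ns/route1", Expr: "uint(input_tokens)"},
		{RouteName: "ns/route2", Expr: "uint(input_tokens)"},
	}))
}
```

Each additional route adds one more ternary level to the resulting expression.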

yuzisun · 2026-03-03T12:35:51Z

 	return false
 }

+func routeNameFromRouteConfigName(routeConfigName string) string {


Consider documenting the assumption that route names follow the format httproute/<namespace>/<name>/....

@jonasHanhan could you please address this?
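One possible documented body for `routeNameFromRouteConfigName`, offered as a sketch rather than the PR's implementation. It relies on the `httproute/<namespace>/<name>/...` assumption stated above; the `namespace/name` return format follows the earlier proposal, and returning "" for any other shape is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// routeNameFromRouteConfigName extracts "namespace/name" from a generated
// route name.
//
// Assumption: AI Gateway generated route names have the form
//
//	httproute/<namespace>/<name>/rule/...
//
// Any other shape yields "" so callers can treat the route as unscoped.
func routeNameFromRouteConfigName(routeConfigName string) string {
	// At most 4 parts: "httproute", namespace, name, and the unparsed rest.
	parts := strings.SplitN(routeConfigName, "/", 4)
	if len(parts) < 3 || parts[0] != "httproute" || parts[1] == "" || parts[2] == "" {
		return ""
	}
	return parts[1] + "/" + parts[2]
}

func main() {
	fmt.Println(routeNameFromRouteConfigName("httproute/default/my-route/rule/0/match/0"))
}
```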

yuzisun · 2026-03-03T12:46:50Z

+	if !hasRouteBackend || len(routeCosts) == 0 {
+		return nil
+	}


hasRouteBackend can be checked outside of the function instead of passing as function parameters.

aabchoo · 2026-03-18T09:49:39Z

 	return false
 }

+func routeNameFromRouteConfigName(routeConfigName string) string {


@jonasHanhan could you please address this?

Signed-off-by: jonasHanhan <zqhmusic121@gmail.com>

aabchoo · 2026-03-20T16:53:36Z

@jonasHanhan could you please not force push each time you make a change? It's very difficult to track what has been modified.

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

aabchoo · 2026-04-01T18:35:10Z

Hey @jonasHanhan, I'm going to take over this PR as this change is urgent.

aabchoo · 2026-04-02T21:40:36Z

 		}
+		if len(routeBackendNames) > 0 {
+			if err = aggregateRouteLLMRequestCosts(llmCostsByMetadata, &llmCostMetadataOrder, routeName, aiGatewayRoute.Spec.LLMRequestCosts); err != nil {
+				return false, fmt.Errorf("failed to aggregate LLMRequestCosts for route %s: %w", aiGatewayRoute.Name, err)


One bad LLM request cost can prevent the entire filter config from updating.

Thoughts? cc @yuzisun

The extproc also cannot come up if there is a bad CEL expression:
https://github.com/envoyproxy/ai-gateway/blob/main/internal/extproc/server.go#L82. Either we need to validate the CEL expression upfront, or the user needs to fix the CEL expression to unblock reconciliation.

sukumargaonkar · 2026-04-04T15:46:05Z

Hey @jonasHanhan, thanks for working on this! I've been looking at the per-backend LLM cost implementation and noticed a potential issue with the current approach of aggregating costs into a single CEL expression per metadataKey. I workshopped my suggestion in jonasHanhan#1.

The issue

When multiple routes define costs for the same metadataKey, the controller builds one nested ternary CEL expression like:

route_name == 'ns/route1' ? (uint(input_tokens)) : (route_name == 'ns/route2' ? (uint(input_tokens)) : (uint(0)))

This grows linearly with the number of routes. In clusters with many routes sharing the same metadata key, this can hit CEL-Go compile limits or make expressions unwieldy to debug.

Alternative approach

I've pushed an alternative implementation to my fork and a PR to your branch (See jonasHanhan#1) that stores per-route costs instead of aggregating them into a single expression.

Changes

Filter API (internal/filterapi/filterconfig.go)

Added RouteName field to LLMRequestCost (empty = wildcard). The controller sets this to namespace/name for each route's cost rules.

Controller (internal/controller/gateway.go)

Removed the aggregation helpers (aggregateRouteLLMRequestCosts, aggregatedLLMRequestCosts, routeLLMRequestCost, routeNameMatchCondition, llmRequestCostToCELExpression).
Replaced with aigwLLMRequestCostToFilterAPI that converts each CRD cost to a filter-config row with its RouteName, preserving the original type (InputToken, OutputToken, etc.) instead of converting everything to synthetic CEL.
Same-route duplicate metadataKeys are deduped at config-build time (last definition wins), so the extproc never needs to resolve duplicates at request time.
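A sketch of the controller-side conversion described above, with simplified types. The `RouteName` field comes from the description; the rest of the `LLMRequestCost` shape and the `toFilterRows` helper are hypothetical. Each route's costs become flat rows tagged with the route name, and duplicate metadataKeys within one route are collapsed so the last definition wins (kept at the first occurrence's position):

```go
package main

import "fmt"

// LLMRequestCost is a simplified, hypothetical mirror of a filter-config row
// carrying the RouteName field ("" means wildcard).
type LLMRequestCost struct {
	MetadataKey string
	Type        string // e.g. "InputToken", "OutputToken", "CEL"
	CEL         string // only set when Type == "CEL"
	RouteName   string
}

// toFilterRows converts one route's costs into filter-config rows, deduping
// duplicate metadataKeys within the route (last definition wins).
func toFilterRows(routeName string, costs []LLMRequestCost) []LLMRequestCost {
	rows := make([]LLMRequestCost, 0, len(costs))
	idx := make(map[string]int, len(costs))
	for _, c := range costs {
		c.RouteName = routeName
		if i, ok := idx[c.MetadataKey]; ok {
			rows[i] = c // overwrite the earlier definition in place
			continue
		}
		idx[c.MetadataKey] = len(rows)
		rows = append(rows, c)
	}
	return rows
}

func main() {
	rows := toFilterRows("ns/paid", []LLMRequestCost{
		{MetadataKey: "llm_input_token", Type: "InputToken"},
		{MetadataKey: "billing_charges", Type: "CEL", CEL: "uint(1)"},
		{MetadataKey: "billing_charges", Type: "CEL", CEL: "input_tokens * uint(30)"},
	})
	for _, r := range rows {
		fmt.Println(r.RouteName, r.MetadataKey, r.Type, r.CEL)
	}
}
```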

Extproc (internal/extproc/processor_impl.go)

buildDynamicMetadata is now a single forward scan: for each cost row, if the route matches, evaluate and write to metadata. Rows scoped to a different route are skipped entirely (no zero default emitted for non-matching routes).
Extracted evalRuntimeRequestCost and routeMatchesCost helpers for clarity.
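A sketch of the single forward scan, assuming simplified types. The `routeMatchesCost` name comes from the description, but its body here is a guess; the `eval` parameter stands in for `evalRuntimeRequestCost` and the compiled CEL programs. Rows scoped to other routes are skipped and emit nothing. If a wildcard row and a route-scoped row share a key, the later row overwrites the earlier one, so precedence depends on row order.

```go
package main

import "fmt"

type costRow struct {
	MetadataKey string
	RouteName   string // "" applies to every route
	Expr        string
}

// routeMatchesCost: a wildcard row matches any route; otherwise names must match.
func routeMatchesCost(routeName string, c costRow) bool {
	return c.RouteName == "" || c.RouteName == routeName
}

// buildDynamicMetadata evaluates only the rows that apply to the matched route.
func buildDynamicMetadata(routeName string, rows []costRow, eval func(expr string) (uint64, error)) (map[string]uint64, error) {
	md := make(map[string]uint64)
	for _, c := range rows {
		if !routeMatchesCost(routeName, c) {
			continue // no zero default is written for other routes' keys
		}
		v, err := eval(c.Expr)
		if err != nil {
			return nil, fmt.Errorf("metadataKey %s: %w", c.MetadataKey, err)
		}
		md[c.MetadataKey] = v
	}
	return md, nil
}

func main() {
	rows := []costRow{
		{MetadataKey: "billing_charges", RouteName: "ns/free", Expr: "free"},
		{MetadataKey: "billing_charges", RouteName: "ns/paid", Expr: "paid"},
		{MetadataKey: "llm_total_token", Expr: "total"},
	}
	// Toy evaluator standing in for compiled CEL programs.
	values := map[string]uint64{"free": 0, "paid": 42, "total": 100}
	eval := func(expr string) (uint64, error) { return values[expr], nil }
	md, _ := buildDynamicMetadata("ns/paid", rows, eval)
	fmt.Println(md)
}
```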

API docs (api/v1alpha1/ai_gateway_route.go, api/v1beta1/ai_gateway_route.go)

Updated the LLMRequestCosts field comment to describe per-route scoping instead of the old "pick one and ignore the rest" language.

Benefits

No expression size limits regardless of route count
Simpler per-route CEL expressions (user-authored only, no synthetic routing logic)
Easier validation errors (points at a specific route's rule)
Route dispatch in Go is straightforward and testable
Non-matching routes don't emit spurious zero-valued metadata keys

Trade-offs

More rows in filter config (one per route-cost pair vs. one per unique metadataKey)

Backward compatibility

The new routeName field doesn't require coordinated deployment. The existing Config.Version guard in the extproc config watcher rejects any config where the version doesn't match its own binary version, so during a rolling upgrade the extproc simply keeps its previous config until both components are at the same version.
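A minimal sketch of the version guard described above. `Config.Version` is named in the comment; the `watcher` type, its `apply` method, and the version strings are hypothetical. A config produced by a controller at a different version is ignored and the previous config stays active:

```go
package main

import "fmt"

// Config is a simplified stand-in for the extproc filter config.
type Config struct {
	Version string
	// other fields elided
}

// watcher is hypothetical; it models only the version check.
type watcher struct {
	binaryVersion string
	current       *Config
}

// apply keeps the previous config when the incoming one was produced by a
// controller at a different version, e.g. midway through a rolling upgrade.
func (w *watcher) apply(next *Config) bool {
	if next.Version != w.binaryVersion {
		return false
	}
	w.current = next
	return true
}

func main() {
	w := &watcher{binaryVersion: "v0.6.0", current: &Config{Version: "v0.6.0"}}
	fmt.Println(w.apply(&Config{Version: "v0.7.0"}), w.current.Version)
}
```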

Tests

Updated all controller and extproc tests to reflect the new structure:

Controller tests assert flat per-route rows with routeName instead of aggregated CEL strings
Duplicate metadataKey test verifies controller-side dedup (last wins)
New extproc tests for route-scoped matching, wildcard routes, and cross-route key isolation

Happy to discuss if this direction makes sense or if there are aspects of the aggregated approach I'm missing. Let me know if you'd like me to open a separate PR or if you'd prefer to incorporate these changes here.

sukumargaonkar · 2026-04-07T23:05:00Z

Notes from Community meeting on 6th April

Changes from the linked comment on #1780 ("fix: support per-route LLMRequestCosts for different cost formulas") are to be incorporated into this branch.
Add a field to the GatewayConfig CR that allows configuring globalRequestCosts that apply to all routes.

**Description**

This PR is an alternative to #1780's implementation. It fixes per-route LLMRequestCosts configuration, allowing different routes to use different CEL calculation formulas for the same metadataKey.

Previously, even if a user configured different LLMRequestCosts for each route with the same metadataKey, the implementation would pick one of them and apply it globally to all routes. This PR fixes that behavior.

#1780 implemented the same thing by combining the CEL expressions from all routes into a single CEL expression, which ran into CEL length limitations. This PR instead stores the individual route-specific CEL expressions in the extproc's filter config and lets the extproc choose among them based on the route used by the current request.

This PR also introduces GlobalLLMRequestCosts in GatewayConfig, which allows configuring LLMRequestCosts that apply to all routes. On conflicts, route-specific costs take priority over global costs.

**Breaking Change:** This PR fixes the existing issue where, if one AIGatewayRoute CR had LLMRequestCosts configured, they were applied to all routes (even routes not configured with them). After this PR is merged, costs apply only to the route on which they are configured. Users can instead configure GatewayConfig.GlobalLLMRequestCosts for costs that apply to all routes.

**Related Issues/PRs (if applicable)**

Fixes: #1688
Related PR: #1780

---------

Signed-off-by: jonasHanhan <zqhmusic121@gmail.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Co-authored-by: jonasHanhan <zqhmusic121@gmail.com>
Co-authored-by: Aaron Choo <achoo30@bloomberg.net>
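A sketch of the precedence rule for global versus route-specific costs, under simplified assumptions. The `cost` type and `effectiveCosts` helper are hypothetical and only model the stated behavior: global costs fill in any metadataKey the route does not define, and route-specific costs win on conflict.

```go
package main

import (
	"fmt"
	"sort"
)

type cost struct{ MetadataKey, Expr string }

// effectiveCosts returns the costs that apply to one route: global costs fill
// in every metadataKey the route does not define itself, and route-specific
// costs win on conflict.
func effectiveCosts(global, route []cost) []cost {
	byKey := make(map[string]cost, len(global)+len(route))
	for _, c := range global {
		byKey[c.MetadataKey] = c
	}
	for _, c := range route {
		byKey[c.MetadataKey] = c // route-specific overrides global
	}
	out := make([]cost, 0, len(byKey))
	for _, c := range byKey {
		out = append(out, c)
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Slice(out, func(i, j int) bool { return out[i].MetadataKey < out[j].MetadataKey })
	return out
}

func main() {
	global := []cost{{"billing_charges", "total_tokens"}, {"llm_total_token", "total_tokens"}}
	route := []cost{{"billing_charges", "uint(0)"}}
	for _, c := range effectiveCosts(global, route) {
		fmt.Println(c.MetadataKey, c.Expr)
	}
}
```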

sukumargaonkar · 2026-04-27T14:07:04Z

We can close this PR, since the PR above was merged.

jonasHanhan requested a review from a team as a code owner January 16, 2026 06:26

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 16, 2026

mathetake reviewed Jan 16, 2026

nacx reviewed Jan 16, 2026

yuzisun reviewed Jan 19, 2026

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from 5cb084b to 03b6a0b on January 19, 2026 00:33

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch 2 times, most recently from fc61e06 to 767f6c7 on March 2, 2026 08:10

dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Mar 3, 2026

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch 3 times, most recently from 9b12588 to c0db777 on March 3, 2026 03:39

yuzisun reviewed Mar 3, 2026

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch 5 times, most recently from c9a584b to bcec26d on March 9, 2026 04:49

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from 44ea243 to d4b2545 on March 17, 2026 00:47

aabchoo reviewed Mar 18, 2026

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from d4b2545 to 9109ff9 on March 19, 2026 00:07

fix: support route_name based LLMRequestCosts

b01ab63

Signed-off-by: jonasHanhan <zqhmusic121@gmail.com>

jonasHanhan force-pushed the fix/per-backend-llm-request-costs branch from 53e31f2 to b01ab63 on March 19, 2026 01:00

aabchoo added 5 commits March 30, 2026 16:38

increase map memory allocation size due to new metadata

430e163

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

Merge branch 'main' into fix/per-backend-llm-request-costs

b74f52b

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

update to use aigv1b1

a96f633

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

add documentation on assumption

7e1f99c

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

Update usage-based-ratelimiting.md

f042ba2

Signed-off-by: Aaron Choo <achoo30@bloomberg.net>

Merge branch 'main' into fix/per-backend-llm-request-costs

9e10044

aabchoo reviewed Apr 2, 2026

aabchoo changed the title from "fix: support per-backend LLMRequestCosts for different cost formulas" to "fix: support per-route LLMRequestCosts for different cost formulas" Apr 6, 2026

sukumargaonkar mentioned this pull request Apr 8, 2026

fix: route-scoped LLM request costs and add global defaults #2029

Merged

aabchoo closed this Apr 27, 2026
