Configurable Request Rate Limiting (RPM/RPS/RPH) #691

@Unbeaten4350

Description

Your idea

Implement a per-model rate-limiting setting that lets users define the maximum number of requests allowed within a given time unit (seconds, minutes, or hours).

Instead of allowing requests to fail when a provider's limit is reached, ThunderAI should automatically queue or delay outgoing requests to match the user-defined threshold. For example, a user could set a "1 Request per 1 Minute" limit for specific models to ensure continuous operation without manual retries.

Value

Error Prevention: Eliminates disruptive 429 (Too Many Requests) errors from API providers, leading to a much smoother user experience.

Workflow Automation: Enables "set and forget" batch processing for users on restricted or free-tier plans (such as Mistral’s 1 RPS limit) who would otherwise have to trigger requests manually.

Cost Control: Helps users stay within specific usage quotas to avoid unexpected overage charges from paid API providers.

Additional information

Specific Use Case: Users on Mistral AI’s free tier are currently limited to 1 request per second. Without this feature, sending two prompts back-to-back results in an immediate failure.

Suggested UI: A simple numeric input field next to a dropdown menu containing "per Second," "per Minute," and "per Hour" within the model connection settings.

Backend Logic: Ideally, this would use a "leaky bucket" or "fixed window" queuing algorithm to pause the execution of the next request until the appropriate time window has passed.
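
To make the suggestion concrete, here is a minimal leaky-bucket-style sketch in plain JavaScript. All names (`RateLimiter`, `acquire`, `sendPrompt`) are hypothetical, not part of ThunderAI's actual code; it only illustrates delaying each outgoing request until the configured interval has elapsed.

```javascript
// Hypothetical sketch of a per-model request queue. Enforces
// "maxRequests per windowMs" by spacing requests at a fixed
// drain rate (leaky bucket), delaying callers as needed.
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.minIntervalMs = windowMs / maxRequests; // time between slots
    this.nextSlot = 0; // earliest timestamp the next request may start
  }

  // Resolves when the caller is allowed to send its request.
  async acquire() {
    const now = Date.now();
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minIntervalMs;
    const wait = start - now;
    if (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}

// Example wrapper: each outgoing API call waits for a slot first.
async function sendPrompt(limiter, prompt) {
  await limiter.acquire();
  // ... the actual fetch() to the provider would go here ...
  return `sent: ${prompt}`;
}
```

With a "1 Request per 1 Minute" setting this would be `new RateLimiter(1, 60000)`; back-to-back prompts are then queued transparently instead of surfacing a 429 to the user.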

Edit: Love the add-on, btw, appreciate your work on this!
