Your idea
Implement a per-model rate-limiting setting that allows users to define a maximum number of requests allowed within a specific time unit (Seconds, Minutes, or Hours).
Instead of allowing requests to fail when a provider's limit is reached, ThunderAI should automatically queue or delay outgoing requests to match the user-defined threshold. For example, a user could set a "1 Request per 1 Minute" limit for specific models to ensure continuous operation without manual retries.
Value
Error Prevention: Eliminates disruptive 429 (Too Many Requests) errors from API providers, leading to a much smoother user experience.
Workflow Automation: Enables "set and forget" batch processing for users on restricted or free-tier plans (such as Mistral’s 1 RPS limit) who would otherwise have to trigger requests manually.
Cost Control: Helps users stay within specific usage quotas to avoid unexpected overage charges from paid API providers.
Additional information
Specific Use Case: Users on Mistral AI’s free tier are currently limited to 1 request per second. Without this feature, sending two prompts back-to-back results in an immediate failure.
Suggested UI: A simple numeric input field next to a dropdown menu containing "per Second," "per Minute," and "per Hour" within the model connection settings.
Backend Logic: Ideally, this would use a "leaky bucket" or "fixed window" queuing algorithm to pause the execution of the next request until the appropriate time window has passed.
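To make the backend idea concrete, here is a minimal sketch of how such a request queue could look in JavaScript. All names (`RateLimitedQueue`, `schedule`) are hypothetical, not ThunderAI's actual API; it simply spaces outgoing calls so that at most `maxRequests` start per `windowMs` milliseconds:

```javascript
// Hypothetical sketch: evenly spaces requests so at most `maxRequests`
// start within any `windowMs` window (leaky-bucket style pacing).
class RateLimitedQueue {
  constructor(maxRequests, windowMs) {
    this.interval = windowMs / maxRequests; // minimum gap between request starts
    this.nextSlot = 0;                      // earliest time the next request may begin
  }

  // Returns a promise resolving with fn()'s result once a slot is free.
  schedule(fn) {
    const now = Date.now();
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.interval;
    const delay = start - now;
    return new Promise((resolve, reject) => {
      setTimeout(() => Promise.resolve(fn()).then(resolve, reject), delay);
    });
  }
}

// The "1 Request per 1 Minute" example from above would be:
// const queue = new RateLimitedQueue(1, 60_000);
// queue.schedule(() => sendPromptToModel(prompt)); // never rejected, just delayed
```

Requests submitted while the bucket is full are not dropped or failed; they simply resolve later, which matches the "queue instead of 429" behavior described above.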
Edit: Love the add-on, btw, appreciate your work on this!