Add rate limiting to protect CPU-intensive ML endpoints #103

Open
midaa1 wants to merge 3 commits into ruxailab:main from midaa1:rate-limmiter

midaa1 commented Apr 8, 2026

What is this PR about?

Right now, the API has no rate limiting at all, so any client can send hundreds of requests in a short time. That is a
problem because our ML endpoints (calibration and prediction) are CPU-intensive: without any protection, a burst of
requests, whether deliberate or accidental, can easily overload the server or even crash it.

This PR adds rate limiting with Flask-Limiter, so each client IP address can only make a limited number of requests per
time window (per minute on the ML endpoints, plus global hourly and daily caps).
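Conceptually, the limiter keeps one request counter per client IP for each time window and rejects requests once that counter reaches the limit. The dependency-free sketch below illustrates the idea, assuming a fixed-window strategy (which, as far as I know, is Flask-Limiter's default). It is not Flask-Limiter's actual code, and the `FixedWindowLimiter` class is a hypothetical name used only for illustration.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative per-key fixed-window counter (hypothetical, not Flask-Limiter's code)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit            # max requests allowed per window
        self.window = window_seconds  # window length in seconds
        self.clock = clock            # injectable so tests can fake time
        # key (e.g. client IP) -> [start of current window, hits in that window]
        self._counters = defaultdict(lambda: [None, 0])

    def hit(self, key):
        """Record one request for `key`; return True if allowed, False if over the limit."""
        now = self.clock()
        # Windows are aligned to multiples of the window length (0-60s, 60-120s, ...).
        window_start = now - (now % self.window)
        counter = self._counters[key]
        if counter[0] != window_start:
            # A new window has started for this key: reset its count.
            counter[0] = window_start
            counter[1] = 0
        if counter[1] >= self.limit:
            return False  # the web layer would answer 429 Too Many Requests
        counter[1] += 1
        return True
```

With `FixedWindowLimiter(limit=10, window_seconds=60)`, the first ten `hit("203.0.113.7")` calls within a minute return `True` and the eleventh returns `False`, while other IPs keep their own counters. One trade-off of fixed windows is that a client can send up to twice the limit around a window boundary (ten requests at the end of one minute, ten more at the start of the next); a sliding-window strategy avoids that at some extra cost.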

What did I change?

app/main.py

  • Added flask-limiter to the app. It tracks requests by IP address.
  • The calibration endpoint (/api/session/calib_validation) now allows only 10 requests per minute because it trains ML
    models and uses the most CPU.
  • The prediction endpoint (/api/session/batch_predict) allows 30 requests per minute since it is lighter but still
    needs protection.
  • There is also a global default limit of 200 requests per day and 50 per hour for every endpoint that doesn't set its
    own limit.
  • The health check endpoint (/api/session/health) has no limit because monitoring tools need to check it frequently
    without getting blocked.
  • When a client goes over the limit, they get a 429 Too Many Requests response.

requirements.txt

  • Added Flask-Limiter==3.12 as a new dependency.

tests/test_rate_limiting.py

  • Added 4 new tests:
    • Health endpoint is never rate limited even after many requests
    • Calibration endpoint returns 429 after 10 requests in a minute
    • Prediction endpoint returns 429 after 30 requests in a minute
    • The 429 response body contains a clear message about rate limiting

tests/conftest.py

  • Added a fixture that resets the rate limiter before each test so the limits don't carry over between tests and cause
    false failures.

Important note

Right now the rate limiter uses in-memory storage, so each server process keeps its own counters. That works fine for a
single process, but if we later deploy with multiple workers or multiple instances, we will need to switch to shared
storage such as Redis so the limits are enforced across all of them.

How to test

  • Run pytest tests/ -v and check that all 51 tests pass
  • You can also test manually by sending more than 10 POST requests to /api/session/calib_validation in one minute and
    checking that you get a 429 response
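For the manual check, a small standard-library script like the one below can send the requests and collect the status codes. It assumes the dev server is reachable at http://127.0.0.1:5000 (Flask's default port) and sends an empty JSON body. The real calibration endpoint may reject that body with a 4xx error, but the limiter should still count each request, so the 11th one should come back as 429. The `post_status_codes` helper is hypothetical.

```python
import urllib.error
import urllib.request


def post_status_codes(url, n):
    """POST an empty JSON body to `url` n times and return the HTTP status codes."""
    codes = []
    for _ in range(n):
        req = urllib.request.Request(
            url,
            data=b"{}",
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as err:
            # urlopen raises for 4xx/5xx responses; we still want the status code.
            codes.append(err.code)
    return codes
```

For example, `post_status_codes("http://127.0.0.1:5000/api/session/calib_validation", 11)[-1]` should be `429` when run within a single minute against a freshly started server.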
