Skip to content

perf(redis): resilience, Redis-backed rate limiter, and health metrics (#278)#296

Open
Abhushan187 wants to merge 2 commits into
geturbackend:mainfrom
Abhushan187:perf/redis-resilience-rate-limiter
Open

perf(redis): resilience, Redis-backed rate limiter, and health metrics (#278)#296
Abhushan187 wants to merge 2 commits into
geturbackend:mainfrom
Abhushan187:perf/redis-resilience-rate-limiter

Conversation

@Abhushan187

@Abhushan187 Abhushan187 commented Jun 9, 2026

Copy link
Copy Markdown

Closes #278

Changes by file

File Change
packages/common/src/config/redis.js Exponential+jitter retry strategy, connection timeouts, keepalive, offline queue
apps/public-api/src/middlewares/api_usage.js Replace in-memory projectRateLimiter with rate-limit-redis store; keeps global IP limiter
apps/public-api/src/controllers/health.controller.js Add getRedisHealth with INFO stats/memory/clients metrics
apps/public-api/src/routes/health.js Mount GET /api/health/redis
apps/public-api/package.json Add rate-limit-redis dependency, autocannon devDependency
docker-compose.yml Redis maxclients 10000, tcp-keepalive 60, sysctls tuning
apps/public-api/tests/load/rate-limit.load.test.js Autocannon load test verifying 429 behavior under concurrency
apps/public-api/tests/resilience/redis-failure.resilience.test.js Jest tests for graceful degradation on Redis disconnect

Technical Notes

  • rate-limit-redis uses redis.call(...) for pipelining compatibility with ioredis 5.x.
  • The existing global IP-based limiter is untouched; only the project-scoped rate limiter is migrated to Redis.
  • Retry strategy caps at 30s base delay + 100ms jitter to avoid thundering herd on Redis restart.
  • maxRetriesPerRequest: 3 prevents hung requests during transient outages.
  • Health endpoint parses INFO output with regex to avoid extra dependencies.

Summary by CodeRabbit

  • New Features

    • Redis health monitoring endpoint for system status checks
    • Project-specific rate limiting with configurable per-project limits
  • Improvements

    • Improved Redis reconnection/backoff and connection behavior for greater resilience
    • Docker Redis configuration tuned for higher client capacity and stability
  • Tests

    • Added load and resilience tests for rate limiting and Redis failure scenarios
  • Chores

    • Added runtime and dev tooling dependencies for rate limiting and load testing

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds Redis resilience and tuning, a Redis-specific health endpoint, Redis-backed per-project rate limiting, Docker Redis config updates, and load/resilience tests; introduces rate-limit-redis and autocannon dependencies.

Changes

Redis Connection Resilience and Rate Limiting

Layer / File(s) Summary
Redis client resilience configuration and Docker setup
packages/common/src/config/redis.js, docker-compose.yml
Retry strategy now uses exponential backoff with jitter and stops after >10 attempts; ioredis options set (maxRetriesPerRequest=3, ready checks, timeouts, keep-alive, offlineQueue); Docker Redis pinned to redis:7-alpine with redis-server --maxclients 10000 --tcp-keepalive 60 and sysctls tuned.
Redis health check endpoint
apps/public-api/src/controllers/health.controller.js, apps/public-api/src/routes/health.js
Adds getRedisHealth which validates Redis readiness, obtains stats/memory/clients INFO, parses fields, and returns HTTP 200 with structured Redis health or HTTP 503 on failure; route mounted at GET /api/health/redis.
Project-specific Redis rate limiter
apps/public-api/package.json, apps/public-api/src/middlewares/api_usage.js
Adds rate-limit-redis dependency and projectRateLimiter middleware using a RedisStore with rl:project: prefix, 15-minute window, dynamic per-project max from req.project.rateLimit (default 500), project-ID key generation, and JSON over-limit responses.
Load and resilience testing suite
apps/public-api/package.json, apps/public-api/tests/load/rate-limit.load.test.js, apps/public-api/tests/resilience/redis-failure.resilience.test.js
Adds autocannon dev dependency; load test clears/verifies Redis rate-limit keys and runs a 100-connection 30s benchmark reporting 2xx/429 counts and latency metrics; resilience tests simulate Redis down, assert /api/health/redis returns 503, and verify API degrades without crashing.

Sequence Diagrams

sequenceDiagram
  participant Client
  participant PublicAPI
  participant projectRateLimiter
  participant RedisClient
  Client->>PublicAPI: request (x-api-key, project)
  PublicAPI->>projectRateLimiter: enforce per-project limit
  projectRateLimiter->>RedisClient: INCR/GET on rl:project:{id}
  RedisClient->>projectRateLimiter: counter + ttl
  alt below limit
    projectRateLimiter->>PublicAPI: allow request
    PublicAPI->>Client: 2xx
  else above limit
    projectRateLimiter->>Client: 429 JSON response
  end
Loading
sequenceDiagram
  participant Client
  participant HealthRoute
  participant getRedisHealth
  participant RedisClient
  Client->>HealthRoute: GET /api/health/redis
  HealthRoute->>getRedisHealth: invoke
  getRedisHealth->>RedisClient: check ready
  alt ready
    getRedisHealth->>RedisClient: redis.info('stats'|'memory'|'clients')
    RedisClient->>getRedisHealth: INFO text
    getRedisHealth->>Client: 200 + parsed health object
  else not ready/error
    getRedisHealth->>Client: 503 + error payload
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • geturbackend/urBackend#140: Modifies the same apps/public-api/src/middlewares/api_usage.js middleware (analytics vs. this PR's Redis-backed rate limiter).

Suggested labels

nsoc26, level-3, mentor:yash-pouranik

Suggested reviewers

  • yash-pouranik

Poem

🐰 In tunnels of code where the packets all play,

Redis now rests with retries that sway.
Limits by project keep burrows from throng,
Tests run the stampede to see nothing's wrong.
Hop, chew a carrot — the cache sings along!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main changes: Redis resilience improvements, Redis-backed rate limiter implementation, and health metrics endpoint addition.
Linked Issues check ✅ Passed The PR addresses most core requirements from #278: shared Redis connection with retries, Redis-backed rate limiter, health metrics endpoint, Docker tuning, and load/resilience tests. However, some elements like explicit connection pooling (min/max connection sizing) and per-request overhead reduction for checkLockout/recordFailedAttempt are not fully demonstrated.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing Redis resilience, rate limiting, and health monitoring as outlined in issue #278. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/public-api/src/controllers/health.controller.js`:
- Around line 73-78: The catch block in the health controller currently returns
raw err.message to clients; update the handler in the health check function (the
try/catch that references res and err) to wrap or normalize the error using the
project's AppError (or an equivalent sanitizer) and return a safe public message
(e.g., "Service unavailable") instead of err.message; ensure the response still
includes redis.connected: false and an optional non-sensitive error code or
AppError.safeMessage but never expose MongoDB or raw backend messages directly.
- Around line 35-39: The getRedisHealth controller currently returns a raw
object ({ status, timestamp, redis }) — update all return responses in the
getRedisHealth function (and the other similar response blocks at the other two
places referenced) to conform to the required response envelope: return
res.status(...).json({ success: boolean, data: { timestamp: string, redis: {
connected: boolean, error?: string } }, message: string }); ensure success is
false for 503/500 errors and true for 200, place timestamp and redis under data,
and provide a concise message string (e.g., "Redis not ready" or "Redis
healthy") so clients receive { success, data, message }.

In `@apps/public-api/src/middlewares/api_usage.js`:
- Around line 22-48: The projectRateLimiter constant is never exported so
downstream code that imports { limiter, logger } can't use it; update the
module.exports object to include projectRateLimiter (alongside limiter and
logger) so the project-scoped Redis rate limiter is available to middleware
wiring; locate projectRateLimiter in this file and add its identifier to the
exported properties (ensure export naming matches how other modules import it).

In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js`:
- Around line 10-20: The test titled "health endpoint returns 503 when Redis is
unreachable" disconnects Redis with redis.disconnect() but only reconnects on
the happy path; update the test so the disconnect/reconnect is enclosed in a
try/finally (or use finally semantics) around the request and assertions,
calling redis.connect() in the finally block to guarantee reconnection even if
assertions fail; locate the test block (the async test function that calls
redis.disconnect(), request(app).get('/api/health/redis'), and redis.connect())
and modify it to always call redis.connect() in finally.
- Around line 6-8: The test teardown in afterEach currently calls
redis.flushdb(), which is too destructive for shared Redis; change it to only
remove keys created by this suite by tracking and deleting keys by prefix or
pattern instead. Modify the afterEach handler that references redis.flushdb() to
use a safe deletion strategy (e.g., SCAN/KEYS with the test-specific prefix or
maintain a list/set of created keys during tests and call DEL on those entries)
and ensure any helper that creates keys (tests that call redis.set/redis.hset)
uses the agreed prefix so keys can be reliably targeted for cleanup.
- Around line 29-31: The test currently accepts any of [200, 401, 403, 500]
which masks regressions; update the assertion on the response (the expect([...])
that checks res.status) to assert the narrow set required by the middleware
contract—preferably a single expected status (e.g., 401 or 500) or at most two
statuses that are explicitly allowed during Redis degradation—so change the
expect call on res.status in the redis-failure.resilience.test to that narrow
set and add a brief comment linking it to the expected degraded behavior from
the auth/middleware code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d9aae05-9154-48eb-be94-62a99a028370

📥 Commits

Reviewing files that changed from the base of the PR and between b6e59ab and 21e346c.

📒 Files selected for processing (8)
  • apps/public-api/package.json
  • apps/public-api/src/controllers/health.controller.js
  • apps/public-api/src/middlewares/api_usage.js
  • apps/public-api/src/routes/health.js
  • apps/public-api/tests/load/rate-limit.load.test.js
  • apps/public-api/tests/resilience/redis-failure.resilience.test.js
  • docker-compose.yml
  • packages/common/src/config/redis.js

Comment thread apps/public-api/src/controllers/health.controller.js
Comment thread apps/public-api/src/controllers/health.controller.js
Comment thread apps/public-api/src/middlewares/api_usage.js
Comment thread apps/public-api/tests/resilience/redis-failure.resilience.test.js
Comment thread apps/public-api/tests/resilience/redis-failure.resilience.test.js
Comment thread apps/public-api/tests/resilience/redis-failure.resilience.test.js
@yash-pouranik

Copy link
Copy Markdown
Collaborator

@Abhushan187
please fix the issues pointed by coderabbit.

@yash-pouranik

Copy link
Copy Markdown
Collaborator

and 4 CI checks are also failing.

Fixed rate-limit-redis to 3.1.0 in package.json. Could you run npm install and push the updated package-lock.json? I don't have the local env set up for lockfile generation.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/public-api/package.json (1)

1-45: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing package-lock.json update breaks reproducible builds.

The commit message notes that the author cannot generate the lockfile locally and requests it be committed. Without an updated package-lock.json:

  • CI/CD pipelines cannot install the exact dependency versions, leading to non-deterministic builds
  • Other developers will resolve different versions when running npm install
  • The four failing CI checks mentioned in PR comments may be related to this inconsistency

Before merging, ensure package-lock.json is regenerated and committed by running:

npm install
git add package-lock.json
git commit -m "chore: update package-lock.json for rate-limit-redis and autocannon"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/public-api/package.json` around lines 1 - 45, package.json was changed
(dependencies like "rate-limit-redis" and "autocannon" and devDeps
"jest"/"supertest") but package-lock.json was not updated; regenerate the
lockfile by running npm install in the public-api package to produce an updated
package-lock.json, verify it includes the new resolved versions for the
dependencies listed in package.json, then add and commit the package-lock.json
so CI and local installs are deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/public-api/package.json`:
- Line 31: The package change downgraded rate-limit-redis but the real security
fix is to upgrade express-rate-limit (CVE-2026-30827); update express-rate-limit
to a patched release (>=8.2.2) in package.json, run CI/unit/integration tests
exercising rate-limit middleware, and confirm the app still integrates with
rate-limit-redis@3.1.0 and ioredis@5.10.0 by verifying usage of
sendCommand/client.call and any peer-dependency expectations in the
rate-limit-redis and ioredis docs; update the PR description to state that no
published advisory targets rate-limit-redis@3.1.0 and that the security
remediation is the express-rate-limit upgrade.

---

Outside diff comments:
In `@apps/public-api/package.json`:
- Around line 1-45: package.json was changed (dependencies like
"rate-limit-redis" and "autocannon" and devDeps "jest"/"supertest") but
package-lock.json was not updated; regenerate the lockfile by running npm
install in the public-api package to produce an updated package-lock.json,
verify it includes the new resolved versions for the dependencies listed in
package.json, then add and commit the package-lock.json so CI and local installs
are deterministic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b7e8d74-0b88-4cab-8e8c-8d18e5d27514

📥 Commits

Reviewing files that changed from the base of the PR and between 21e346c and 125b769.

📒 Files selected for processing (1)
  • apps/public-api/package.json

"express": "5.1.0",
"express-rate-limit": "8.2.1",
"ioredis": "5.10.0",
"rate-limit-redis": "3.1.0",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for security advisories for rate-limit-redis 3.1.0
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: NPM, package: "rate-limit-redis") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

# Check NPM for latest version and 3.1.0 release date
echo "=== rate-limit-redis versions ==="
npm view rate-limit-redis time --json | jq '.["3.1.0"], .["4.0.2"], .modified'

# Check compatibility notes in the package README or changelog
echo "=== Checking package metadata for 3.1.0 ==="
npm view rate-limit-redis@3.1.0 peerDependencies
npm view rate-limit-redis@3.1.0 dependencies

echo "=== Checking package metadata for 4.0.2 ==="
npm view rate-limit-redis@4.0.2 peerDependencies
npm view rate-limit-redis@4.0.2 dependencies

Repository: geturbackend/urBackend

Length of output: 152


🌐 Web query:

rate-limit-redis 3.1.0 vs 4.0.2 breaking changes compatibility ioredis 5.x

💡 Result:

The rate-limit-redis package maintains compatibility with ioredis 5.x throughout the v3 and v4 release cycles [1][2]. Transitioning from v3.1.0 to v4.0.2 involves several breaking changes and architectural shifts [3][4]. Breaking Changes (v3.1.0 to v4.0.2) 1. Node.js Support: Version 4.0.0 and later dropped support for Node.js v14 [3][4]. Users must be on Node.js 16 or higher [5][2]. 2. Dependency Updates: Version 4.0.0 introduced support for express-rate-limit v7 [3][4]. This often necessitates updating the express-rate-limit middleware alongside the store [6]. 3. Packaging Changes: Version 4.0.0 switched to using pkgroll for bundling, which may affect how the library is resolved or tree-shaken in some build environments [3][4]. Compatibility with ioredis 5.x ioredis 5.x remains fully compatible with both v3.1.0 and v4.x.x [1][2]. The integration pattern for ioredis relies on the sendCommand option, which has remained consistent [1][2]. For both versions, you should configure the store as follows [1][2]: new RedisStore({ sendCommand: (...args) => client.call(...args), }) Note that while the configuration pattern is stable, later versions (4.x and 5.x) have improved internal script handling and bug fixes, such as the resolution of race conditions affecting hit counters and TTLs [7]. Always ensure that each rate limiter instance uses its own new RedisStore instance to prevent counter collisions [6].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== peerDependencies for rate-limit-redis@3.1.0 ==="
npm view rate-limit-redis@3.1.0 peerDependencies --json || true

echo "=== peerDependencies for rate-limit-redis@4.0.2 ==="
npm view rate-limit-redis@4.0.2 peerDependencies --json || true

echo "=== dependencies for rate-limit-redis@3.1.0 ==="
npm view rate-limit-redis@3.1.0 dependencies --json || true

echo "=== dependencies for rate-limit-redis@4.0.2 ==="
npm view rate-limit-redis@4.0.2 dependencies --json || true

Repository: geturbackend/urBackend

Length of output: 152


🌐 Web query:

rate-limit-redis 3.1.0 security vulnerability advisory

💡 Result:

There are no known security vulnerability advisories specifically for the rate-limit-redis package version 3.1.0 [1]. It is possible that you may be conflating this package with a vulnerability in its core dependency, express-rate-limit. Specifically, a high-severity vulnerability (CVE-2026-30827) was recently identified in express-rate-limit (versions 8.0.0 through 8.2.1), which causes IPv4-mapped IPv6 addresses to share a single rate-limit bucket on dual-stack servers [2][3]. This issue has been patched in newer versions of express-rate-limit [3][4]. If you are using rate-limit-redis, ensure that your dependencies are up to date. The latest releases of rate-limit-redis (as of mid-2026) require express-rate-limit version 8.5.0 or higher [1]. Always verify security advisories against the specific package name, as vulnerability scanning tools may sometimes flag dependencies or misattribute issues [5][6].

Citations:


🌐 Web query:

rate-limit-redis 4.0.2 security vulnerability advisory

💡 Result:

There are no known security vulnerabilities specifically targeting the rate-limit-redis package itself [1]. However, the rate-limit-redis package is a store for the express-rate-limit middleware, and it is important to note that a vulnerability has been identified in the parent middleware, express-rate-limit, which may affect applications using it [2][3][4]. The vulnerability, CVE-2026-30827, affects express-rate-limit versions 8.0.0 through 8.2.1 [3][4]. It involves an issue where IPv4-mapped IPv6 addresses are handled incorrectly on dual-stack network servers, potentially allowing a single client to exhaust the shared rate limit bucket and trigger a denial-of-service for all IPv4 clients [3][4]. If your application uses express-rate-limit within this vulnerable range, it is recommended to upgrade express-rate-limit to version 8.0.2, 8.1.1, 8.2.2, or 8.3.0 and higher [3][4]. You should ensure that your project's dependencies are updated accordingly [3]. For ongoing monitoring, you can check for updates to your project's dependencies using tools like npm audit or security scanning platforms [2][1].

Citations:


Clarify rate-limit-redis downgrade risk and prioritize the real security fix

  • No known published security advisories specifically target rate-limit-redis@3.1.0; upstream indicates rate-limit-redis stays compatible with ioredis@5.x across the v3/v4 line (when using the standard sendCommand/client.call(...) integration).
  • The security problem is instead in express-rate-limit@8.2.1 (CVE-2026-30827; affects 8.0.08.2.1), so upgrading express-rate-limit to a patched release is required regardless of the rate-limit-redis version.
  • Confirm the downgrade doesn’t break integration assumptions between rate-limit-redis@3.1.0, express-rate-limit@8.2.1, and ioredis@5.10.0 (peer-dependency/integration expectations).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/public-api/package.json` at line 31, The package change downgraded
rate-limit-redis but the real security fix is to upgrade express-rate-limit
(CVE-2026-30827); update express-rate-limit to a patched release (>=8.2.2) in
package.json, run CI/unit/integration tests exercising rate-limit middleware,
and confirm the app still integrates with rate-limit-redis@3.1.0 and
ioredis@5.10.0 by verifying usage of sendCommand/client.call and any
peer-dependency expectations in the rate-limit-redis and ioredis docs; update
the PR description to state that no published advisory targets
rate-limit-redis@3.1.0 and that the security remediation is the
express-rate-limit upgrade.

@Abhushan187

Copy link
Copy Markdown
Author

@Abhushan187 please fix the issues pointed by coderabbit.

resolved

@yash-pouranik

Copy link
Copy Markdown
Collaborator

Still CI Fails.
@Abhushan187
Please run the test on local first
if all passes on local
only then commit to cloud

check lists on tests in ./github

@yash-pouranik

Copy link
Copy Markdown
Collaborator

also please sync ur brannch with main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[PERF] Add Redis connection pooling for high-throughput public API endpoints to reduce connection overhead

2 participants