Skip to content

fix(resilience): apply 7 review fixes that didn't reach main#2729

Merged
koala73 merged 2 commits intomainfrom
fix/resilience-review-fixes-to-main
Apr 5, 2026
Merged

fix(resilience): apply 7 review fixes that didn't reach main#2729
koala73 merged 2 commits intomainfrom
fix/resilience-review-fixes-to-main

Conversation

@koala73
Copy link
Copy Markdown
Owner

@koala73 koala73 commented Apr 5, 2026

Summary

PRs #2660 and #2662 merged to feature branches, not main. When #2673 reimplemented handlers for main, it omitted the review fixes we applied during code review. This PR backports all 7 fixes.

Fixes

# Issue Impact
1 cronbach > 0 && guard skips negative alpha Countries with internally inconsistent dimensions show lowConfidence: false
2 cachedFetchJson timeout 300ms for 15+ Redis RTTs Score computation likely times out, returning overallScore: 0
3 Duplicate round() function Imported from resilience-stats instead
4 runRedisPipeline swallows HTTP errors silently Restored console.warn with status code
5 resilienceRanking not in health.js No staleness monitoring for ranking cache
6 No seed-meta write in ranking handler Health can't report freshness
7 Ranking + meta written independently Now atomic via single pipeline

Test plan

  • npm run typecheck + npm run typecheck:api
  • npx tsx --test tests/resilience-handlers.test.mts tests/resilience-stats.test.mts tests/resilience-dimension-scorers.test.mts
  • Verify health endpoint shows resilienceRanking after first Pro ranking request

These fixes were applied during review of PRs #2660/#2662 but those
PRs merged to feature branches, not main. PR #2673 reimplemented
handlers for main but omitted the review feedback.

Fixes:
1. Remove cronbach > 0 guard (negative alpha = worse reliability)
2. Increase cachedFetchJson timeout 300ms → 2000ms (15+ Redis RTTs)
3. Import round() from resilience-stats (remove duplicate)
4. Restore HTTP status logging in runRedisPipeline
5. Register resilienceRanking in health.js (STANDALONE + SEED_META + ON_DEMAND)
6. Write seed-meta:resilience:ranking on complete ranking materialization
7. Use atomic pipeline for ranking + meta (both succeed or fail together)
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
worldmonitor Ready Ready Preview, Comment Apr 5, 2026 1:58pm

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 5, 2026

Greptile Summary

This PR backports 7 review fixes from feature branches that were omitted when PR #2673 reimplemented the resilience handlers for main. All fixes are targeted, well-scoped, and address real production bugs (silent lowConfidence: false on inconsistent data, silent HTTP errors, missing health monitoring, non-atomic cache writes).

Key changes:

Confidence Score: 4/5

Safe to merge — all 7 fixes are targeted bug fixes with no new functionality, and the logic is straightforward and well-understood

Score of 4 (not 5) because (a) the diff is not available for direct inspection so analysis is based on the current file state, and (b) fix #2 (negativeTtlSeconds change) cannot be fully verified without seeing the before state. All visible code paths look correct.

server/worldmonitor/resilience/v1/_shared.ts — the negativeTtlSeconds=2000 argument is unreachable (fetcher never returns null) and could mislead future readers; server/worldmonitor/resilience/v1/get-resilience-ranking.ts — meta key constant is defined locally rather than co-located with sibling constants in _shared.ts

Important Files Changed

Filename Overview
api/health.js Correctly registers resilienceRanking in STANDALONE_KEYS, SEED_META (maxStaleMin: 720), and ON_DEMAND_KEYS for health staleness monitoring
server/_shared/redis.ts Restores console.warn with HTTP status code in runRedisPipeline, preventing silent swallowing of Redis HTTP errors
server/_shared/resilience-stats.ts Exports round() as the canonical rounding helper for the resilience subsystem
server/worldmonitor/resilience/v1/_shared.ts Imports round from resilience-stats (dedup), fixes computeLowConfidence to flag negative cronbach alpha, and calls cachedFetchJson with negativeTtlSeconds=2000
server/worldmonitor/resilience/v1/get-resilience-ranking.ts Adds seed-meta write for health monitoring and atomically writes ranking + meta via single pipeline when all country scores are available

Sequence Diagram

sequenceDiagram
    participant Client
    participant RankingHandler as get-resilience-ranking.ts
    participant SharedLib as _shared.ts
    participant Redis as Redis (Upstash)
    participant Health as api/health.js

    Client->>RankingHandler: getResilienceRanking()
    RankingHandler->>Redis: getCachedJson('resilience:ranking')
    Redis-->>RankingHandler: null (cache miss)
    RankingHandler->>SharedLib: listScorableCountries()
    SharedLib->>Redis: getCachedJson(RESILIENCE_STATIC_INDEX_KEY)
    Redis-->>SharedLib: country list
    SharedLib-->>RankingHandler: ["US", "DE", ...]
    RankingHandler->>SharedLib: getCachedResilienceScores(countryCodes)
    SharedLib->>Redis: runRedisPipeline([GET score:US, GET score:DE, ...])
    Redis-->>SharedLib: cached scores
    SharedLib-->>RankingHandler: Map<code, score>
    Note over RankingHandler: warmMissingResilienceScores() if needed
    RankingHandler->>SharedLib: ensureResilienceScoreCached(missing[])
    SharedLib->>Redis: cachedFetchJson (score computation)
    Redis-->>SharedLib: stored
    RankingHandler->>RankingHandler: sortRankingItems()
    Note over RankingHandler: Fix #7: atomic write
    RankingHandler->>Redis: runRedisPipeline([\n  SET resilience:ranking EX 21600,\n  SET seed-meta:resilience:ranking EX 604800\n])
    RankingHandler-->>Client: { items: [...] }
    Health->>Redis: STRLEN resilience:ranking
    Health->>Redis: GET seed-meta:resilience:ranking
    Redis-->>Health: { fetchedAt, count }
    Health->>Health: seedAge > 720min? → STALE_SEED
Loading

Comments Outside Diff (1)

  1. server/worldmonitor/resilience/v1/_shared.ts, line 194 (link)

    P2 negativeTtlSeconds = 2000 is unreachable

    The fetcher passed to cachedFetchJson always returns a fully-populated GetResilienceScoreResponse object and never returns null, so the negativeTtlSeconds parameter is never exercised. The cachedFetchJson function caches NEG_SENTINEL only when the fetcher returns null; since that path is unreachable here, this value has no effect at runtime.

    This is harmless, but it may cause confusion for the next reader who wonders why 2000 seconds was chosen. A brief comment clarifying intent would help:

Reviews (1): Last reviewed commit: "fix: revert cachedFetchJson arg change (..." | Re-trigger Greptile

Comment on lines +19 to +20
const RESILIENCE_RANKING_META_KEY = 'seed-meta:resilience:ranking';
const RESILIENCE_RANKING_META_TTL_SECONDS = 7 * 24 * 60 * 60;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Meta key constant not co-located with sibling constants

RESILIENCE_RANKING_META_KEY and RESILIENCE_RANKING_META_TTL_SECONDS are defined locally here, but all related resilience cache constants (RESILIENCE_RANKING_CACHE_KEY, RESILIENCE_RANKING_CACHE_TTL_SECONDS, etc.) live in _shared.ts. The meta key string 'seed-meta:resilience:ranking' is also hardcoded independently in api/health.js (line 257). If the key ever changes, both sites must be updated manually.

Consider moving these to _shared.ts and exporting them:

// in _shared.ts
export const RESILIENCE_RANKING_META_KEY = 'seed-meta:resilience:ranking';
export const RESILIENCE_RANKING_META_TTL_SECONDS = 7 * 24 * 60 * 60;

Then import in the handler and reference from health.js as documentation.

@koala73 koala73 merged commit dbb9b36 into main Apr 5, 2026
10 checks passed
@koala73 koala73 deleted the fix/resilience-review-fixes-to-main branch April 5, 2026 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant