is-it-down is an open-source service health platform that uses direct endpoint checks (not crowdsourced reports) to determine service status.
It is designed to be easy to extend: if you want to add a new service checker, improve reliability signals, or improve API/UI output, this repo is built for that workflow.
- Add support for more services.
- Improve checker quality (fewer false positives / false negatives).
- Improve status attribution for dependency-driven incidents.
- Improve the FastAPI backend and Next.js dashboard experience.
Service Checkers -> BigQuery -> FastAPI API -> Next.js Web App
Runtime components:
- checker-job: runs service checkers and writes check rows to BigQuery.
- api: FastAPI service that serves status and incident data.
- web: Next.js 16 app in web/ that consumes the API.
Prerequisites:
- Python 3.13+
- uv
Install deps and run core validation:
uv sync --extra dev
uv run --extra dev ruff check .
uv run --extra dev pytest
List and run checkers locally (no BigQuery writes):
uv run is-it-down-run-service-checker --list
uv run is-it-down-run-service-checker cloudflare
uv run is-it-down-run-service-checker cloudflare --json
Run scheduled checks with BigQuery writes disabled:
uv run is-it-down-run-scheduled-checks --dry-run
Find BaseChecks that were degraded/down in the last 48 hours:
uv run find-failing-base-checkers
uv run find-failing-base-checkers --json
uv run find-failing-base-checkers --service-key cloudflare --lookback-hours 24
Run the API locally:
uv run is-it-down-api
Health endpoint:
curl http://localhost:8080/healthz
Prerequisites:
- bun (project uses bun@1.2.16)
cd web
bun install
cp .env.example .env.local
bun dev
Frontend env vars:
- API_BASE_URL: server-side fetch base URL.
- NEXT_PUBLIC_API_BASE_URL: client-side API base URL.
- Create/update checker module(s) in src/is_it_down/checkers/services/.
- Keep checks focused and independent (status page + API + web edge checks is a common pattern).
- Make sure non-up results include debug metadata.
- Validate with:
uv run --extra dev ruff check .
uv run --extra dev pytest
uv run is-it-down-run-service-checker --list
uv run is-it-down-run-service-checker <service_key> --json
uv run is-it-down-run-service-checker <service_key> --verbose
Use this pre-PR pass:
uv run --extra dev ruff check .
uv run --extra dev pytest
cd web
bun run lint
bun run build
Most local contributor workflows only need defaults, but these are commonly used:
- IS_IT_DOWN_ENV: local, development, or production.
- IS_IT_DOWN_DEFAULT_CHECKER_PROXY_URL: local proxy override for checks that use proxy_setting="default".
- IS_IT_DOWN_PROXY_SECRET_PROJECT_ID: GCP project containing checker proxy secrets.
- IS_IT_DOWN_DEFAULT_CHECKER_PROXY_SECRET_ID: default proxy secret ID.
- IS_IT_DOWN_CHECKER_CONCURRENCY (default: 10)
- IS_IT_DOWN_CHECKER_MAX_RESPONSE_BODY_BYTES (default: 524288)
- IS_IT_DOWN_CHECKER_MAX_JSON_RESPONSE_BODY_BYTES (default: 1048576)
- IS_IT_DOWN_CHECKER_INSERT_BATCH_SIZE (default: 500)
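As a rough illustration of how the documented defaults behave, the sketch below reads a few of the checker settings from the environment and falls back to the defaults listed above. The `CheckerSettings` dataclass and `load_checker_settings` helper are hypothetical names for this example only; the project's actual settings loader is not shown here and may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CheckerSettings:
    """Hypothetical container for a few of the documented checker settings."""

    concurrency: int
    max_response_body_bytes: int
    max_json_response_body_bytes: int
    insert_batch_size: int


def load_checker_settings(environ: Mapping[str, str] = os.environ) -> CheckerSettings:
    """Read settings from a mapping, using the defaults documented in this README."""

    def read_int(name: str, default: int) -> int:
        # Unset variables fall back to the default; set values must parse as integers.
        raw = environ.get(name)
        return default if raw is None else int(raw)

    return CheckerSettings(
        concurrency=read_int("IS_IT_DOWN_CHECKER_CONCURRENCY", 10),
        max_response_body_bytes=read_int("IS_IT_DOWN_CHECKER_MAX_RESPONSE_BODY_BYTES", 524288),
        max_json_response_body_bytes=read_int(
            "IS_IT_DOWN_CHECKER_MAX_JSON_RESPONSE_BODY_BYTES", 1048576
        ),
        insert_batch_size=read_int("IS_IT_DOWN_CHECKER_INSERT_BATCH_SIZE", 500),
    )
```

Passing a plain dict instead of `os.environ` makes the override behavior easy to exercise in a test, for example `load_checker_settings({"IS_IT_DOWN_CHECKER_CONCURRENCY": "4"})`.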
BigQuery settings (for non-dry-run scheduled checks / API integrations):
- IS_IT_DOWN_BIGQUERY_PROJECT_ID
- IS_IT_DOWN_BIGQUERY_DATASET_ID (default: is_it_down)
- IS_IT_DOWN_BIGQUERY_TABLE_ID (default: check_results)
- IS_IT_DOWN_TRACKING_BIGQUERY_DATASET_ID (default: is_it_down_tracking)
- IS_IT_DOWN_TRACKING_BIGQUERY_TABLE_ID (default: service_detail_views)
API cache + Redis settings:
- IS_IT_DOWN_API_CACHE_ENABLED (default: true)
- IS_IT_DOWN_API_CACHE_TTL_SECONDS (default: 60)
- IS_IT_DOWN_API_CACHE_KEY_PREFIX (default: is-it-down:api:v1)
- IS_IT_DOWN_API_CACHE_REDIS_URL (optional direct Redis URL; useful for local development)
- IS_IT_DOWN_API_CACHE_REDIS_SECRET_ID (Secret Manager secret ID/resource for Redis URL)
- IS_IT_DOWN_REDIS_SECRET_PROJECT_ID (project used when secret ID is short and not fully-qualified)
- IS_IT_DOWN_API_CACHE_WARM_ON_CHECKER_JOB (default: true)
- IS_IT_DOWN_API_CACHE_WARM_ON_CLOUD_RUN_CHECKER_JOB (default: true; set to false to disable warming in Cloud Run checker task executions)
- IS_IT_DOWN_API_CACHE_WARM_IMPACTED_SERVICE_LIMIT (default: 25)
- IS_IT_DOWN_API_CACHE_WARM_TOP_VIEWED_SERVICE_LIMIT (default: 25)
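The note on IS_IT_DOWN_REDIS_SECRET_PROJECT_ID implies that a short secret ID is combined with a project, while a fully-qualified one is used as given. The sketch below shows one way that could work, assuming the standard Secret Manager resource format `projects/<project>/secrets/<secret>/versions/<version>`. The function name `resolve_secret_resource` and the choice of `latest` as the default version are assumptions for illustration, not the project's confirmed behavior.

```python
from typing import Optional


def resolve_secret_resource(
    secret_id: str,
    project_id: Optional[str],
    version: str = "latest",  # assumption: default to the latest secret version
) -> str:
    """Expand a short secret ID into a full Secret Manager version resource name."""
    if secret_id.startswith("projects/"):
        # Already fully-qualified: keep an explicit version, otherwise append one.
        if "/versions/" in secret_id:
            return secret_id
        return f"{secret_id}/versions/{version}"
    if not project_id:
        # A short ID cannot be resolved without knowing which project holds it.
        raise ValueError("project_id is required when secret_id is not fully-qualified")
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
```

With this shape, a short ID such as `redis-url` needs the project setting, while a value that already starts with `projects/` does not.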
- src/is_it_down/checkers: checker framework, utilities, and service checkers.
- src/is_it_down/api: FastAPI routes and API infrastructure.
- src/is_it_down/core: scoring, attribution, and shared domain models.
- src/is_it_down/scripts/run_service_checker.py: local ad-hoc checker runner.
- src/is_it_down/scripts/run_scheduled_checks.py: checker job entrypoint.
- web/: Next.js dashboard.
- infra/terraform: Cloud Run + Cloud Scheduler + BigQuery infra.
CI (.github/workflows/ci.yml) runs:
- ruff check src tests
- pytest -q
Deployment summary:
- Push to main deploys dev (is-it-down-dev).
- GitHub Release deploys prod (is-it-down-prod).
- Images built/pushed: checker, api, web.
- Terraform applies the image tag to Cloud Run resources.
Required GitHub secret for deploy workflows:
- GCP_SA_KEY (configured per GitHub Environment, e.g. dev and prod).
- Prefer small, focused PRs.
- Include tests when behavior changes.
- Keep checker logic defensive against partial/malformed payloads.
- Run lint + tests before opening a PR.
- If adding a new checker, include enough metadata/debug info to make production triage easy.
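To make the guidance on defensive parsing and debug metadata more concrete, here is a minimal sketch that interprets a status-page style JSON payload. The `CheckOutcome` dataclass, the `evaluate_status_payload` function, and the assumed payload shape (`{"status": {"indicator": ...}}`) are all hypothetical; they are not the project's checker API, so adapt the idea to the real base classes in src/is_it_down/checkers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CheckOutcome:
    """Hypothetical result type: a status plus debug metadata for triage."""

    status: str  # "up", "degraded", or "down"
    metadata: Dict[str, Any] = field(default_factory=dict)


def evaluate_status_payload(payload: Any) -> CheckOutcome:
    """Map a decoded JSON payload to an outcome without assuming its shape."""
    if not isinstance(payload, dict):
        # Malformed payload: report degraded and record what was received.
        return CheckOutcome(
            "degraded",
            {"reason": "payload_not_object", "payload_type": type(payload).__name__},
        )

    status_block = payload.get("status")
    indicator = status_block.get("indicator") if isinstance(status_block, dict) else None
    if not isinstance(indicator, str):
        # Partial payload: include the keys that were present to speed up triage.
        return CheckOutcome(
            "degraded",
            {"reason": "missing_indicator", "payload_keys": sorted(str(k) for k in payload)[:10]},
        )

    if indicator == "none":
        return CheckOutcome("up")

    # Assumption: "major" and "critical" mean down; anything else is degraded.
    status = "down" if indicator in {"major", "critical"} else "degraded"
    return CheckOutcome(status, {"reason": "status_page_indicator", "indicator": indicator})
```

Every non-up branch returns metadata explaining why, which is the property a regression test for a checker would typically assert on.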
If you are new to the codebase, a great first contribution is improving one service checker's signal quality and adding regression tests for that behavior.