This project implements a backend data pipeline that periodically fetches air quality data from the OpenAQ API, processes PM2.5 values, caches sensor metadata in Redis, and stores time-series measurements in TimescaleDB.
The system is designed to:
- decouple frontend applications from third-party APIs
- reduce latency using Redis caching
- store historical air quality measurements efficiently using TimescaleDB
- provide a structured data source for visualization and analytics

The service decouples the frontend from third-party APIs and provides low-latency, cached responses.
Aggregated PM2.5 measurements are stored in TimescaleDB, a PostgreSQL extension designed for time-series workloads.
The database schema uses a hypertable optimized for sensor data.
Key features used:
- Hypertable partitioning for time-series scalability
- 1-day chunk intervals for efficient storage and querying
- Indexing by sensor and time for fast lookups
- Columnar compression segmented by sensor_id
- Automatic compression policy for data older than 7 days
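
As a rough illustration of how the features above fit together, the following sketch builds the SQL statements a setup script might run. The table name `pm25_measurements` and the columns `time`, `sensor_id`, and `pm25` are assumptions made for this example, not names taken from the project. The TimescaleDB calls used are `create_hypertable`, the `timescaledb.compress` table options, and `add_compression_policy`.

```python
# Hypothetical table name; the project's real schema may differ.
TABLE = "pm25_measurements"


def schema_statements(table: str = TABLE) -> list[str]:
    """Return, in execution order, the SQL statements for the hypertable setup."""
    return [
        # Plain PostgreSQL table (column names are assumptions).
        f"""CREATE TABLE IF NOT EXISTS {table} (
            time      TIMESTAMPTZ      NOT NULL,
            sensor_id INTEGER          NOT NULL,
            pm25      DOUBLE PRECISION NOT NULL
        )""",
        # Convert it to a hypertable partitioned into 1-day chunks.
        f"SELECT create_hypertable('{table}', 'time', "
        f"chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE)",
        # Index by sensor and time for fast per-sensor lookups.
        f"CREATE INDEX IF NOT EXISTS {table}_sensor_time_idx "
        f"ON {table} (sensor_id, time DESC)",
        # Enable columnar compression, segmented by sensor_id.
        f"ALTER TABLE {table} SET (timescaledb.compress, "
        f"timescaledb.compress_segmentby = 'sensor_id')",
        # Compress chunks automatically once they are older than 7 days.
        f"SELECT add_compression_policy('{table}', INTERVAL '7 days', "
        f"if_not_exists => TRUE)",
    ]


if __name__ == "__main__":
    for statement in schema_statements():
        print(statement + ";")
```

Each statement could be passed to any PostgreSQL client connected to a database with the TimescaleDB extension enabled.
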
This allows efficient queries such as:
- historical PM2.5 trends
- per-sensor pollution analysis
- time-bucketed averages
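
A time-bucketed average for a single sensor could look roughly like the sketch below, which uses TimescaleDB's `time_bucket` function. The table and column names (`pm25_measurements`, `time`, `sensor_id`, `pm25`) and the helper `bucketed_average_query` are hypothetical, and the `%(name)s` placeholders assume a psycopg-style client.

```python
def bucketed_average_query(
    sensor_id: int, bucket: str = "1 hour", window: str = "7 days"
) -> tuple[str, dict]:
    """Build a parameterized query for average PM2.5 per time bucket.

    Returns the SQL text and its parameters; values are passed separately
    so the database driver handles quoting.
    """
    sql = """
        SELECT time_bucket(%(bucket)s::interval, time) AS bucket,
               avg(pm25) AS avg_pm25
        FROM pm25_measurements
        WHERE sensor_id = %(sensor_id)s
          AND time >= now() - %(window)s::interval
        GROUP BY bucket
        ORDER BY bucket
    """
    params = {"sensor_id": sensor_id, "bucket": bucket, "window": window}
    return sql, params


if __name__ == "__main__":
    query, args = bucketed_average_query(sensor_id=42, bucket="1 day")
    print(query)
    print(args)
```

Filtering on `sensor_id` and a time range lets the query use the sensor-and-time index and skip chunks outside the window.
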
The data/ folder and pm25.json file are not committed to the repository because they are generated at runtime.
When the server starts:
1. The cron job fetches data from the OpenAQ API.
2. PM2.5 values are processed and stored in Redis.
3. The /pm25.json endpoint serves data directly from Redis.
If Redis is empty, wait for the first cron execution cycle before accessing the endpoint.
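
The startup flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: a plain `dict` stands in for Redis, the key `pm25:latest` and the raw record shape (`parameter`, `value`, `sensor_id`) are hypothetical, and returning a 503 status while the cache is empty is one possible way to handle the period before the first cron run.

```python
import json
from typing import Callable, Iterable, MutableMapping

CACHE_KEY = "pm25:latest"  # hypothetical key name


def process(raw: Iterable[dict]) -> list[dict]:
    """Keep only PM2.5 readings with a non-negative numeric value."""
    cleaned = []
    for record in raw:
        value = record.get("value")
        if (
            record.get("parameter") == "pm25"
            and isinstance(value, (int, float))
            and value >= 0
        ):
            cleaned.append({"sensor_id": record["sensor_id"], "pm25": float(value)})
    return cleaned


def refresh_cache(
    fetch: Callable[[], Iterable[dict]], cache: MutableMapping[str, str]
) -> None:
    """One cron cycle: fetch, process, and store the result as JSON."""
    cache[CACHE_KEY] = json.dumps(process(fetch()))


def serve_pm25(cache: MutableMapping[str, str]) -> tuple[int, str]:
    """Handler for /pm25.json: read from the cache only, never from the API."""
    payload = cache.get(CACHE_KEY)
    if payload is None:
        # The first cron cycle has not completed yet.
        return 503, json.dumps({"error": "cache not populated yet"})
    return 200, payload


if __name__ == "__main__":
    cache: dict[str, str] = {}
    print(serve_pm25(cache))  # before the first cycle
    refresh_cache(lambda: [{"parameter": "pm25", "value": 12.5, "sensor_id": 1}], cache)
    print(serve_pm25(cache))  # after the first cycle
```

Because the handler only reads from the cache, request latency does not depend on the upstream API's availability.
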
