This document describes the architecture and design decisions for the calendar dimension table solution.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Databricks Workspace │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────────┐ │
│ │ Workflow │────▶│ Notebook │────▶│ Delta Table │ │
│ │ (Scheduled) │ │ 02_daily_ │ │ common.reference. │ │
│ │ 6 AM ET │ │ refresh │ │ dim_calendar │ │
│ └──────────────┘ └──────┬───────┘ └──────────────────────────────┘ │
│ │ ▲ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Python Generation │ │ │
│ │ ┌─────────────┐ ┌───────────────┐ │ │ │
│ │ │ pandas │ │ holidays │ │ │ │
│ │ │ DataFrame │ │ library │ │ │ │
│ │ └─────────────┘ └───────────────┘ │ │ │
│ └──────────────────────────────────────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Spark DataFrame │───── MERGE ───┘ │
│ │ (staging temp view) │ │
│ └──────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
│
│ API Call
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ FRED API │
│ (Federal Reserve Economic Data) │
│ USRECD Series │
└─────────────────────────────────────────────────────────────────────────────┘
Purpose: Orchestrates daily execution of the calendar refresh.
Schedule: Daily at 6:00 AM Eastern Time
Configuration:
- Cluster: Shared job cluster or serverless
- Timeout: 30 minutes
- Retries: 2 with 5-minute intervals
- Alerts: Email on failure
- One-time execution
- Creates catalog, schema, and table
- Installs dependencies
- Daily execution
- Generates calendar data
- Merges into target table
- Optimizes table
Libraries:
pandas: DataFrame operations and date calculationsholidays: US federal holiday detection with observed datesrequests: FRED API integrationzoneinfo: Timezone handling
Key Functions:
generate_calendar_dimension(): Main generation logicfetch_fred_recession_data(): FRED API integrationlocalize_to_eastern(): Timezone conversion
Location: common.reference.dim_calendar
Format: Delta Lake
Optimizations:
- Auto-optimize writes enabled
- Auto-compaction enabled
- Z-ORDER by
year,month_number,date_key
- Series: USRECD (US Recession Indicator)
- Authentication: API key stored in Databricks Secrets
- Frequency: Monthly data
- Fallback: Graceful degradation if API unavailable
1. Workflow triggers at 6 AM Eastern
│
▼
2. Notebook installs dependencies
│
▼
3. Python generates full calendar DataFrame
├── Date range: 1854-12-01 to current date
├── Holiday detection via `holidays` library
└── FRED API call for recession data
│
▼
4. Convert pandas DataFrame to Spark DataFrame
│
▼
5. Create temporary staging view
│
▼
6. MERGE into target Delta table
├── Match on date_key
├── Update existing rows (YTD/MTD/QTD flags change daily)
└── Insert new rows (new dates)
│
▼
7. OPTIMIZE table with Z-ORDER
│
▼
8. Validate and log results
- Preserves table metadata: Column comments and table properties are retained
- Efficient updates: Only changed rows are updated (YTD/MTD/QTD flags)
- Audit trail: Delta time travel remains intact
- Reduced I/O: Less data written compared to full overwrite
- Simplicity: Date calculations are more intuitive in pandas
- holidays library: Integrates seamlessly with pandas
- Performance: 62K rows is well within pandas capacity
- Flexibility: Easy to add new calculated fields
- Finance focus: US markets operate on Eastern Time
- Consistency: All date/time fields use same timezone
- Business alignment: Matches typical US business operations
- ACID transactions: Ensures data consistency
- Time travel: Ability to query historical versions
- Optimization: Z-ORDER improves query performance
- Unity Catalog: Full governance integration
┌─────────────────────────────────────┐
│ Databricks Secrets │
│ ┌─────────────────────────────┐ │
│ │ Scope: fred-api │ │
│ │ Key: api-key │ │
│ │ Value: [FRED API Key] │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────┘
- Table owned by service principal
- Read access granted to analysts/reports
- Write access limited to ETL service principal
- Row count matches expected date range
- No null values in required fields
- FRED data coverage percentage
- Job failure notification
- FRED API unavailability warning
- Data validation failures
- ~62,000 rows (1854-2025)
- ~170 years of data
- Sub-minute execution time
- +365 rows per year
- Minimal performance impact
- No architectural changes needed for 100+ years
- Delta time travel (30 days default)
- Can recreate from scratch (deterministic generation)
- Check Delta history for valid version
- If corrupted, truncate and regenerate
- Run 02_daily_refresh notebook
- Additional holiday calendars: UK, EU, APAC markets
- Fiscal calendar support: Configurable fiscal year start
- Additional FRED series: Interest rates, unemployment
- Streaming updates: Real-time flag updates