Skip to content

zhiweio/redshift-spectra

Repository files navigation

Redshift Spectra πŸš€

License: MIT Python 3.11+ Terraform Terragrunt LocalStack AWS Docs

Turn your Amazon Redshift into a secure, multi-tenant Data-as-a-Service platform.

Redshift Spectra is an enterprise-grade serverless middleware that transforms your Redshift data warehouse into a managed RESTful API. With zero-trust security, database-level tenant isolation, and dual execution modes (synchronous Query API + asynchronous Bulk API), you can safely expose analytical data to internal teams, partners, and customersβ€”without exposing your database directly.

✨ Why Redshift Spectra?

Building a multi-tenant data platform is hard. Redshift Spectra solves the key challenges:

flowchart TB
    subgraph Traditional["❌ Traditional Approach"]
        direction TB
        T1[Direct Database Access] --> T2[Security Risks]
        T1 --> T3[Connection Pool Exhaustion]
        T1 --> T4[No Tenant Isolation]
        T1 --> T5[Compliance Violations]
    end

    subgraph Modern["βœ… Redshift Spectra"]
        direction TB
        M1[RESTful API Layer] --> M2[Zero-Trust Security]
        M1 --> M3[Serverless Scaling]
        M1 --> M4[Database-Level Isolation]
        M1 --> M5[Full Audit Trail]
    end

    Traditional -.->|"Transform"| Modern
Loading
Challenge Traditional Solution Redshift Spectra
πŸ”’ Tenant Isolation Application-level filtering (error-prone) Database-level RLS/RBAC enforcement
πŸ›‘οΈ Security Shared credentials, custom logic Zero-trust with JWT/API Key/IAM auth
⚑ Interactive Queries Complex async polling Synchronous API with immediate results
πŸ“Š Large Exports Timeout constraints Async Bulk API with S3 delivery
⏱️ Latency Cold connections every request Session reuse for 80% latency reduction
πŸ” Compliance Custom audit logging Built-in audit trail with X-Ray tracing

πŸ—οΈ Architecture

flowchart TB
    subgraph Consumers["πŸ“± Data Consumers"]
        APP[Internal Apps]
        BI[BI Tools]
        PARTNER[Partner Systems]
        ETL[ETL Pipelines]
    end

    subgraph Spectra["πŸ” Redshift Spectra"]
        subgraph Edge["Edge Layer"]
            APIGW[API Gateway<br/>Rate Limiting Β· WAF]
        end

        subgraph Auth["Authentication"]
            AUTHZ[Authorizer<br/>JWT Β· API Key Β· IAM]
        end

        subgraph Compute["Compute Layer"]
            QUERY[Query Handler<br/>Sync Execution]
            BULK[Bulk Handler<br/>Async Operations]
        end

        subgraph State["State Layer"]
            DDB[(DynamoDB<br/>Jobs Β· Sessions)]
            S3[(S3<br/>Large Results)]
        end
    end

    subgraph Data["🏒 Enterprise Data"]
        RS[(Amazon Redshift<br/>RLS Β· RBAC)]
    end

    Consumers --> APIGW
    APIGW --> AUTHZ
    AUTHZ --> QUERY
    AUTHZ --> BULK
    QUERY --> DDB
    QUERY --> RS
    BULK --> DDB
    BULK --> S3
    BULK --> RS
Loading

πŸš€ Quick Start

Local Development (Recommended)

# Clone and install
git clone https://github.com/zhiweio/redshift-spectra.git
cd redshift-spectra
task setup:install-dev

# Configure (defaults work for LocalStack)
cp .env.example .env

# Start LocalStack and deploy
task local:deploy

# Verify deployment
task local:status

Query API β€” Synchronous Execution

For interactive queries with immediate results (≀10,000 rows):

curl -X POST "https://your-api.execute-api.region.amazonaws.com/v1/queries" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: tenant-a" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM sales WHERE region = '\''APAC'\'' LIMIT 100"}'

Response (immediate):

{
  "data": [
    {"id": 1, "product": "Widget", "amount": 99.99, "region": "APAC"},
    {"id": 2, "product": "Gadget", "amount": 149.99, "region": "APAC"}
  ],
  "metadata": {
    "row_count": 2,
    "column_info": [...],
    "execution_time_ms": 245
  }
}

Bulk API β€” Asynchronous Execution

For large exports and long-running operations:

# Create a bulk job
curl -X POST "https://your-api.execute-api.region.amazonaws.com/v1/bulk/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: tenant-a" \
  -H "Content-Type: application/json" \
  -d '{"operation": "query", "sql": "SELECT * FROM large_table"}'

# Response: {"job_id": "job-abc123", "state": "QUEUED"}

# Poll for completion
curl "https://your-api.execute-api.region.amazonaws.com/v1/bulk/jobs/job-abc123" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: tenant-a"

# When complete, get results via presigned S3 URL
curl "https://your-api.execute-api.region.amazonaws.com/v1/bulk/jobs/job-abc123/result"

🎯 Key Features

πŸ” Zero-Trust Security

  • Database-level isolation β€” Tenant security enforced by Redshift RLS/RBAC, not application code
  • Multiple auth methods β€” JWT tokens, API keys, AWS IAM (SigV4)
  • SQL injection prevention β€” Built-in validator blocks dangerous operations
  • Complete audit trail β€” Every query logged with tenant context and X-Ray tracing

⚑ Dual Execution Modes

  • Query API (Sync) β€” Immediate results for interactive workloads, ≀10,000 rows
  • Bulk API (Async) β€” Background processing for large exports, unlimited rows
  • Automatic LIMIT injection β€” Query API enforces safe result sizes

πŸš€ Performance Optimized

  • Session Reuse β€” Redshift Data API SessionKeepAliveSeconds for connection pooling
  • CSV Format Parsing β€” get_statement_result_v2 with 70%+ faster deserialization
  • Lambda Layers β€” Shared dependencies for faster cold starts

πŸ“¦ Intelligent Data Delivery

  • Inline JSON β€” Small results returned directly in API response
  • S3 Presigned URLs β€” Large results exported to S3 (Parquet/CSV)
  • Automatic switching β€” Threshold-based format selection

πŸ”§ Enterprise Ready

  • Multi-tenancy β€” One database user per tenant with isolated permissions
  • Terragrunt IaC β€” DRY multi-environment deployments (dev/staging/prod)
  • Observability β€” CloudWatch dashboards, X-Ray tracing, structured logging
  • Idempotency β€” Request deduplication with configurable TTL

πŸ“Š Performance Benchmarks

Metric Without Optimization With Optimization Improvement
First Query Latency 500ms 500ms β€”
Subsequent Queries 500ms 100ms 80% ⬇️
Result Parsing (10K rows) 120ms 35ms 71% ⬇️
Cold Start 3s 1.5s 50% ⬇️

Optimizations include session reuse and CSV format parsing via get_statement_result_v2.

πŸ“š Documentation

Full documentation is available at zhiweio.github.io/redshift-spectra

Section Description
Getting Started Installation, configuration, quickstart
User Guide Query API (sync), Bulk API (async)
Concepts Architecture, multi-tenancy, data delivery
Security Zero-trust model, authentication, SQL validation
Performance Session reuse, CSV format optimization
Deployment Terragrunt, Lambda layers, monitoring
API Reference Complete REST API specification

πŸ› οΈ Development

Local Development with LocalStack

For local development and testing without AWS costs:

# Start LocalStack
task local:start

# Deploy infrastructure to LocalStack
task local:deploy

# Check status
task local:status

# Clean up
task local:stop

⚠️ Note: Full functionality (Redshift Data API, Lambda Layers) requires LocalStack Pro. See the LocalStack documentation for details on Community vs Pro features.

Running Tests

# Install dev dependencies
task setup:install-dev

# Run all tests (450+ test cases)
task test:all

# Lint and format Python code
task lint:check
task lint:format

# Format Terraform/Terragrunt files
task infra:iac-fmt

# Build Lambda packages
task build:all

# Serve documentation locally
task docs:serve

πŸ“ Project Structure

redshift-spectra/
β”œβ”€β”€ src/spectra/          # Lambda function source code
β”‚   β”œβ”€β”€ handlers/         # API endpoint handlers
β”‚   β”œβ”€β”€ services/         # Business logic (Redshift, S3, DynamoDB)
β”‚   β”œβ”€β”€ middleware/       # Tenant context, authentication
β”‚   β”œβ”€β”€ models/           # Request/response schemas
β”‚   └── utils/            # SQL validator, config, helpers
β”œβ”€β”€ terraform/            # Terraform modules
β”‚   └── modules/          # api-gateway, dynamodb, iam, lambda, monitoring, s3
β”œβ”€β”€ terragrunt/           # Terragrunt configurations
β”‚   └── environments/     # local, dev, prod environments
β”œβ”€β”€ tests/                # Unit and integration tests
β”‚   β”œβ”€β”€ unit/             # Handler, service, utility tests
β”‚   └── integration/      # End-to-end workflow tests
└── docs/                 # MkDocs documentation source

🀝 Contributing

Contributions are welcome! Please see our Contributing Guide for details.

πŸ“„ License

MIT License β€” see LICENSE for details.


Built with ❀️ for the data engineering community

Documentation β€’ Report Bug β€’ Request Feature

About

Turn your Amazon Redshift into a secure, multi-tenant Data-as-a-Service platform. Enterprise-grade serverless API with zero-trust security.

Topics

Resources

License

Stars

Watchers

Forks

Contributors