Turn your Amazon Redshift into a secure, multi-tenant Data-as-a-Service platform.
Redshift Spectra is an enterprise-grade serverless middleware that transforms your Redshift data warehouse into a managed RESTful API. With zero-trust security, database-level tenant isolation, and dual execution modes (synchronous Query API + asynchronous Bulk API), you can safely expose analytical data to internal teams, partners, and customersβwithout exposing your database directly.
Building a multi-tenant data platform is hard. Redshift Spectra solves the key challenges:
flowchart TB
subgraph Traditional["β Traditional Approach"]
direction TB
T1[Direct Database Access] --> T2[Security Risks]
T1 --> T3[Connection Pool Exhaustion]
T1 --> T4[No Tenant Isolation]
T1 --> T5[Compliance Violations]
end
subgraph Modern["β
Redshift Spectra"]
direction TB
M1[RESTful API Layer] --> M2[Zero-Trust Security]
M1 --> M3[Serverless Scaling]
M1 --> M4[Database-Level Isolation]
M1 --> M5[Full Audit Trail]
end
Traditional -.->|"Transform"| Modern
| Challenge | Traditional Solution | Redshift Spectra |
|---|---|---|
| π Tenant Isolation | Application-level filtering (error-prone) | Database-level RLS/RBAC enforcement |
| π‘οΈ Security | Shared credentials, custom logic | Zero-trust with JWT/API Key/IAM auth |
| β‘ Interactive Queries | Complex async polling | Synchronous API with immediate results |
| π Large Exports | Timeout constraints | Async Bulk API with S3 delivery |
| β±οΈ Latency | Cold connections every request | Session reuse for 80% latency reduction |
| π Compliance | Custom audit logging | Built-in audit trail with X-Ray tracing |
flowchart TB
subgraph Consumers["π± Data Consumers"]
APP[Internal Apps]
BI[BI Tools]
PARTNER[Partner Systems]
ETL[ETL Pipelines]
end
subgraph Spectra["π Redshift Spectra"]
subgraph Edge["Edge Layer"]
APIGW[API Gateway<br/>Rate Limiting Β· WAF]
end
subgraph Auth["Authentication"]
AUTHZ[Authorizer<br/>JWT Β· API Key Β· IAM]
end
subgraph Compute["Compute Layer"]
QUERY[Query Handler<br/>Sync Execution]
BULK[Bulk Handler<br/>Async Operations]
end
subgraph State["State Layer"]
DDB[(DynamoDB<br/>Jobs Β· Sessions)]
S3[(S3<br/>Large Results)]
end
end
subgraph Data["π’ Enterprise Data"]
RS[(Amazon Redshift<br/>RLS Β· RBAC)]
end
Consumers --> APIGW
APIGW --> AUTHZ
AUTHZ --> QUERY
AUTHZ --> BULK
QUERY --> DDB
QUERY --> RS
BULK --> DDB
BULK --> S3
BULK --> RS
# Clone and install
git clone https://github.com/zhiweio/redshift-spectra.git
cd redshift-spectra
task setup:install-dev
# Configure (defaults work for LocalStack)
cp .env.example .env
# Start LocalStack and deploy
task local:deploy
# Verify deployment
task local:statusFor interactive queries with immediate results (β€10,000 rows):
curl -X POST "https://your-api.execute-api.region.amazonaws.com/v1/queries" \
-H "Authorization: Bearer $API_KEY" \
-H "X-Tenant-ID: tenant-a" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT * FROM sales WHERE region = '\''APAC'\'' LIMIT 100"}'Response (immediate):
{
"data": [
{"id": 1, "product": "Widget", "amount": 99.99, "region": "APAC"},
{"id": 2, "product": "Gadget", "amount": 149.99, "region": "APAC"}
],
"metadata": {
"row_count": 2,
"column_info": [...],
"execution_time_ms": 245
}
}For large exports and long-running operations:
# Create a bulk job
curl -X POST "https://your-api.execute-api.region.amazonaws.com/v1/bulk/jobs" \
-H "Authorization: Bearer $API_KEY" \
-H "X-Tenant-ID: tenant-a" \
-H "Content-Type: application/json" \
-d '{"operation": "query", "sql": "SELECT * FROM large_table"}'
# Response: {"job_id": "job-abc123", "state": "QUEUED"}
# Poll for completion
curl "https://your-api.execute-api.region.amazonaws.com/v1/bulk/jobs/job-abc123" \
-H "Authorization: Bearer $API_KEY" \
-H "X-Tenant-ID: tenant-a"
# When complete, get results via presigned S3 URL
curl "https://your-api.execute-api.region.amazonaws.com/v1/bulk/jobs/job-abc123/result"- Database-level isolation β Tenant security enforced by Redshift RLS/RBAC, not application code
- Multiple auth methods β JWT tokens, API keys, AWS IAM (SigV4)
- SQL injection prevention β Built-in validator blocks dangerous operations
- Complete audit trail β Every query logged with tenant context and X-Ray tracing
- Query API (Sync) β Immediate results for interactive workloads, β€10,000 rows
- Bulk API (Async) β Background processing for large exports, unlimited rows
- Automatic LIMIT injection β Query API enforces safe result sizes
- Session Reuse β Redshift Data API
SessionKeepAliveSecondsfor connection pooling - CSV Format Parsing β
get_statement_result_v2with 70%+ faster deserialization - Lambda Layers β Shared dependencies for faster cold starts
- Inline JSON β Small results returned directly in API response
- S3 Presigned URLs β Large results exported to S3 (Parquet/CSV)
- Automatic switching β Threshold-based format selection
- Multi-tenancy β One database user per tenant with isolated permissions
- Terragrunt IaC β DRY multi-environment deployments (dev/staging/prod)
- Observability β CloudWatch dashboards, X-Ray tracing, structured logging
- Idempotency β Request deduplication with configurable TTL
| Metric | Without Optimization | With Optimization | Improvement |
|---|---|---|---|
| First Query Latency | 500ms | 500ms | β |
| Subsequent Queries | 500ms | 100ms | 80% β¬οΈ |
| Result Parsing (10K rows) | 120ms | 35ms | 71% β¬οΈ |
| Cold Start | 3s | 1.5s | 50% β¬οΈ |
Optimizations include session reuse and CSV format parsing via
get_statement_result_v2.
Full documentation is available at zhiweio.github.io/redshift-spectra
| Section | Description |
|---|---|
| Getting Started | Installation, configuration, quickstart |
| User Guide | Query API (sync), Bulk API (async) |
| Concepts | Architecture, multi-tenancy, data delivery |
| Security | Zero-trust model, authentication, SQL validation |
| Performance | Session reuse, CSV format optimization |
| Deployment | Terragrunt, Lambda layers, monitoring |
| API Reference | Complete REST API specification |
For local development and testing without AWS costs:
# Start LocalStack
task local:start
# Deploy infrastructure to LocalStack
task local:deploy
# Check status
task local:status
# Clean up
task local:stop
β οΈ Note: Full functionality (Redshift Data API, Lambda Layers) requires LocalStack Pro. See the LocalStack documentation for details on Community vs Pro features.
# Install dev dependencies
task setup:install-dev
# Run all tests (450+ test cases)
task test:all
# Lint and format Python code
task lint:check
task lint:format
# Format Terraform/Terragrunt files
task infra:iac-fmt
# Build Lambda packages
task build:all
# Serve documentation locally
task docs:serveredshift-spectra/
βββ src/spectra/ # Lambda function source code
β βββ handlers/ # API endpoint handlers
β βββ services/ # Business logic (Redshift, S3, DynamoDB)
β βββ middleware/ # Tenant context, authentication
β βββ models/ # Request/response schemas
β βββ utils/ # SQL validator, config, helpers
βββ terraform/ # Terraform modules
β βββ modules/ # api-gateway, dynamodb, iam, lambda, monitoring, s3
βββ terragrunt/ # Terragrunt configurations
β βββ environments/ # local, dev, prod environments
βββ tests/ # Unit and integration tests
β βββ unit/ # Handler, service, utility tests
β βββ integration/ # End-to-end workflow tests
βββ docs/ # MkDocs documentation source
Contributions are welcome! Please see our Contributing Guide for details.
MIT License β see LICENSE for details.
Built with β€οΈ for the data engineering community
Documentation β’ Report Bug β’ Request Feature