Analyze $50B+ in SBIR/STTR funding data: track technology transitions, patent outcomes, and the economic impact of federal R&D investments.
- 533K+ SBIR awards from 1983-present across all federal agencies
- 40K-80K technology transitions detected using 6 independent signals
- CET classification for Critical & Emerging Technology trend analysis
- Economic impact analysis with ROI and federal tax receipt estimates
- Patent ownership chains tracking SBIR-funded innovation outcomes
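The ROI figures combine award outlays with output multipliers derived from BEA input-output tables. As a rough illustration of the arithmetic, here is a back-of-envelope sketch; the multiplier and tax-share values are hypothetical placeholders, not the project's calibrated parameters:

```python
def fiscal_returns(award_dollars: float,
                   output_multiplier: float,
                   federal_tax_share: float) -> dict:
    """Back-of-envelope fiscal return estimate.

    output_multiplier: economic output generated per dollar of R&D
        spending (a BEA I-O style multiplier; value is illustrative).
    federal_tax_share: fraction of induced output captured as federal
        tax receipts (illustrative).
    """
    induced_output = award_dollars * output_multiplier
    tax_receipts = induced_output * federal_tax_share
    return {
        "induced_output": induced_output,
        "tax_receipts": tax_receipts,
        "roi": tax_receipts / award_dollars,  # receipts per award dollar
    }

# Example: a $1M Phase II award with hypothetical parameters
result = fiscal_returns(1_000_000, output_multiplier=2.1, federal_tax_share=0.18)
print(f"Induced output: ${result['induced_output']:,.0f}")
print(f"Federal tax receipts: ${result['tax_receipts']:,.0f}")
```

See docs/fiscal/ for the actual multipliers and methodology.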
- Python 3.11+ (required)
- Docker (optional, for local Neo4j database)
- AWS credentials (optional, for cloud features and S3 data)
Get started in 2 minutes:

```bash
git clone https://github.com/hollomancer/sbir-analytics
cd sbir-analytics
make install   # Install dependencies with uv
make dev       # Start Dagster UI
# Open http://localhost:3000
```

Next steps:

- Materialize the `raw_sbir_awards` asset in the Dagster UI
- Explore data in Neo4j Browser (http://localhost:7474)
- See the Getting Started Guide for a detailed walkthrough
For production use, see the Deployment Guide for:

- GitHub Actions (orchestrates ETL pipelines via `dagster job execute`)
- AWS Lambda (serverless, for scheduled data downloads)
- Five-stage ETL: Extract → Validate → Enrich → Transform → Load
- Asset-based orchestration: Dagster with dependency management
- Data quality gates: Comprehensive validation at each stage
- Cloud-first design: AWS S3 + Neo4j (Docker) + GitHub Actions
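Conceptually, each record flows through the five stages in order, with validation acting as a quality gate. The snippet below is a minimal self-contained sketch of that pipeline shape; the stage functions, field names, and sample records are invented for illustration and are not the project's actual API:

```python
from typing import Iterable

def extract() -> Iterable[dict]:
    # The real pipeline reads SBIR.gov CSVs, USAspending bulk files,
    # etc.; here we yield two inline records.
    yield {"award_id": "A-001", "amount": "150000", "agency": "DOD"}
    yield {"award_id": "A-002", "amount": "-5", "agency": "NASA"}

def validate(records):
    # Quality gate: drop records that fail basic checks.
    for r in records:
        if float(r["amount"]) > 0:
            yield r

def enrich(records):
    # Add derived fields (here, a toy phase heuristic).
    for r in records:
        yield {**r, "phase": "II" if float(r["amount"]) > 100_000 else "I"}

def transform(records):
    # Normalize types and field names for loading.
    for r in records:
        yield {"award_id": r["award_id"],
               "amount_usd": float(r["amount"]),
               "agency": r["agency"],
               "phase": r["phase"]}

def load(records, sink: list):
    sink.extend(records)

warehouse: list = []
load(transform(enrich(validate(extract()))), warehouse)
print(warehouse)  # the invalid record is filtered out by the quality gate
```

In the actual system each stage is a Dagster asset, so dependencies, retries, and materialization are managed by the orchestrator rather than a direct function chain.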
| System | Purpose | Documentation |
|---|---|---|
| Transition Detection | Identify SBIR → federal contract transitions (≥85% precision) | docs/transition/ |
| Phase II → III Latency | Time-to-Phase-III survival analysis with matched-pair + KM frames | docs/phase-transition-latency.md |
| CET Classification | ML-based technology area classification | docs/ml/ |
| ModernBERT-Embed | Patent-award similarity using semantic embeddings | docs/ml/paecter.md |
| Fiscal Returns | Economic impact & ROI analysis using BEA I-O tables | docs/fiscal/ |
| Patent Analysis | USPTO patent chains and tech transfer tracking | docs/schemas/patent-neo4j-schema.md |
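The transition detector fuses six independent signals into a single decision. One common way to combine such signals is a weighted score with a precision-tuned threshold; the sketch below illustrates that pattern only — the signal names, weights, and cutoff are invented, not the project's calibrated values (see docs/transition/ for those):

```python
# Hypothetical signal weights (sum to 1.0) for illustration.
WEIGHTS = {
    "same_vendor_contract": 0.30,
    "topic_similarity": 0.20,
    "agency_continuity": 0.15,
    "timing_window": 0.15,
    "patent_linkage": 0.10,
    "naics_match": 0.10,
}
THRESHOLD = 0.65  # illustrative precision-oriented cutoff

def transition_score(signals: dict) -> float:
    """Weighted combination of signal values in [0, 1]."""
    return sum(WEIGHTS[name] * float(value) for name, value in signals.items())

def is_transition(signals: dict) -> bool:
    return transition_score(signals) >= THRESHOLD

candidate = {
    "same_vendor_contract": 1, "topic_similarity": 0.8,
    "agency_continuity": 1, "timing_window": 1,
    "patent_linkage": 0, "naics_match": 1,
}
print(round(transition_score(candidate), 2), is_transition(candidate))
```

Raising the threshold trades recall for precision, which is how a ≥85%-precision operating point is typically chosen.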
- Orchestration: Dagster 1.7+ (asset-based pipeline), GitHub Actions
- Database: Neo4j 5.x (graph database for relationships)
- Processing: DuckDB 1.0+ (analytical queries), Pandas 2.2+
- Configuration: Pydantic 2.8+ (type-safe YAML config)
- Deployment: Docker, AWS Lambda, GitHub Actions
| Topic | Description |
|---|---|
| Getting Started | Detailed setup guides for local, cloud, and ML workflows |
| Architecture | System design, patterns, and technical decisions |
| Deployment | Production deployment options and guides |
| Testing | Testing strategy, guides, and coverage |
| Schemas | Neo4j graph schema and data models |
| API Reference | Code documentation and API reference |
See the Documentation Index for a complete map.
```mermaid
graph TD
    subgraph Sources
        SBIR[SBIR.gov CSV]
        USA[USAspending Bulk]
        USPTO[USPTO Patents]
        BEA[BEA I-O Tables]
    end
    subgraph "sbir_etl · Core ETL Library"
        EXT[Extractors]
        VAL[Validators]
        ENR[Enrichers]
        TRN[Transformers]
    end
    subgraph "packages/"
        ML["sbir-ml · CET · Transition · PaECTER"]
        AN["sbir-analytics · Dagster Assets · Jobs"]
        GR["sbir-graph · Neo4j Loaders"]
    end
    subgraph Targets
        NEO4J[(Neo4j)]
        S3[(S3 / DuckDB)]
    end
    SBIR & USA & USPTO & BEA --> EXT
    EXT --> VAL --> ENR --> TRN
    TRN --> AN
    ML --> AN
    AN --> GR --> NEO4J
    AN --> S3
    style Sources fill:#e8f4f8,stroke:#5b9bd5
    style Targets fill:#e2efda,stroke:#70ad47
```
Data flows top-down: sources are extracted by sbir_etl, orchestrated through Dagster assets in sbir-analytics, and loaded into Neo4j via sbir-graph. ML models in sbir-ml feed classification and scoring into the asset graph.
```
sbir-analytics/
├── sbir_etl/            # Core ETL library (extractors, enrichers, transformers, validators)
├── packages/
│   ├── sbir-analytics/  # Dagster orchestration (assets, jobs, sensors, CLI)
│   ├── sbir-graph/      # Neo4j loading and relationship creation
│   ├── sbir-ml/         # ML models (CET classification, transition, PaECTER)
│   └── sbir-rag/        # RAG system for award/patent search
├── tests/               # Unit, integration, and E2E tests
├── config/              # YAML configuration (base.yaml, thresholds)
├── docs/                # Architecture, deployment, testing, schema docs
├── specs/               # Feature specifications
├── infrastructure/cdk/  # AWS CDK stacks (security, storage, batch)
├── lambda/              # Lambda layer dependency definitions
├── scripts/             # Pipeline runners, benchmarks, utilities
├── migrations/          # Database migration scripts
├── notebooks/           # Jupyter analysis notebooks
└── examples/            # Usage examples
```
See CONTRIBUTING.md for a detailed breakdown.
```bash
# Development
make install          # Install dependencies
make dev              # Start Dagster UI
make test             # Run tests
make lint             # Run linters

# Docker (alternative)
make docker-build     # Build Docker image
make docker-up-dev    # Start development stack
make docker-test      # Run tests in container

# Data operations
make transition-run   # Run transition detection
make cet-run          # Run CET classification
```

See the Makefile for all available commands.
Configuration uses YAML files with environment variable overrides:

```bash
# Override any config value using the SBIR_ETL__SECTION__KEY pattern
export SBIR_ETL__NEO4J__URI="bolt://localhost:7687"
export SBIR_ETL__ENRICHMENT__BATCH_SIZE=200
```

See the Configuration Guide for details.
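Double-underscore patterns like this conventionally split each variable name into a nested path through the config. A minimal sketch of that mapping — an illustration of the pattern, not the project's actual loader:

```python
import os

PREFIX = "SBIR_ETL__"

def apply_env_overrides(config: dict, environ=os.environ) -> dict:
    """Overlay SBIR_ETL__SECTION__KEY variables onto a nested config dict."""
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # "SBIR_ETL__NEO4J__URI" -> ["neo4j", "uri"]
        path = name[len(PREFIX):].lower().split("__")
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return config

cfg = {"neo4j": {"uri": "bolt://prod:7687"}, "enrichment": {"batch_size": 100}}
env = {"SBIR_ETL__NEO4J__URI": "bolt://localhost:7687",
       "SBIR_ETL__ENRICHMENT__BATCH_SIZE": "200"}
apply_env_overrides(cfg, environ=env)
print(cfg["neo4j"]["uri"])              # overridden value
print(cfg["enrichment"]["batch_size"])  # note: still a string here
```

In this project the typed coercion (string "200" to int 200) is handled by the Pydantic config models rather than the override logic itself.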
We welcome contributions! See CONTRIBUTING.md for:
- Development setup and workflow
- Code quality standards (black, ruff, mypy)
- Testing requirements (≥85% coverage)
- Pull request process
```bash
make test              # Run all tests
make test-unit         # Unit tests only
make test-integration  # Integration tests
make test-e2e          # End-to-end tests
```

See the Testing Guide for details.
This project is licensed under the MIT License. Copyright (c) 2025 Conrad Hollomon.
This project gratefully builds on the following open-source tools and research:
- BEA API - Bureau of Economic Analysis Input-Output tables for fiscal impact modeling
- Bayesian Mixture-of-Experts - Research on calibration and uncertainty estimation by Albus Yizhuo Li
- ModernBERT-Embed - Embedding model by Nomic AI (768-dim, 8192 token context)
- @SquadronConsult - Help with SAM.gov data integration
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/