Build an automated NBA analytics pipeline that:
- Ingests raw player data via API
- Stores it in a cloud-native data lake
- Enables SQL analytics without data movement
- Serves as a foundation for sports betting/analytics applications
- Serverless Infrastructure: Zero servers to manage (S3 + Glue + Athena)
- Real-Time Schema Discovery: Auto-catalog JSON data with AWS Glue
- Cost-Efficient Queries: $5/TB scanned via Amazon Athena
- Scalable Storage: Handle 10,000+ player records with S3
- Data Source: SportsDataIO API for NBA player statistics.
- Data Ingestion: Python script (`boto3`) for API integration and S3 uploads.
- Storage Layer: AWS S3 bucket for raw JSON data and query results.
- Metadata Catalog: AWS Glue for schema discovery and table creation.
- Query Layer: Amazon Athena for serverless SQL analytics.
- Python script fetches NBA player data from SportsDataIO API.
- Raw JSON is uploaded to an S3 bucket (`s3://<bucket>/raw-data/`).
- AWS Glue crawler auto-discovers schema and creates metadata tables.
- Analysts run SQL queries directly on S3 data via Athena.
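
The first two steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the repository's `setup_nba_data_lake.py`: the function names, the object key `raw-data/nba_player_data.jsonl`, and the use of the `Ocp-Apim-Subscription-Key` request header are assumptions. Records are written one JSON object per line because Athena's JSON SerDes expect each record on its own line.

```python
import json
import urllib.request


def fetch_players(endpoint, api_key, opener=urllib.request.urlopen):
    """Fetch the player list from the API and return it as Python objects.

    Assumption: the API key is sent in the Ocp-Apim-Subscription-Key header.
    """
    request = urllib.request.Request(
        endpoint, headers={"Ocp-Apim-Subscription-Key": api_key}
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def to_json_lines(records):
    """Serialize a list of records as newline-delimited JSON."""
    return "\n".join(json.dumps(record) for record in records)


def upload_raw_data(s3_client, bucket, records,
                    key="raw-data/nba_player_data.jsonl"):
    """Upload the records to S3 and return the s3:// URI that was written.

    s3_client is expected to behave like boto3.client("s3").
    The default key is a hypothetical name under the raw-data/ prefix.
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=to_json_lines(records).encode("utf-8"),
    )
    return f"s3://{bucket}/{key}"
```

With boto3 installed, `s3_client` would be `boto3.client("s3")`, and the endpoint and key would come from the `NBA_ENDPOINT` and `SPORTS_DATA_API_KEY` variables in `.env`. Passing the client in as an argument keeps the functions easy to test without AWS credentials.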

| Category | Technologies |
|---|---|
| Data Source | SportsDataIO API |
| Cloud Storage | AWS S3 |
| Data Catalog | AWS Glue |
| Query Engine | Amazon Athena |
| Execution | CloudShell |
| Automation | Python, Boto3 SDK |
| Environment Mgmt | Python-dotenv |

nba-analytics-data-lake/
├── src/
│   ├── setup_nba_data_lake.py # Infrastructure automation
│   └── delete_resources.py # Cleanup script
├── .env # API credentials
└── docs/ # Architecture diagrams

- AWS Account with permissions to:
  - Create/delete S3 buckets
  - Manage Glue databases
  - Run Athena queries
- SportsDataIO API Key (Free Tier)
- Log in to AWS Management Console
- Navigate to IAM:
  - In the search bar, type IAM and select IAM from the results.
- Create a New Policy:
  - In the IAM dashboard, click on Policies in the left-hand menu.
  - Click the Create policy button.
- Switch to JSON Editor:
  - In the Create Policy page, select the JSON tab.
  - Copy the provided JSON policy and paste it into the editor.
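
The policy JSON itself is not reproduced in this section. As an illustration only, the sketch below builds a policy document covering the three permission groups listed in the prerequisites (S3 buckets, Glue databases, Athena queries) and prints it as JSON. The action list and the `"Resource": "*"` scope are assumptions, not the project's policy; narrow the resources to your own bucket and database before using anything like it.

```python
import json


def build_policy():
    """Return an IAM policy document as a dict.

    Illustrative only: the actions and the wildcard resources are assumptions.
    """
    statements = [
        {
            "Sid": "S3DataLake",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket", "s3:DeleteBucket", "s3:ListBucket",
                "s3:GetBucketLocation", "s3:GetObject", "s3:PutObject",
                "s3:DeleteObject",
            ],
            "Resource": "*",  # assumption: replace with your bucket ARNs
        },
        {
            "Sid": "GlueCatalog",
            "Effect": "Allow",
            "Action": [
                "glue:CreateDatabase", "glue:DeleteDatabase",
                "glue:GetDatabase", "glue:CreateTable", "glue:DeleteTable",
                "glue:GetTable", "glue:GetTables",
            ],
            "Resource": "*",  # assumption: replace with catalog/database ARNs
        },
        {
            "Sid": "AthenaQueries",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution", "athena:GetQueryExecution",
                "athena:GetQueryResults",
            ],
            "Resource": "*",  # assumption: replace with your workgroup ARN
        },
    ]
    return {"Version": "2012-10-17", "Statement": statements}


# Print the document so it can be pasted into the IAM JSON editor.
print(json.dumps(build_policy(), indent=2))
```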

nano .env

- Paste:

SPORTS_DATA_API_KEY=your_actual_key_here
NBA_ENDPOINT=https://api.sportsdata.io/v3/nba/scores/json/Players

- Save & exit (in nano: Ctrl+O, Enter, then Ctrl+X).

nano setup_nba_data_lake.py

- Paste script content
- Save & exit (in nano: Ctrl+O, Enter, then Ctrl+X).

pip install -r requirements.txt
python3 setup_nba_data_lake.py

Successful Output:
S3 Bucket Created: s3://<bucket_name>
Glue Database 'nba_analytics' Ready
Athena Query Interface Activated!
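
The messages above suggest that the setup script creates a bucket and a Glue database. Below is a minimal sketch of those two calls, with boto3 clients passed in as arguments. The function names and the database description are hypothetical, and the real script may do more or do it differently. One detail worth knowing: `create_bucket` must omit `CreateBucketConfiguration` in `us-east-1` and include it in every other region.

```python
def create_bucket(s3_client, bucket_name, region):
    """Create the S3 bucket, handling the us-east-1 special case.

    s3_client is expected to behave like boto3.client("s3").
    """
    if region == "us-east-1":
        # us-east-1 rejects an explicit LocationConstraint.
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return f"s3://{bucket_name}"


def create_glue_database(glue_client, database_name):
    """Create a Glue database that will hold the table metadata.

    glue_client is expected to behave like boto3.client("glue").
    """
    glue_client.create_database(
        DatabaseInput={
            "Name": database_name,
            "Description": "NBA player data lake",  # illustrative text
        }
    )
    return database_name
```

In a real run the clients would come from `boto3.client("s3", region_name=region)` and `boto3.client("glue", region_name=region)`.
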
Run in Athena Query Editor:
-- Total points per team
SELECT Team, SUM(points) AS TotalTeamPoints
FROM "glue-nba-data-lake"."nba_players"
GROUP BY Team
ORDER BY TotalTeamPoints DESC
LIMIT 5;

Run more queries to learn more about players and teams.
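
Queries like this one can also be submitted from Python instead of the console. The sketch below assumes a boto3 Athena client is passed in; `run_query`, the poll interval, and the output location argument are illustrative names, not part of the project.

```python
import time


def run_query(athena_client, sql, database, output_location,
              poll_seconds=1.0, sleep=time.sleep):
    """Start an Athena query and wait until it reaches a terminal state.

    athena_client is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        sleep(poll_seconds)  # still QUEUED or RUNNING
```

`output_location` is an S3 URI where Athena writes result files, for example a prefix in the same bucket that holds the raw data. The `sleep` parameter exists only so the loop can be exercised in tests without real delays.
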
- Verify S3 Data:
- Check Glue Catalog:
- IAM Roles: Least privilege access for S3/Glue/Athena
- API Key Protection: Stored in `.env` (not committed to Git)
- Encryption: S3 server-side encryption enabled

| Issue | Resolution |
|---|---|
| `BucketAlreadyExists` | Use globally unique bucket name |
| `AccessDenied` in Glue | Verify IAM permissions |
| No data in Athena | Wait 2-3 mins after Glue crawl |
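
Instead of waiting a fixed 2-3 minutes, a script can poll until the crawler is idle again. The sketch below assumes a boto3 Glue client and a crawler name of your own; `wait_for_crawler` and its default timeout and poll interval are hypothetical.

```python
import time


def wait_for_crawler(glue_client, crawler_name, timeout_seconds=600,
                     poll_seconds=15, sleep=time.sleep, clock=time.monotonic):
    """Poll a Glue crawler until its state is READY or the timeout passes.

    glue_client is expected to behave like boto3.client("glue").
    Returns True if the crawler became READY, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":  # other states are RUNNING and STOPPING
            return True
        sleep(poll_seconds)
    return False
```

A `True` result only means the crawler is idle; whether the last crawl succeeded still needs to be checked in the Glue console or the crawler's last-crawl status.
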
- Automated daily sync with EventBridge
- Data transformation to Parquet format
- Cost monitoring dashboard
- Fork the repository
- Submit PRs to `new` branch

MIT License - Full Text





