A small, production-style ETL pipeline built on AWS using CloudFormation. Raw JSON lands in S3, a Lambda function transforms it, and the result is written back to S3 as CSV.
- S3 bucket with two prefixes:
  - raw/ for incoming JSON
  - processed/ for transformed CSV
- S3 event triggers Lambda on new objects in raw/
- Lambda parses JSON, normalizes fields, writes CSV to processed/
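lambda/handler.py is not reproduced in this README, so the following is only a minimal sketch of what the parse, normalize, and write-CSV step could look like. The function names `normalize_key` and `records_to_csv`, and the normalization rule itself (trim, lowercase, spaces to underscores), are assumptions for illustration rather than the project's actual code.

```python
import csv
import io
import json


def normalize_key(name):
    """Hypothetical rule: trim, lowercase, and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def records_to_csv(raw_json):
    """Convert a JSON array of flat objects into CSV text with a normalized header."""
    records = json.loads(raw_json)
    if isinstance(records, dict):
        # Accept a single object as a one-record input.
        records = [records]

    rows = [{normalize_key(k): v for k, v in rec.items()} for rec in records]

    # Build the header from every key seen, in first-seen order.
    header = []
    for row in rows:
        for key in row:
            if key not in header:
                header.append(key)

    buffer = io.StringIO()
    # restval="" leaves a blank cell when a record lacks a column.
    writer = csv.DictWriter(buffer, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the transformation a pure string-to-string function makes it testable locally without any AWS access; the handler would then only need to read the object, call it, and write the result.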
flowchart LR
A["Local sample.json"] -->|aws s3 cp| B["S3 raw/"]
B -->|S3 event| C["Lambda transform"]
C --> D["S3 processed/ (CSV)"]
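For the "S3 event" edge in the diagram, a sketch of how a handler might pull the bucket and key out of the notification payload is shown below. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields are part of the standard S3 event message; the helper name `extract_objects` is hypothetical and not taken from lambda/handler.py.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```

Decoding the key matters for file names containing spaces or special characters; without it, a follow-up read of the object could fail with a missing-key error.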
- Uploading data/sample.json to raw/ produces processed/sample.csv.
- The output CSV contains a normalized header and values from the input records.
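The input-to-output naming described above (raw/sample.json becomes processed/sample.csv) could be derived along these lines. This is a sketch under the assumption that the output key simply swaps the prefix and extension; `processed_key` is a hypothetical helper name.

```python
import posixpath


def processed_key(raw_key, raw_prefix="raw/", out_prefix="processed/"):
    """Map raw/<name>.json to processed/<name>.csv; return None for other keys."""
    if not raw_key.startswith(raw_prefix) or not raw_key.lower().endswith(".json"):
        return None
    relative = raw_key[len(raw_prefix):]
    # S3 keys always use forward slashes, so posixpath is the right splitter.
    stem, _ext = posixpath.splitext(relative)
    return f"{out_prefix}{stem}.csv"
```

Returning `None` for keys outside raw/ gives the handler a cheap guard against reprocessing its own output if the notification filter is ever misconfigured.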
- AWS S3
- AWS Lambda (Python)
- CloudFormation
- AWS CLI
infra/template.yaml
lambda/handler.py
data/sample.json
notification.json
.gitignore
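The contents of notification.json are not shown in this README. As a rough guide, a configuration accepted by `aws s3api put-bucket-notification-configuration` that fires a Lambda for new objects under raw/ generally has the shape built below. The helper name `build_notification` and the example ARN are placeholders, and the `s3:ObjectCreated:*` event choice is an assumption about this project.

```python
import json


def build_notification(lambda_arn, prefix="raw/"):
    """Build an S3 bucket notification configuration for one Lambda target."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Assumed event type: any object creation (PUT, POST, copy, multipart).
                "Events": ["s3:ObjectCreated:*"],
                # Only keys under the given prefix trigger the function.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def to_json(config):
    """Serialize the configuration in a form suitable for notification.json."""
    return json.dumps(config, indent=2)
```

For example, `to_json(build_notification("arn:aws:lambda:eu-north-1:123456789012:function:example-transform"))` yields text that could be saved as notification.json; the account ID and function name in that ARN are made up.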
- AWS CLI configured (aws configure)
- An S3 bucket to store the Lambda zip (code bucket)
- Package the Lambda
Compress-Archive -Path lambda/handler.py -DestinationPath lambda.zip -Force
- Create a code bucket (unique name)
aws s3 mb s3://<code-bucket-name> --region eu-north-1
aws s3 cp lambda.zip s3://<code-bucket-name>/lambda/lambda.zip
- Deploy the stack
aws cloudformation deploy \
--template-file infra/template.yaml \
--stack-name aws-s3-lambda-etl \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides ProjectName=aws-s3-lambda-etl CodeBucket=<code-bucket-name> CodeKey=lambda/lambda.zip \
--region eu-north-1
- Enable S3 -> Lambda notifications
aws s3api put-bucket-notification-configuration \
--bucket <data-bucket-name> \
--notification-configuration file://notification.json \
--region eu-north-1
- Upload the sample file and check the output
aws s3 cp data/sample.json s3://<data-bucket-name>/raw/sample.json --region eu-north-1
aws s3 ls s3://<data-bucket-name>/processed/ --region eu-north-1
- Clean up
aws cloudformation delete-stack --stack-name aws-s3-lambda-etl --region eu-north-1
aws s3 rb s3://<code-bucket-name> --force --region eu-north-1
aws s3 rb s3://<data-bucket-name> --force --region eu-north-1

- Replace <data-bucket-name> with the bucket created by CloudFormation.
- Update notification.json with your Lambda ARN if you use a different AWS account or region.
- Keep input files small to stay within the free tier.
- CI runs a simple Python syntax check on lambda/handler.py.
- Licensed under the MIT License.
- S3 and Lambda usage at this scale typically falls within the AWS free tier; avoid large files to minimize costs.
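The CI syntax check mentioned above is not spelled out in this README. One plausible way to reproduce it locally is with the standard-library `py_compile` module, as sketched here; the helper name `syntax_ok` is hypothetical, and the actual CI job may use a different command.

```python
import py_compile
import sys


def syntax_ok(path):
    """Return True if the file at `path` compiles as Python source."""
    try:
        # doraise=True turns syntax errors into PyCompileError instead of a printed message.
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError as exc:
        print(exc.msg, file=sys.stderr)
        return False
```

Calling `syntax_ok("lambda/handler.py")` from the repository root is roughly equivalent to running `python -m py_compile lambda/handler.py`.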
