AWS S3 -> Lambda ETL (CloudFormation)

A small, production-style ETL pipeline built on AWS using CloudFormation. Raw JSON lands in S3, a Lambda function transforms it and writes CSV back to S3.

Architecture

  • S3 bucket with two prefixes:
    • raw/ for incoming JSON
    • processed/ for transformed CSV
  • S3 event triggers Lambda on new objects in raw/
  • Lambda parses JSON, normalizes fields, writes CSV to processed/
```mermaid
flowchart LR
  A["Local sample.json"] -->|aws s3 cp| B["S3 raw/"]
  B -->|S3 event| C["Lambda transform"]
  C --> D["S3 processed/ (CSV)"]
```

What you will see

  • Uploading data/sample.json to raw/ produces processed/sample.csv.
  • The output CSV contains a normalized header and values from the input records.
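The transform lives in lambda/handler.py. A minimal sketch of what such a handler might look like (the field names, helper, and event handling below are illustrative assumptions, not the repo's actual code):

```python
import csv
import io
import json
import os


def transform(records):
    """Normalize a list of JSON records into CSV text.

    The header is the sorted union of all keys seen across records;
    fields missing from a record become empty strings.
    """
    fieldnames = sorted({key for record in records for key in record})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def handler(event, context):
    """S3-triggered entry point: read raw/<name>.json, write processed/<name>.csv."""
    import boto3  # imported lazily so transform() stays testable without AWS deps

    s3 = boto3.client("s3")
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]  # e.g. raw/sample.json
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        records = json.loads(body)
        name = os.path.splitext(os.path.basename(key))[0]
        s3.put_object(
            Bucket=bucket,
            Key=f"processed/{name}.csv",
            Body=transform(records).encode("utf-8"),
            ContentType="text/csv",
        )
```

Keeping the normalization in a pure function like transform() makes it unit-testable without mocking S3.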

Screenshot

S3 console showing raw/ and processed/ prefixes

Tech stack

  • AWS S3
  • AWS Lambda (Python)
  • CloudFormation
  • AWS CLI

Repo structure

infra/template.yaml
lambda/handler.py
data/sample.json
notification.json
.gitignore

Prerequisites

  • AWS CLI configured (aws configure)
  • An S3 bucket to store the Lambda zip (code bucket)

Deploy

  1. Package the Lambda (Compress-Archive is PowerShell; on macOS/Linux use: zip -j lambda.zip lambda/handler.py)
Compress-Archive -Path lambda/handler.py -DestinationPath lambda.zip -Force
  2. Create a code bucket (unique name)
aws s3 mb s3://<code-bucket-name> --region eu-north-1
aws s3 cp lambda.zip s3://<code-bucket-name>/lambda/lambda.zip
  3. Deploy the stack
aws cloudformation deploy \
  --template-file infra/template.yaml \
  --stack-name aws-s3-lambda-etl \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides ProjectName=aws-s3-lambda-etl CodeBucket=<code-bucket-name> CodeKey=lambda/lambda.zip \
  --region eu-north-1
  4. Enable S3 -> Lambda notifications
aws s3api put-bucket-notification-configuration \
  --bucket <data-bucket-name> \
  --notification-configuration file://notification.json \
  --region eu-north-1
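notification.json follows the put-bucket-notification-configuration schema, with a prefix filter so only objects under raw/ trigger the function. A sketch of generating such a file (the ARN, account ID, and function name are placeholders, not the repo's actual values):

```python
import json

# Placeholder ARN; replace region, account ID, and function name with
# the values from your deployed stack.
config = {
    "LambdaFunctionConfigurations": [
        {
            "Id": "raw-json-to-csv",
            "LambdaFunctionArn": (
                "arn:aws:lambda:eu-north-1:123456789012:"
                "function:aws-s3-lambda-etl-transform"
            ),
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "prefix", "Value": "raw/"}]}
            },
        }
    ]
}

with open("notification.json", "w") as f:
    json.dump(config, f, indent=2)
```

The prefix filter also prevents a feedback loop: without it, writes to processed/ would re-trigger the Lambda.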

Test the pipeline

aws s3 cp data/sample.json s3://<data-bucket-name>/raw/sample.json --region eu-north-1
aws s3 ls s3://<data-bucket-name>/processed/ --region eu-north-1

Cleanup

aws cloudformation delete-stack --stack-name aws-s3-lambda-etl --region eu-north-1
aws s3 rb s3://<code-bucket-name> --force --region eu-north-1
aws s3 rb s3://<data-bucket-name> --force --region eu-north-1

Notes

  • Replace <data-bucket-name> with the bucket created by CloudFormation.
  • Update notification.json with your Lambda ARN if you use a different AWS account or region.
  • Keep input files small to stay within the AWS free tier.
  • CI runs a simple Python syntax check on lambda/handler.py.
  • Licensed under the MIT License.

Costs

  • S3 and Lambda usage at this scale stays within the AWS free tier; avoid large files to minimize costs.
