shaunniee/SERVERLESS_ORDERING_SYSTEM


⚡ Serverless Ordering System

High-Scale Event-Driven Order Processing on AWS

AWS Terraform Node.js DynamoDB Step Functions

A production-grade, event-driven order processing pipeline built entirely on AWS serverless services. It is designed to handle 10,000+ orders/min and uses the Saga pattern for distributed transactions, with automatic compensation on failure.

Demonstrates distributed systems design, serverless architecture, Infrastructure as Code, and event-driven patterns.


📐 Architecture Overview

The system follows an event-driven microservices architecture where each component is decoupled through queues and events. A client submits an order via REST API, and the system asynchronously orchestrates inventory reservation, payment processing, order confirmation, and domain event publishing, all with automatic rollback if anything fails.

Architecture Diagram

🔄 End-to-End Request Flow

flowchart TD
    subgraph Client["🌐 Client"]
        A["📱 POST /orders"]
    end

    subgraph API["🛡️ API Gateway"]
        B["📋 JSON Schema<br/>Validation"]
        B2["🚦 Throttling<br/>200 burst / 100 sustained"]
    end

    subgraph Create["⚡ createOrder Lambda"]
        C1["🔑 Idempotency Check<br/>(Powertools body hash)"]
        C2["💾 Write to DynamoDB<br/>(status: PENDING)"]
        C3["📨 Enqueue to SQS"]
    end

    subgraph Queue["📬 SQS"]
        D1["📥 Order Queue"]
        D2["☠️ Dead Letter Queue<br/>(after 3 failures)"]
    end

    subgraph Process["⚡ processOrder Lambda"]
        E["🚀 Start Saga<br/>(orderId as execution name)"]
    end

    subgraph Saga["🔁 Step Functions Express: Order Saga"]
        direction TB
        F1["📦 Reserve Inventory"]
        F2["💳 Process Payment"]
        F3["✅ Confirm Order"]
        F4["📡 Emit Event"]
    end

    subgraph Events["📡 EventBridge"]
        G1["🚌 Custom Event Bus"]
        G2["📋 OrderPlaced Rule"]
        G3["📊 CloudWatch Logs"]
    end

    A --> B --> B2 --> C1 --> C2 --> C3
    C3 --> D1
    D1 --> E
    D1 -. "3 failures" .-> D2
    E --> F1 --> F2 --> F3 --> F4
    F4 --> G1 --> G2 --> G3

    style Client fill:#1a1a2e,stroke:#e94560,color:#fff
    style API fill:#16213e,stroke:#0f3460,color:#fff
    style Create fill:#1a1a2e,stroke:#e94560,color:#fff
    style Queue fill:#16213e,stroke:#0f3460,color:#fff
    style Process fill:#1a1a2e,stroke:#e94560,color:#fff
    style Saga fill:#0f3460,stroke:#53a8b6,color:#fff
    style Events fill:#16213e,stroke:#0f3460,color:#fff

🛡️ Saga Compensation: Automatic Rollback

When a step fails, the saga doesn't just stop: it undoes everything that succeeded before it. This keeps data consistent across services without distributed locks.

flowchart LR
    subgraph Happy["✅ Happy Path"]
        direction LR
        H1["📦 Reserve<br/>Inventory"] --> H2["💳 Process<br/>Payment"] --> H3["✅ Confirm<br/>Order"] --> H4["📡 Emit<br/>Event"]
    end

    subgraph Fail1["❌ Payment Fails"]
        direction LR
        X1["📦 Reserve<br/>Inventory ✓"] --> X2["💳 Payment<br/>✗ FAILS"] --> X3["📦 Release<br/>Inventory ↩️"] --> X4["❌ Fail<br/>Order"]
    end

    subgraph Fail2["❌ Confirm Fails"]
        direction LR
        Y1["📦 Reserve ✓<br/>💳 Payment ✓"] --> Y2["✅ Confirm<br/>✗ FAILS"] --> Y3["💳 Refund<br/>Payment ↩️"] --> Y4["📦 Release<br/>Inventory ↩️"] --> Y5["❌ Fail<br/>Order"]
    end

    style Happy fill:#0d7377,stroke:#14ffec,color:#fff
    style Fail1 fill:#6b0f1a,stroke:#e94560,color:#fff
    style Fail2 fill:#6b0f1a,stroke:#e94560,color:#fff

Each state has 3 retries with exponential backoff (2s → 4s → 8s) before triggering compensation.
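
As a rough illustration of that schedule, the sketch below computes the wait before each retry from an initial interval, an attempt count, and a backoff multiplier. The object and function names are hypothetical; the three values are assumed to correspond to the IntervalSeconds, MaxAttempts, and BackoffRate fields of a Step Functions Retry block, and the repository's actual ASL may differ.

```javascript
// Hypothetical retry settings; the values come from the sentence above.
const retryPolicy = { intervalSeconds: 2, maxAttempts: 3, backoffRate: 2 };

// The wait before retry n (1-based) is intervalSeconds * backoffRate^(n - 1).
function retryDelays({ intervalSeconds, maxAttempts, backoffRate }) {
  return Array.from(
    { length: maxAttempts },
    (_, i) => intervalSeconds * backoffRate ** i
  );
}

console.log(retryDelays(retryPolicy)); // [ 2, 4, 8 ]
```

Only after the last retry fails does the state's failure transition (the compensation path) run.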


🧠 How It Works: Deep Dive

1️⃣ API Gateway: The Front Door

The REST API exposes two endpoints:

| Method | Path | Lambda | Purpose |
| --- | --- | --- | --- |
| POST | /orders | createOrder | Submit a new order |
| GET | /orders/{orderId} | getOrder | Check order status |

Request validation happens before Lambda even runs. A JSON Schema model enforces:

  • userId: non-empty string
  • items: non-empty array, each with productId (string) and qty (integer ≥ 1)
  • totalAmount: number ≥ 0.01

Invalid requests get a 400 response at the API Gateway level: zero Lambda invocations, zero cost. Rate limiting (200 burst / 100 sustained) protects downstream services from traffic spikes.
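
To make the three rules concrete, here is a minimal plain-JavaScript mirror of them. It is only a sketch: in the deployed system the checks are expressed as a JSON Schema model evaluated by API Gateway, and the function name and error messages below are hypothetical.

```javascript
// Returns a list of human-readable validation errors (empty when valid).
function validateOrder(body) {
  const errors = [];
  if (typeof body?.userId !== 'string' || body.userId.length === 0) {
    errors.push('userId must be a non-empty string');
  }
  if (!Array.isArray(body?.items) || body.items.length === 0) {
    errors.push('items must be a non-empty array');
  } else {
    body.items.forEach((item, i) => {
      if (typeof item?.productId !== 'string') {
        errors.push(`items[${i}].productId must be a string`);
      }
      if (!Number.isInteger(item?.qty) || item.qty < 1) {
        errors.push(`items[${i}].qty must be an integer >= 1`);
      }
    });
  }
  if (typeof body?.totalAmount !== 'number' || !(body.totalAmount >= 0.01)) {
    errors.push('totalAmount must be a number >= 0.01');
  }
  return errors;
}

// A payload shaped like the one in the Test section passes with no errors.
console.log(validateOrder({
  userId: 'user-123',
  items: [{ productId: 'PROD-001', qty: 2 }],
  totalAmount: 109.97,
}));
```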

2️⃣ createOrder Lambda: Dual-Layer Idempotency

This Lambda is the API entry point and the most critical piece for data integrity. It prevents duplicate orders at two independent layers:

Layer 1: Powertools Idempotency
  └─ Hashes the request body → stores in Idempotency table (1hr TTL)
  └─ Duplicate body within 1 hour? → returns cached response, handler never runs

Layer 2: DynamoDB Conditional Write
  └─ ConditionExpression: attribute_not_exists(orderId)
  └─ UUID collision (near-impossible)? → caught and rejected with 409

Flow: Generate UUID → write order to DynamoDB (status: PENDING) → enqueue to SQS → return 201 { orderId, status: "PENDING" }. The client gets an instant response; processing is fully asynchronous.
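
The flow above can be sketched as a handler with its side effects injected. Everything named here (createOrder's deps bundle, newId, putOrderIfAbsent, enqueue, makeFakeDeps) is hypothetical and stands in for the real DynamoDB and SQS calls; the in-memory fakes exist only so the example runs without AWS.

```javascript
// Sketch of the createOrder flow: write PENDING order, enqueue, return 201.
async function createOrder(body, deps) {
  const order = {
    orderId: deps.newId(),
    userId: body.userId,
    items: body.items,
    totalAmount: body.totalAmount,
    status: 'PENDING',
    createdAt: new Date().toISOString(),
  };
  try {
    // Assumed to issue a PutItem guarded by attribute_not_exists(orderId).
    await deps.putOrderIfAbsent(order);
  } catch (err) {
    if (err.name === 'ConditionalCheckFailedException') {
      return { statusCode: 409, body: JSON.stringify({ message: 'Order already exists' }) };
    }
    throw err;
  }
  await deps.enqueue({ orderId: order.orderId });
  return { statusCode: 201, body: JSON.stringify({ orderId: order.orderId, status: 'PENDING' }) };
}

// In-memory fakes standing in for the Orders table and the order queue.
function makeFakeDeps(ids) {
  const table = new Map();
  const queue = [];
  return {
    table,
    queue,
    newId: () => ids.shift(),
    putOrderIfAbsent: async (order) => {
      if (table.has(order.orderId)) {
        const err = new Error('The conditional request failed');
        err.name = 'ConditionalCheckFailedException';
        throw err;
      }
      table.set(order.orderId, order);
    },
    enqueue: async (message) => { queue.push(message); },
  };
}
```

Passing the same id twice to makeFakeDeps simulates a UUID collision: the first call returns 201 and enqueues one message, the second returns 409 and enqueues nothing.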

3️⃣ SQS: Buffer, Decouple, Retry

The queue sits between the API and the saga. This isn't just a "nice to have"; it's essential for resilience:

| Feature | Configuration | Why |
| --- | --- | --- |
| 🔄 Retry | 3 attempts | Failed messages get reprocessed before giving up |
| ☠️ Dead Letter Queue | 14-day retention | Poison messages are quarantined, not lost |
| ⏱️ Long polling | 10 seconds | Reduces empty receives and API costs |
| 👁️ Visibility timeout | 60 seconds | Prevents concurrent processing of the same message |
| 🧩 Partial batch failures | ReportBatchItemFailures | One bad message doesn't block the whole batch of 10 |

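4️⃣ processOrder Lambda: SQS Consumer

Receives up to 10 messages per batch, parses each, and starts a Step Functions Express execution. The orderId is used as the execution name, which ties each execution to its order. Note that Step Functions rejects duplicate execution names only for Standard workflows; StartExecution is not idempotent for Express workflows, so name-based dedup should not be relied on here.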

Returns { batchItemFailures } so only failed messages retry while successful ones are deleted by SQS.
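
A minimal sketch of such a batch handler is shown below. The startSaga parameter is a hypothetical injected function wrapping the Step Functions call, so the example runs without the AWS SDK; the event shape (Records, messageId, body) and the batchItemFailures response follow the standard SQS-to-Lambda contract.

```javascript
// Processes each SQS record independently and reports only the failures.
async function processBatch(event, startSaga) {
  const batchItemFailures = [];
  for (const record of event.Records) {
    try {
      const { orderId } = JSON.parse(record.body);
      await startSaga({ name: orderId, input: record.body });
    } catch (err) {
      // Only this message returns to the queue; the rest are deleted.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

Catching per record is the key design choice: a malformed body or a failed start affects only its own message.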

5️⃣ Step Functions Express: The Saga Engine

stateDiagram-v2
    [*] --> ReserveInventory

    ReserveInventory --> ProcessPayment: ✅ Stock reserved
    ReserveInventory --> FailOrder: ❌ Insufficient stock

    ProcessPayment --> ConfirmOrder: ✅ Payment OK
    ProcessPayment --> ReleaseInventory: ❌ Payment declined

    ConfirmOrder --> EmitOrderPlaced: ✅ Status → CONFIRMED
    ConfirmOrder --> RefundPayment: ❌ Update failed

    EmitOrderPlaced --> [*]: 📡 Event published

    ReleaseInventory --> FailOrder: ↩️ Stock restored
    RefundPayment --> CompensateRelease: ↩️ Payment refunded
    CompensateRelease --> FailOrder: ↩️ Stock restored
    FailOrder --> [*]: 💀 Status → FAILED

Why Express over Standard? Express workflows can be invoked synchronously, are billed by request count and duration rather than per state transition, and are capped at five minutes per execution. That fits order processing, which completes in seconds.
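
The transitions in the state diagram above can also be read as a lookup table. The sketch below transcribes them and traces the path the saga takes when one forward step fails. It is an illustration only, not the repository's ASL definition; the names transitions and tracePath are hypothetical, and failure edges the diagram does not show are left undefined.

```javascript
// Each state maps to its next state on success (ok) and on failure (fail).
const transitions = {
  ReserveInventory: { ok: 'ProcessPayment', fail: 'FailOrder' },
  ProcessPayment: { ok: 'ConfirmOrder', fail: 'ReleaseInventory' },
  ConfirmOrder: { ok: 'EmitOrderPlaced', fail: 'RefundPayment' },
  EmitOrderPlaced: {},
  ReleaseInventory: { ok: 'FailOrder' },
  RefundPayment: { ok: 'CompensateRelease' },
  CompensateRelease: { ok: 'FailOrder' },
  FailOrder: {},
};

// Walks the table from the start state; failingStep is the one step that fails.
function tracePath(failingStep) {
  const path = [];
  let state = 'ReserveInventory';
  while (state) {
    path.push(state);
    const { ok, fail } = transitions[state];
    state = state === failingStep ? fail : ok;
  }
  return path;
}

console.log(tracePath('ConfirmOrder').join(' -> '));
// ReserveInventory -> ProcessPayment -> ConfirmOrder -> RefundPayment -> CompensateRelease -> FailOrder
```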

📦 ReserveInventory

Atomic conditional update per item:

UpdateExpression: 'SET stock = stock - :qty'
ConditionExpression: 'attribute_exists(productId) AND stock >= :qty'

DynamoDB guarantees atomicity: two concurrent orders for the last item can't both succeed. If any item fails, already-reserved items in the same batch are rolled back within the Lambda before the saga-level compensation even kicks in.
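
The in-Lambda rollback described above might look like the following sketch. The Inventory table is replaced by a hypothetical in-memory object whose decrement method mimics the conditional update (it throws unless the product exists with enough stock); makeInventory and reserveAll are illustrative names, not the repository's functions.

```javascript
// In-memory stand-in for the Inventory table.
function makeInventory(initial) {
  const stock = new Map(Object.entries(initial));
  return {
    stock,
    // Mimics: SET stock = stock - :qty, guarded by the ConditionExpression.
    decrement(productId, qty) {
      if (!stock.has(productId) || stock.get(productId) < qty) {
        const err = new Error(`Cannot reserve ${qty} of ${productId}`);
        err.name = 'ConditionalCheckFailedException';
        throw err;
      }
      stock.set(productId, stock.get(productId) - qty);
    },
    increment(productId, qty) {
      stock.set(productId, stock.get(productId) + qty);
    },
  };
}

// Reserves every item; if one fails, restores the items already reserved.
function reserveAll(inventory, items) {
  const reserved = [];
  try {
    for (const item of items) {
      inventory.decrement(item.productId, item.qty);
      reserved.push(item);
    }
  } catch (err) {
    for (const item of reserved.reverse()) {
      inventory.increment(item.productId, item.qty);
    }
    throw err; // Let the saga see the failure and route to FailOrder.
  }
}
```

Rethrowing after the local rollback keeps the saga's failure transition intact while leaving stock levels as they were before the attempt.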

💳 ProcessPayment

Simulates payment with a configurable failure rate (FAIL_PAYMENT_PERCENT, default 20%). This is intentional: it triggers compensation paths during demos so you can observe the saga rollback in X-Ray traces.
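
A simulated payment step with a configurable failure rate could be sketched as follows. The failPercent parameter plays the role of FAIL_PAYMENT_PERCENT; the function name, the error name, and the returned fields are hypothetical, and the random source is injectable so the behaviour can be made deterministic.

```javascript
// Fails roughly failPercent% of the time; otherwise reports a paid order.
function simulatePayment(order, failPercent = 20, random = Math.random) {
  if (random() * 100 < failPercent) {
    const err = new Error(`Payment declined for order ${order.orderId}`);
    err.name = 'PaymentFailedError'; // hypothetical error name
    throw err;
  }
  return { orderId: order.orderId, paymentStatus: 'PAID', amount: order.totalAmount };
}
```

Setting failPercent to 0 or 100 forces the happy path or the compensation path, which is convenient when demonstrating one specific branch.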

✅ ConfirmOrder

Updates the order status from PENDING to CONFIRMED in DynamoDB. Uses ExpressionAttributeNames: { '#status': 'status' } because status is a DynamoDB reserved word.
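
As a sketch, the update parameters could be built like this. The function name and the extra ConditionExpression (only confirm orders that are still PENDING) are assumptions added for illustration; the source only states that the status is updated and that #status is aliased. The returned object has the shape accepted by the DynamoDB DocumentClient's UpdateCommand.

```javascript
// Builds UpdateCommand input that moves an order from PENDING to CONFIRMED.
function buildConfirmParams(tableName, orderId, now = new Date().toISOString()) {
  return {
    TableName: tableName,
    Key: { orderId },
    UpdateExpression: 'SET #status = :confirmed, updatedAt = :now',
    // Assumption: guard against confirming an order that is no longer PENDING.
    ConditionExpression: '#status = :pending',
    // "status" is a reserved word, so it must be referenced through an alias.
    ExpressionAttributeNames: { '#status': 'status' },
    ExpressionAttributeValues: {
      ':confirmed': 'CONFIRMED',
      ':pending': 'PENDING',
      ':now': now,
    },
  };
}
```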

📡 EmitOrderPlaced

Publishes an OrderPlaced domain event to EventBridge with:

{
  "Source": "ordering-system",
  "DetailType": "OrderPlaced",
  "Detail": { "orderId": "<uuid>", "userId": "<string>", "totalAmount": "<number>", "itemCount": "<number>", "timestamp": "<string>" }
}

This step has no compensation catch: if it fails after 3 retries, the saga ends but the order is already confirmed in DynamoDB. Event emission is "best effort"; the database is the source of truth.
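
A PutEvents entry for this event might be assembled as in the sketch below. The function name is hypothetical, and deriving itemCount from the length of the items array is an assumption (it could equally be the summed quantities). One detail worth noting is that the PutEvents API expects Detail as a JSON string rather than an object.

```javascript
// Builds one entry for the Entries array of an EventBridge PutEvents call.
function buildOrderPlacedEntry(order, eventBusName, now = new Date()) {
  return {
    EventBusName: eventBusName,
    Source: 'ordering-system',
    DetailType: 'OrderPlaced',
    // Detail must be serialized; EventBridge rejects a raw object here.
    Detail: JSON.stringify({
      orderId: order.orderId,
      userId: order.userId,
      totalAmount: order.totalAmount,
      itemCount: order.items.length, // assumption: number of line items
      timestamp: now.toISOString(),
    }),
  };
}
```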

6️⃣ EventBridge: Domain Events

A custom event bus (dev-ser-ord-sys-events) receives OrderPlaced events. An event rule pattern-matches on source and detail-type, routing every matched event to a CloudWatch Log Group for visibility and debugging.

flowchart LR
    A["⚡ emitEvent<br/>Lambda"] -->|PutEvents| B["🚌 Custom<br/>Event Bus"]
    B --> C{"📋 order-placed-rule<br/>source: ordering-system<br/>detail-type: OrderPlaced"}
    C --> D["📊 CloudWatch<br/>Logs"]
    C -. "future" .-> E["📧 SNS<br/>Notification"]
    C -. "future" .-> F["⚡ Analytics<br/>Lambda"]

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style B fill:#0f3460,stroke:#53a8b6,color:#fff
    style C fill:#16213e,stroke:#0f3460,color:#fff
    style D fill:#0d7377,stroke:#14ffec,color:#fff
    style E fill:#2d2d2d,stroke:#666,color:#999,stroke-dasharray: 5 5
    style F fill:#2d2d2d,stroke:#666,color:#999,stroke-dasharray: 5 5

The bus is extensible: future consumers (notifications, analytics, audit) can subscribe to the same events without modifying the producer.
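
The rule's matching behaviour can be approximated with a few lines of JavaScript. The pattern object uses the two fields named in the diagram; the matcher is a deliberately simplified, hypothetical stand-in that handles only exact matches on top-level fields, whereas real EventBridge patterns support nested fields and many more operators.

```javascript
// Event pattern matching on source and detail-type, as described above.
const orderPlacedPattern = {
  source: ['ordering-system'],
  'detail-type': ['OrderPlaced'],
};

// True when every pattern field lists the event's value for that field.
function matchesPattern(pattern, event) {
  return Object.entries(pattern).every(
    ([field, allowed]) => allowed.includes(event[field])
  );
}

console.log(matchesPattern(orderPlacedPattern, {
  source: 'ordering-system',
  'detail-type': 'OrderPlaced',
  detail: { orderId: 'o-1' },
})); // true
```

A new consumer would add its own rule with the same pattern and a different target, leaving the emitEvent Lambda untouched.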


🗄️ Data Model

DynamoDB Tables

erDiagram
    ORDERS {
        string orderId PK "Partition Key (UUID)"
        string userId "Customer ID"
        list items "Array of productId + qty"
        number totalAmount "Order total"
        string status "PENDING | CONFIRMED | FAILED"
        string failureReason "null or error message"
        string createdAt "ISO 8601"
        string updatedAt "ISO 8601"
    }

    INVENTORY {
        string productId PK "Partition Key"
        string name "Product name"
        number price "Unit price"
        number stock "Available quantity"
        string category "Product category"
    }

    IDEMPOTENCY {
        string id PK "Request body hash"
        string status "INPROGRESS | COMPLETED"
        string data "Cached Lambda response"
        number expiration "TTL epoch (1hr)"
    }

    ORDERS ||--o{ INVENTORY : "items reference"

| Table | PK | GSI | TTL | PITR |
| --- | --- | --- | --- | --- |
| 📋 Orders | orderId | userId-createdAt-index | - | ✅ |
| 📦 Inventory | productId | - | - | ✅ |
| 🔑 Idempotency | id | - | expiration (1hr) | - |

🏗️ Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| 🏗️ IaC | Terraform | Custom reusable modules for every AWS resource |
| ⚡ Compute | Lambda (Node.js 20.x ESM) | 11 functions, each with least-privilege IAM |
| 📦 Shared Code | Lambda Layer | Powertools + AWS SDK v3 + utility modules |
| 🔁 Orchestration | Step Functions Express | Saga pattern with compensation flows |
| 💾 Storage | DynamoDB (on-demand) | 3 tables with PITR, GSI, TTL |
| 🌐 API | API Gateway REST | JSON Schema validation, throttling, X-Ray |
| 📬 Queue | SQS + DLQ | Buffering, retry, partial batch failures |
| 📡 Events | EventBridge | Custom bus, pattern-matched rules |
| 🔍 Observability | CloudWatch + X-Ray | Structured logs, traces, custom metrics |

🎯 Key Design Decisions

🔒 Two-Layer Idempotency: Why one layer isn't enough

Layer 1 (Powertools): Hashes the request body and caches the response. Protects against network-level retries where the client re-sends the exact same payload.

Layer 2 (Conditional Write): attribute_not_exists(orderId) on the DynamoDB PutItem. Protects an existing order from being overwritten if a newly generated UUID ever collides with a stored orderId. Because each invocation generates a fresh orderId, this layer does not by itself catch a replayed payload after the idempotency record's 1-hour TTL has expired.

Neither layer alone covers all scenarios; together they cover client retries within the TTL window and key collisions on write.
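
Layer 1's behaviour can be illustrated with a small in-memory wrapper. This is a sketch of the idea only, not the Powertools implementation: the real utility hashes the payload, persists records in the Idempotency table, and also tracks in-progress requests. The names makeIdempotent, ttlMs, and now are hypothetical, and the raw JSON string is used as the key for brevity (so key order in the payload matters here).

```javascript
// Wraps a handler so identical payloads within the TTL reuse the first response.
function makeIdempotent(handler, { ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
  const cache = new Map(); // key -> { response, expiresAt }
  return async function idempotentHandler(body) {
    const key = JSON.stringify(body);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.response; // The wrapped handler never runs for a duplicate.
    }
    const response = await handler(body);
    cache.set(key, { response, expiresAt: now() + ttlMs });
    return response;
  };
}
```

Once the record expires, the same payload runs the handler again, which is exactly the window that Layer 1 alone leaves open.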

📦 Partial Batch Failure Reporting: Processing 10 messages without poisoning the batch

SQS delivers up to 10 messages per Lambda invocation. Without ReportBatchItemFailures, a single failing message would cause all 10 to retry β€” including the 9 that succeeded (which would then create duplicates).

By returning { batchItemFailures: [{ itemIdentifier: failedMessageId }] }, only the specific failed messages retry. The successful ones are deleted from the queue.

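🔁 Express vs Standard Step Functions: The right tool for the job

| Feature | Express | Standard |
| --- | --- | --- |
| Max duration | 5 minutes | 1 year |
| Pricing | Per request plus duration | Per state transition |
| Execution mode | Synchronous or asynchronous | Asynchronous |
| Dedup via name | ❌ (StartExecution is not idempotent) | ✅ |

Order processing completes in seconds. Express is cheaper for short, high-volume workflows and can be run synchronously, so processOrder can wait for the result. Step Functions enforces unique execution names only for Standard workflows, so using orderId as the execution name helps correlate executions with orders, but duplicate SQS deliveries need to be handled by idempotent saga steps.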

📦 Atomic Inventory Reservation: Preventing overselling without locks
ConditionExpression: 'attribute_exists(productId) AND stock >= :qty'
UpdateExpression: 'SET stock = stock - :qty'

DynamoDB evaluates the condition and applies the update atomically in a single operation. Two concurrent orders each requesting the last unit cannot both succeed; one will get a ConditionalCheckFailedException. No distributed locks, no race conditions.

📡 Best-Effort Event Emission: Why EmitOrderPlaced has no compensation

The EmitOrderPlaced saga step has retries but no Catch block. If EventBridge publishing fails after 3 attempts, the saga ends, but the order is already CONFIRMED in DynamoDB. The database is the authoritative source of truth, not the event. Downstream consumers are designed for eventual consistency.


🗂️ Project Structure

📁 infrastructure/
├── 📄 main.tf                 # AWS provider, region, default tags
├── 📄 variables.tf            # Input variables (region, env, project name)
├── 📄 var.tfvars              # Variable values
├── 📄 dynamodb.tf             # 3 DynamoDB tables
├── 📄 sqs.tf                  # Order queue + dead letter queue
├── 📄 lambda.tf               # 11 Lambda functions + SQS event mapping
├── 📄 lambda_layer.tf         # Shared dependencies layer
├── 📄 api_gateway.tf          # REST API, routes, JSON Schema validation
├── 📄 step_functions.tf       # Express state machine (saga)
├── 📄 eventbridge.tf          # Custom event bus + rules + log target
├── 📄 cloudwatch.tf           # Dashboard, alarms, SNS topic
├── 📁 asl/
│   └── 📄 order_saga.asl.json # Amazon States Language definition
├── 📁 iam/
│   └── 📁 policies/           # Per-Lambda IAM policy modules
└── 📄 outputs.tf              # Terraform outputs

📁 backend/
├── 📁 layers/shared-deps/
│   ├── 📄 package.json        # Powertools, AWS SDK v3, @middy/core
│   ├── 📄 build_layer.sh      # Build + zip script
│   └── 📁 nodejs/lib/
│       ├── 📄 dynamodb.mjs    # DynamoDB DocumentClient (X-Ray traced)
│       ├── 📄 sqs.mjs         # SQS client (X-Ray traced)
│       ├── 📄 sfn.mjs         # Step Functions client (X-Ray traced)
│       ├── 📄 eventbridge.mjs # EventBridge client (X-Ray traced)
│       └── 📄 response.mjs    # HTTP response helpers
└── 📁 lambdas/orders/
    ├── 📁 createOrder/        # API → validate, write, enqueue
    ├── 📁 getOrder/           # API → read order by ID
    ├── 📁 processOrder/       # SQS → start saga execution
    ├── 📁 replayDlq/          # Ops → drain DLQ to main queue
    ├── 📁 reserveInventory/   # Saga → atomic stock decrement
    ├── 📁 releaseInventory/   # Saga → compensation: restore stock
    ├── 📁 processPayment/     # Saga → simulated payment
    ├── 📁 refundPayment/      # Saga → compensation: log refund
    ├── 📁 confirmOrder/       # Saga → status → CONFIRMED
    ├── 📁 failOrder/          # Saga → status → FAILED
    └── 📁 emitEvent/          # Saga → publish OrderPlaced event

📁 scripts/
└── 📄 seed_inventory.sh       # Seeds 10 sample products

🚀 Getting Started

Prerequisites

  • AWS CLI configured with valid credentials
  • Terraform ≥ 1.5.0
  • Node.js 20.x
  • jq (for seed script)

Deploy

# 1️⃣ Build the Lambda Layer
cd backend/layers/shared-deps && ./build_layer.sh

# 2️⃣ Initialize and deploy infrastructure
cd ../../../infrastructure
terraform init
terraform plan -var-file=var.tfvars
terraform apply -var-file=var.tfvars

# 3️⃣ Seed inventory data (10 products)
cd .. && ./scripts/seed_inventory.sh

Test

# Create an order
curl -X POST https://<api-id>.execute-api.eu-west-1.amazonaws.com/dev/orders \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "user-123",
    "items": [
      { "productId": "PROD-001", "qty": 2 },
      { "productId": "PROD-003", "qty": 1 }
    ],
    "totalAmount": 109.97
  }'

# Check order status
curl https://<api-id>.execute-api.eu-west-1.amazonaws.com/dev/orders/<orderId>

🔍 Observability

Every Lambda is instrumented with AWS Lambda Powertools:

| Tool | What It Provides |
| --- | --- |
| 📝 Logger | Structured JSON logs with orderId correlation across all functions |
| 🔭 Tracer | X-Ray tracing: every AWS SDK call appears in the service map |
| 📊 Metrics | Custom CloudWatch metrics: OrderCreated, PaymentFailed, InventoryReserved, etc. |

All AWS SDK clients in the shared layer are wrapped with tracer.captureAWSv3Client(), so the full request journey (from API Gateway through Lambda, DynamoDB, SQS, Step Functions, and EventBridge) is visible as a single distributed trace in X-Ray.

📈 CloudWatch Dashboard

A unified dashboard (dev-ser-ord-sys-dashboard) provides real-time visibility across 6 widget rows:

| Row | Widgets |
| --- | --- |
| 1 | API Gateway: request count, latency percentiles (P50/P95/P99), 4xx/5xx errors |
| 2 | Step Functions: saga success vs failure, execution duration, throttles |
| 3 | SQS: messages sent/received/deleted, DLQ depth, oldest message age |
| 4 | Lambda duration P95: API functions and saga step functions |
| 5 | Lambda errors and concurrent executions across all functions |
| 6 | Custom Powertools metrics: order lifecycle and payment/inventory events |

🚨 CloudWatch Alarms

| Alarm | Condition | Action |
| --- | --- | --- |
| DLQ Depth | > 10 messages visible | SNS notification |
| Saga Failure Rate | > 30% over 5 minutes | SNS notification |
| API 5xx Rate | > 5% over 5 minutes | SNS notification |

All alarms send to the dev-ser-ord-sys-alarms SNS topic. Subscribe an email or Slack webhook to receive alerts.

📚 Documentation

| Document | Description |
| --- | --- |
| 📋 Build Order | Phased implementation plan with dependency matrix |
| 📖 Project Details | Full architecture documentation |

Built with ❤️ on AWS Serverless

Terraform • Lambda • Step Functions • DynamoDB • SQS • EventBridge • API Gateway
