N8N AI-Powered Services Stack

A production-ready, low-ops blueprint for hosting multiple independent Docker services (N8N instances, AI services, databases, custom apps) on one VPS using Traefik reverse proxy and CloudFlare tunnel.

📖 New to this repository? See the Configuration Guide for detailed setup instructions.

🚀 Quick Start

Prerequisites

  • VPS with Docker and Docker Compose v2
  • Domain on CloudFlare
  • CloudFlare Tunnel setup (get TUNNEL_ID and credentials)
  • GitHub repository with configured secrets (for automated deployment)

1. Initial Configuration

After cloning this repository, you need to configure it for your environment:

Local Environment Setup

# Install dependencies
pip3 install -r requirements.txt

# Create your environment configuration
cp env.example .env

Configure .env file

Edit .env with your actual values:

# Domain Name Configuration
DOMAIN_NAME=your-domain.com

# Cloudflare Tunnel Token (from dashboard: Networks > Tunnels > Configure > Token)
CLOUDFLARE_TUNNEL_TOKEN=your_tunnel_token_here

2. GitHub Repository Setup (For Automated Deployment)

Configure these secrets in your GitHub repository (Settings > Secrets and variables > Actions):

Required Secrets:

# VPS Connection
VPS_SSH_KEY=<your-private-ssh-key-content>
PRODUCTION_VPS_HOST=your.vps.ip.address
STAGING_VPS_HOST=your.vps.ip.address    # Same or different VPS

# CloudFlare Configuration
CLOUDFLARE_API_TOKEN=<your-cloudflare-api-token>
CLOUDFLARE_TUNNEL_TOKEN=<your-tunnel-token>
DOMAIN_NAME=your-domain.com

# Optional: Notification webhooks
SLACK_WEBHOOK_URL=<your-slack-webhook>
DISCORD_WEBHOOK_URL=<your-discord-webhook>

SSH Key Setup:

# Generate SSH key pair for GitHub Actions
ssh-keygen -t rsa -b 4096 -C "github-actions@yourdomain.com" -f ~/.ssh/github-actions

# Copy public key to your VPS
ssh-copy-id -i ~/.ssh/github-actions.pub deployer@your-vps-ip

# Add private key content to VPS_SSH_KEY secret in GitHub
cat ~/.ssh/github-actions
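
Alternatively, if you use the GitHub CLI, the same secrets can be set from the terminal. A minimal sketch (values are placeholders; run it from the repository root):

# Set repository secrets via the GitHub CLI
gh secret set DOMAIN_NAME --body "your-domain.com"
gh secret set PRODUCTION_VPS_HOST --body "your.vps.ip.address"
gh secret set CLOUDFLARE_API_TOKEN --body "<your-cloudflare-api-token>"
gh secret set CLOUDFLARE_TUNNEL_TOKEN --body "<your-tunnel-token>"
gh secret set VPS_SSH_KEY < ~/.ssh/github-actions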

3. CloudFlare Tunnel Setup

The configuration files now use environment variables. Your CloudFlare tunnel config will automatically use your domain:

# edge/cloudflared/config.yml uses ${DOMAIN_NAME}
ingress:
  - hostname: "*.${DOMAIN_NAME}"
    service: http://traefik:80

4. Deploy Edge Stack

# Initialize the project
./scripts/svc init

# Deploy Traefik + CloudFlared
./scripts/svc edge deploy

5. Create Your First AI-Powered N8N Instance

# Create AI services (internal only)
./scripts/svc new --name ollama-main --template ollama
./scripts/svc new --name qdrant-vectors --template qdrant

# Deploy AI services
./scripts/svc deploy ollama-main
./scripts/svc deploy qdrant-vectors

# Create N8N instance that can use AI services
./scripts/svc new --name n8n-ai --template n8n --domain n8n.example.com

# Deploy it
./scripts/svc deploy n8n-ai

6. Access Your Instance

Visit https://n8n.example.com - it should be accessible through the CloudFlare tunnel!

Your N8N instance can now connect to the following internal endpoints (a quick connectivity check follows the list):

  • Ollama (LLM): http://ollama-main:11434
  • Qdrant (Vector DB): http://qdrant-vectors:6333
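
A minimal connectivity sketch, assuming the container names from the examples above and that the services share the internal Docker network:

# List Ollama models and Qdrant collections from inside the N8N container
docker exec n8n-ai wget -qO- http://ollama-main:11434/api/tags
docker exec n8n-ai wget -qO- http://qdrant-vectors:6333/collections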

πŸ“ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   CloudFlare Edge                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚ (*.example.com)
                 β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            CloudFlared Tunnel                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Traefik Reverse Proxy                  β”‚
β”‚          (Auto-discovery via Docker labels)         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β–Ό            β–Ό            β–Ό              β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  N8N   β”‚  β”‚  N8N   β”‚  β”‚ Custom β”‚    β”‚Monitor-β”‚    β”‚   AI   β”‚
β”‚   AI   β”‚  β”‚Instanceβ”‚  β”‚Service β”‚    β”‚  ing   β”‚    β”‚Servicesβ”‚
β”‚        β”‚  β”‚   #2   β”‚  β”‚        β”‚    β”‚ Stack  β”‚    β”‚        β”‚
β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β”‚                                                      β–²
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β–Ό             β–Ό             β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚ Ollama β”‚    β”‚ Qdrant β”‚   β”‚Postgresβ”‚
            β”‚  (LLM) β”‚    β”‚Vector  β”‚   β”‚   DB   β”‚
            β”‚Internalβ”‚    β”‚DB      β”‚   β”‚Internalβ”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚Internalβ”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Benefits

  • AI-Ready: Built-in LLM (Ollama) and vector database (Qdrant) support
  • Wildcard tunnel: Add services without tunnel configuration
  • Zero-touch routing: Services self-register via Docker labels
  • Security: No host ports exposed, AI services internal-only
  • Isolation: Each service has its own network and resources

πŸ› οΈ Service Management

CLI Commands

# List all services
./scripts/svc list

# ===== AI SERVICES (Internal Only) =====

# Create Ollama LLM service (no resource limits)
./scripts/svc new --name ollama-main --template ollama

# Create Qdrant vector database
./scripts/svc new --name qdrant-vectors --template qdrant

# Create shared PostgreSQL database  
./scripts/svc new --name postgres-shared --template postgresql

# ===== N8N SERVICES (Web Accessible) =====

# Create N8N instance with AI capabilities
./scripts/svc new --name n8n-ai --template n8n --domain n8n.example.com

# Create additional N8N instances
./scripts/svc new --name n8n-dev --template n8n --domain n8n-dev.example.com

# ===== GENERIC SERVICES =====

# Create custom service
./scripts/svc new --name wordpress --template generic-app \
  --domain blog.example.com --image wordpress:latest --port 80

# ===== DEPLOYMENT & MANAGEMENT =====

# Deploy service
./scripts/svc deploy myapp

# View service info and status
./scripts/svc info myapp

# View logs
./scripts/svc logs myapp --follow

# Restart service
./scripts/svc restart myapp

# Remove service (with confirmation)
./scripts/svc remove myapp --volumes

Edge Stack Management

# Deploy edge stack
./scripts/svc edge deploy

# View edge stack logs
./scripts/svc edge logs

# Restart edge stack
./scripts/svc edge restart

# Check edge stack status
./scripts/svc edge status

🔄 Service Updates

Updating N8N and Other Services

This architecture provides zero-data-loss updates for all services through persistent Docker volumes. All workflows, credentials, execution history, and file uploads are automatically preserved during updates.

Quick Update Commands

# Update a specific service to latest version
./scripts/svc deploy service-name

# Update with fresh image pull
cd /opt/n8n-v2/shared/services/service-name
docker compose -p service-name pull
docker compose -p service-name up -d --remove-orphans

# Force complete container recreation (for testing)
docker compose -p service-name up -d --force-recreate

Automated Update via GitHub Actions

# Update specific service via GitHub Actions
gh workflow run service-management.yml \
  -f action=deploy-service \
  -f service_name=n8n-private \
  -f environment=production

# Full deployment update (updates all services)
gh workflow run deploy-to-vps.yml \
  -f environment=production \
  -f dry_run=false

Update Procedure Step-by-Step

1. Pre-Update Safety Check

# Check current service status
./scripts/svc info service-name

# Verify health before update
docker exec service-name wget -qO- http://localhost:5678/healthz

# Check data integrity
docker exec service-name ls -lah /home/node/.n8n/

2. Automatic Backup (Built-in)

The deployment system automatically creates comprehensive backups before any update:

  • Database backups: PostgreSQL/SQLite dumps
  • Volume backups: Complete data directory archives
  • Configuration backups: Service settings and environment files
  • Location: /opt/n8n-v2/shared/backups/

# Manual backup (optional - auto-backup is recommended)
BACKUP_DIR="/opt/n8n-v2/shared/backups/manual-$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR

# Backup N8N data volume
docker run --rm \
  -v service-name_n8n_data:/data \
  -v $BACKUP_DIR:/backup \
  alpine sh -c "tar czf /backup/n8n_data.tar.gz -C / data"

3. Execute Update

# Method 1: Using service CLI (recommended)
./scripts/svc deploy service-name

# Method 2: Direct Docker Compose (production)
cd /opt/n8n-v2/shared/services/service-name
docker compose -p service-name pull
docker compose -p service-name up -d --remove-orphans

# Method 3: GitHub Actions (remote deployment)
gh workflow run service-management.yml \
  -f action=deploy-service \
  -f service_name=service-name \
  -f environment=production

4. Post-Update Verification

# Verify service health
docker ps --filter "name=service-name"
./scripts/svc info service-name

# Check N8N specific health
docker exec service-name n8n --version
docker exec service-name wget -qO- http://localhost:5678/healthz

# Verify data preservation
docker exec service-name ls -lah /home/node/.n8n/
docker exec service-name du -sh /home/node/.n8n/

# Test web interface
curl -s https://your-domain.com/healthz

Data Preservation Architecture

How Data Survives Updates

  1. Named Docker Volumes: All critical data stored in persistent volumes

    volumes:
      service-name_n8n_data:     # Workflows, credentials, settings
      service-name_n8n_files:    # File uploads, binary data
      service-name_postgres_data: # Database (if using PostgreSQL)
  2. Volume Persistence: Volumes exist independently of containers

    • Container recreation doesn't affect volumes
    • Data automatically remounts to new containers
    • Zero downtime during volume reattachment
  3. SQLite Database: N8N data stored in persistent SQLite files

    /home/node/.n8n/database.sqlite      # Main database
    /home/node/.n8n/database.sqlite-wal  # Write-ahead log
    /home/node/.n8n/database.sqlite-shm  # Shared memory
    

What Gets Preserved

✅ Always Preserved:

  • N8N workflows and nodes
  • User credentials (encrypted)
  • Execution history
  • Binary data and file uploads
  • Custom settings and configuration
  • Database schemas and indexes
  • Environment variables (.env files)

❌ Not Preserved (by design):

  • Container state and processes
  • Temporary files in /tmp
  • Log files (use external logging)
  • Memory state and active connections

Version Management

Checking Current Versions

# Check N8N version
docker exec service-name n8n --version

# Check image information
docker images service-name --format "table {{.Repository}}\t{{.Tag}}\t{{.ID}}\t{{.CreatedAt}}"

# Check for available updates (manual)
docker run --rm n8nio/n8n:latest n8n --version

Update Strategies

Rolling Updates (Default):

# Gradual update with health checks
docker compose -p service-name up -d --remove-orphans

Blue-Green Updates:

# Create new instance, test, then switch DNS
./scripts/svc new --name service-name-v2 --template n8n --domain staging.example.com
# Test new instance, then update DNS

Canary Updates:

# Deploy to staging environment first
gh workflow run deploy-to-vps.yml -f environment=staging
# Test thoroughly, then deploy to production
gh workflow run deploy-to-vps.yml -f environment=production

Rollback Procedures

Automatic Rollback

GitHub Actions deployment includes automatic rollback on failure:

# Triggers automatically if deployment fails
rollback:
  runs-on: ubuntu-latest
  if: failure()
  steps:
    - name: Rollback to previous release

Manual Rollback

# Via GitHub Actions
gh workflow run deploy-to-vps.yml -f environment=production

# Manual rollback to previous container version
cd /opt/n8n-v2/shared/services/service-name
docker compose -p service-name down
docker run --rm -v service-name_n8n_data:/data alpine ls -la /data
# Verify data, then restart with previous image
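
If you need to pin an older image explicitly, one hedged approach (assuming the service's compose.yml references n8nio/n8n; the tag below is only an example) is:

# Pin a previous image tag, then bring the service back up
cd /opt/n8n-v2/shared/services/service-name
sed -i 's|image: n8nio/n8n:.*|image: n8nio/n8n:1.64.0|' compose.yml
docker compose -p service-name up -d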

Recovery from Backup

# List available backups
ls -la /opt/n8n-v2/shared/backups/

# Restore from volume backup
BACKUP_FILE="/opt/n8n-v2/shared/backups/volumes/service-name_n8n_data_20240913_111605.tar.gz"
docker volume rm service-name_n8n_data
docker volume create service-name_n8n_data
docker run --rm \
  -v service-name_n8n_data:/data \
  -v /opt/n8n-v2/shared/backups/volumes:/backup \
  alpine sh -c "cd /data && tar xzf /backup/$(basename $BACKUP_FILE)"

Best Practices

Production Updates

  1. Always test in staging first
  2. Schedule updates during low-traffic periods
  3. Monitor logs during and after updates
  4. Verify all integrations and webhooks
  5. Keep rollback procedure ready

Monitoring Updates

# Watch update progress
docker logs -f service-name

# Monitor resource usage during update
docker stats service-name

# Check update notifications (setup alerts)
# Integration with your monitoring stack

Update Frequency

  • Security updates: Apply immediately
  • Feature updates: Test in staging, deploy weekly
  • Major version updates: Extensive testing, quarterly
  • Dependencies: Monitor for CVEs, update as needed

Troubleshooting Updates

Common Update Issues

Container won't start after update:

# Check logs for errors
docker logs service-name

# Verify volume mounts
docker inspect service-name --format "{{range .Mounts}}{{.Name}} -> {{.Destination}}{{end}}"

# Check disk space
df -h
docker system df

Data appears missing:

# Verify volumes exist
docker volume ls | grep service-name

# Check volume contents
docker run --rm -v service-name_n8n_data:/data alpine ls -la /data

# Restore from backup if needed

N8N specific issues:

# Database corruption check
docker exec service-name sqlite3 /home/node/.n8n/database.sqlite ".integrity_check"

# Reset N8N (preserves data but clears cache)
docker exec service-name rm -f /home/node/.n8n/crash.journal
docker restart service-name

Emergency Recovery

# Complete service recreation with data preservation
cd /opt/n8n-v2/shared/services/service-name
docker compose -p service-name down
docker compose -p service-name up -d --force-recreate

# Verify data integrity after recreation
docker exec service-name n8n --version
docker exec service-name ls -la /home/node/.n8n/

System-Wide Container Updates and Cleanup

Updating All Containers to Latest Versions

To ensure all containers across your deployment are running the latest versions:

# Update all containers in a specific service
cd /opt/n8n-v2/shared/services/service-name
docker compose -f compose.yml pull
docker compose -f compose.yml up -d --force-recreate

# Update monitoring stack
cd /opt/n8n-v2/shared/monitoring
docker compose -f compose.yml pull
docker compose -f compose.yml up -d --force-recreate

# Update edge stack (Traefik + CloudFlared)
cd /opt/n8n-v2/shared/edge
docker compose -f docker-compose.yml pull
docker compose -f docker-compose.yml up -d --force-recreate

Automated Update All Services:

# Update all services using the management CLI
for service in /opt/n8n-v2/shared/services/*/; do
    service_name=$(basename "$service")
    echo "Updating $service_name..."
    ./scripts/svc deploy "$service_name"
done

Docker Cleanup and Maintenance

Check Current Disk Usage:

# View disk usage summary
docker system df

# View detailed image information
docker images --all

# View container sizes
docker ps -as

Clean Up Dangling Images (safe - removes only unused images):

# Remove dangling images (untagged images)
docker image prune -f

# Expected output: Reclaimed space from unused images

Aggressive Cleanup (removes all unused Docker resources):

# Remove all unused images, containers, networks (preserves volumes)
docker system prune -af --volumes=false

# Expected reclaim: ~10-15GB on active systems

Complete Cleanup (⚠️ use with caution - includes volumes):

# Remove everything unused INCLUDING VOLUMES
# WARNING: This will delete unused volume data
docker system prune -af --volumes

Clean Up Specific Resources:

# Remove unused volumes only
docker volume prune -f

# Remove stopped containers only
docker container prune -f

# Remove unused networks only
docker network prune -f

# Remove dangling build cache
docker builder prune -f

Maintenance Best Practices

1. Regular Cleanup Schedule:

# Add to crontab for weekly cleanup (Sundays at 2 AM)
0 2 * * 0 /usr/bin/docker system prune -af --volumes=false > /var/log/docker-cleanup.log 2>&1

2. Pre-Update Cleanup:

# Clean up before pulling new images to free space
docker image prune -f
docker builder prune -f

# Then update
docker compose pull
docker compose up -d --force-recreate

3. Monitor Disk Usage:

# Check before cleanup
docker system df

# Perform cleanup
docker system prune -af --volumes=false

# Check after cleanup to verify reclaimed space
docker system df

4. Keep Only Active Images:

# Remove all images not associated with running containers
docker image prune -a -f

# This keeps only images currently in use

Example Maintenance Workflow

Monthly Maintenance Routine:

#!/bin/bash
# Monthly Docker maintenance script

echo "=== Docker Maintenance Started ==="

# 1. Check current usage
echo "Current disk usage:"
docker system df

# 2. Clean dangling images
echo "Removing dangling images..."
docker image prune -f

# 3. Update all services to latest
echo "Updating all services..."
cd /opt/n8n-v2/shared/edge
docker compose pull && docker compose up -d --force-recreate

cd /opt/n8n-v2/shared/monitoring
docker compose pull && docker compose up -d --force-recreate

for service in /opt/n8n-v2/shared/services/*/; do
    cd "$service"
    docker compose pull && docker compose up -d --force-recreate
done

# 4. Aggressive cleanup (excluding volumes)
echo "Performing aggressive cleanup..."
docker system prune -af --volumes=false

# 5. Check final usage
echo "Final disk usage:"
docker system df

echo "=== Maintenance Completed ==="

Save as /opt/n8n-v2/shared/scripts/maintenance.sh and run monthly.
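
One way to schedule it, assuming cron is available on the VPS (schedule and log path are examples):

# Make the script executable and run it on the first day of each month at 03:00
chmod +x /opt/n8n-v2/shared/scripts/maintenance.sh
( crontab -l 2>/dev/null; echo "0 3 1 * * /opt/n8n-v2/shared/scripts/maintenance.sh >> /var/log/docker-maintenance.log 2>&1" ) | crontab -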

Troubleshooting Cleanup Issues

"No space left on device" error:

# Emergency cleanup to free space immediately
docker system prune -af --volumes=false
docker volume prune -f

# Check available space
df -h

Images won't delete:

# Force remove specific image
docker rmi -f <image_id>

# Stop all containers and remove all images
docker stop $(docker ps -aq)
docker rm $(docker ps -aq)
docker rmi -f $(docker images -q)

Verify cleanup safety:

# List volumes before pruning
docker volume ls

# Preview what would be removed (docker system prune has no dry-run flag)
docker images --filter dangling=true
docker ps -a --filter status=exited

# List running containers to ensure they're not affected
docker ps

📋 Available Templates

The service management CLI supports multiple templates for different use cases:

Template      Description                                    Access                   Default Resources
n8n           N8N workflow automation with AI support        Web (domain required)    2G memory, 1 CPU
ollama        Ollama LLM service for AI text generation      Internal only            No limits (full resources)
qdrant        Qdrant vector database for embeddings          Internal only            2G memory, 1 CPU
postgresql    PostgreSQL database with per-instance setup    Internal only            2G memory, 1 CPU
generic-app   Custom Docker applications                     Web (domain required)    User configurable

Template Usage Examples

# AI-powered N8N instance
./scripts/svc new --name n8n-ai --template n8n --domain ai.example.com

# LLM service (models: llama3.2:1b, nomic-embed-text:latest)  
./scripts/svc new --name ollama-main --template ollama

# Vector database for RAG applications
./scripts/svc new --name vectors --template qdrant

# Dedicated PostgreSQL instance
./scripts/svc new --name db-shared --template postgresql

# Custom application (requires image and port)
./scripts/svc new --name blog --template generic-app \
  --domain blog.example.com --image wordpress:latest --port 80

🤖 AI Services

Available AI Services

Ollama (Large Language Models)

  • Template: ollama
  • Access: Internal only (http://service-name:11434)
  • Default Models: llama3.2:1b, nomic-embed-text:latest
  • Resource Limits: None (models need full system resources)
  • Use Cases: Text generation, embeddings, chat completion
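
To pull or refresh the default models manually, a sketch assuming the container is named ollama-main as in the examples above:

# Pull or update models inside the running Ollama container
docker exec ollama-main ollama pull llama3.2:1b
docker exec ollama-main ollama pull nomic-embed-text:latest

# List installed models
docker exec ollama-main ollama list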

Qdrant (Vector Database)

  • Template: qdrant
  • Access: Internal only (http://service-name:6333)
  • Resource Limits: 2G memory, 1 CPU
  • Ports: HTTP (6333), gRPC (6334)
  • Use Cases: Similarity search, RAG applications, embedding storage

PostgreSQL (Database)

  • Template: postgresql
  • Access: Internal only (service-name:5432)
  • Resource Limits: 2G memory, 1 CPU
  • Features: Per-instance databases, automatic backups
  • Use Cases: Application data, N8N workflows storage

N8N AI Integration Examples

# Connect to Ollama in N8N workflows
HTTP Request: POST http://ollama-main:11434/api/generate
Body: {"model": "llama3.2:1b", "prompt": "Hello World"}

# Connect to Qdrant for vector operations (create a collection)
HTTP Request: PUT http://qdrant-vectors:6333/collections/documents
Body: {"vectors": {"size": 384, "distance": "Cosine"}}

# Connect to shared PostgreSQL
Database Host: postgres-shared
Database: postgres-shared_db  
User: postgres-shared
Password: [check services/postgres-shared/.env]
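
The same Ollama and Qdrant calls expressed as curl commands, e.g. for testing from a container on the shared internal network (service names follow the examples above):

# Ollama: one-shot text generation
curl -s http://ollama-main:11434/api/generate \
  -d '{"model": "llama3.2:1b", "prompt": "Hello World", "stream": false}'

# Qdrant: create a collection for 384-dimensional embeddings
curl -s -X PUT http://qdrant-vectors:6333/collections/documents \
  -H 'Content-Type: application/json' \
  -d '{"vectors": {"size": 384, "distance": "Cosine"}}'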

AI Service Deployment Pattern

# 1. Deploy AI infrastructure
./scripts/svc new --name ollama-prod --template ollama
./scripts/svc new --name qdrant-prod --template qdrant
./scripts/svc deploy ollama-prod
./scripts/svc deploy qdrant-prod

# 2. Create N8N with AI access
./scripts/svc new --name n8n-ai --template n8n --domain ai.example.com  
./scripts/svc deploy n8n-ai

# 3. Build AI workflows in N8N using:
# - ollama-prod:11434 for LLM operations
# - qdrant-prod:6333 for vector operations

📊 Monitoring

Built-in Stack

  • Prometheus: Metrics collection
  • Grafana: Dashboards and visualization
  • AlertManager: Alert notifications
  • Node Exporter: System metrics
  • cAdvisor: Container metrics

Deploy Monitoring

cd monitoring
docker compose up -d

# Access dashboards (protect with CloudFlare Access)
# - Grafana: https://grafana.example.com (admin/admin)
# - Prometheus: https://prometheus.example.com
# - AlertManager: https://alerts.example.com

Service Metrics

Add these labels to any service for automatic Prometheus scraping:

labels:
  - prometheus.io/scrape=true
  - prometheus.io/port=9090
  - prometheus.io/path=/metrics

🔒 Security

CloudFlare Access (Recommended)

Protect admin interfaces with CloudFlare Access:

  • traefik.example.com - Traefik dashboard
  • grafana.example.com - Grafana
  • prometheus.example.com - Prometheus
  • n8n.example.com - N8N UI

Webhook Security

For N8N webhooks that need external access (a verification sketch follows the list):

  1. Use separate webhook domains (unprotected by Access)
  2. Or allow specific paths: /webhook* on main domain
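
A quick way to check the split is working; a hedged example with placeholder domain and path (the UI request should hit the Access login, the webhook path should reach N8N directly):

# Compare responses for the protected UI and an unprotected webhook path
curl -s -o /dev/null -w "UI:      %{http_code}\n" https://n8n.example.com/
curl -s -o /dev/null -w "Webhook: %{http_code}\n" https://n8n.example.com/webhook/test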

Firewall

# Only SSH should be open to the internet
# CloudFlare tunnel handles all web traffic
ufw allow ssh
ufw deny 80,443/tcp
ufw enable

πŸ—‚οΈ Directory Structure

β”œβ”€β”€ edge/                    # Traefik + CloudFlared
β”‚   β”œβ”€β”€ docker-compose.yml
β”‚   └── cloudflared/
β”‚       β”œβ”€β”€ config.yml
β”‚       └── YOUR_TUNNEL_ID.json
β”œβ”€β”€ services/               # Individual services
β”‚   β”œβ”€β”€ n8n-prod/
β”‚   β”‚   β”œβ”€β”€ compose.yml
β”‚   β”‚   └── .env
β”‚   └── n8n-dev/
β”‚       β”œβ”€β”€ compose.yml
β”‚       └── .env
β”œβ”€β”€ monitoring/             # Monitoring stack
β”‚   β”œβ”€β”€ compose.yml
β”‚   β”œβ”€β”€ prometheus/
β”‚   └── grafana/
β”œβ”€β”€ templates/             # Service templates
β”‚   β”œβ”€β”€ n8n.yml              # N8N with AI support
β”‚   β”œβ”€β”€ ollama.yml           # Ollama LLM service  
β”‚   β”œβ”€β”€ qdrant.yml           # Qdrant vector database
β”‚   β”œβ”€β”€ postgresql.yml       # PostgreSQL database
β”‚   └── generic-app.yml      # Custom applications
└── scripts/
    └── svc               # Management CLI

🚨 Troubleshooting

Common Issues

Service returns 404 after update (requires Traefik restart):

This is a common issue where Traefik doesn't detect service updates automatically. After updating any service, you might need to restart Traefik.

Why this happens:

  1. Missed Docker events: When containers are recreated, Traefik relies on Docker events to detect changes. Under load or during simultaneous updates, these events can be missed.
  2. IP address changes: Recreated containers get new internal IP addresses, but Traefik's routing table might still point to old IPs.
  3. Stale configuration cache: Traefik caches backend service information, and this cache doesn't always refresh on container recreation.
  4. Timing issues: Services might restart faster than Traefik can detect the change.

Immediate Fix:

# Quick fix - restart Traefik to force configuration reload
docker restart edge-traefik-1

# Or restart entire edge stack
./scripts/svc edge restart

Permanent Solution (already implemented):

Your Traefik configuration has been updated with:

# These settings ensure Traefik detects service changes reliably
- --providers.docker.watch=true                    # Enable Docker event watching
- --providers.providersThrottleDuration=10s        # Throttle rapid config changes

After deploying this update, Traefik will:

  • Watch Docker events in real-time for container changes
  • Use unlimited timeout for Docker API connections (default behavior, suitable for systems with many containers)
  • Throttle configuration reloads (only processes the latest change if multiple occur within 10s)
  • Automatically detect service IP changes and routing updates

Deploy the fix:

# Update Traefik with new configuration
cd /opt/n8n-v2/shared/edge
docker compose pull
docker compose up -d --force-recreate

# Verify Traefik is watching correctly
docker logs edge-traefik-1 2>&1 | grep -i "provider.docker"

Alternative: Automated Traefik Restart

If you still experience issues, add a post-update hook:

#!/bin/bash
# /opt/n8n-v2/shared/scripts/post-service-update.sh

SERVICE_NAME=$1

echo "Service $SERVICE_NAME updated, checking Traefik..."

# Give service time to start
sleep 5

# Test if service is accessible
if ! curl -s -o /dev/null -w "%{http_code}" https://your-service-domain.com | grep -q "200\|301\|302"; then
    echo "Service not accessible, restarting Traefik..."
    docker restart edge-traefik-1
    sleep 5
    echo "Traefik restarted"
fi

Monitoring Traefik Configuration Updates:

# Watch Traefik logs during service updates
docker logs -f edge-traefik-1

# Look for these messages:
# - "Configuration received from provider docker"
# - "Creating service" or "Updating service"
# - "Server status changed"

# Check current routing configuration
curl -s http://localhost:8080/api/http/routers | jq '.'

Prevention Best Practices:

  1. Update edge stack first: Always ensure Traefik is on the latest version
  2. One service at a time: Update services sequentially, not in parallel
  3. Monitor during updates: Watch Traefik logs during service updates
  4. Health checks: Ensure services have proper health checks configured
  5. Grace periods: Wait 10-15 seconds between service updates

Traefik Docker API timeout errors:

If you see errors like "context deadline exceeded" or "Client.Timeout" in Traefik logs:

# Error example:
# ERR Provider error, retrying error="context deadline exceeded (Client.Timeout)"

Causes:

  • Docker API taking too long to respond (many containers)
  • Docker daemon under heavy load
  • httpClientTimeout set too low

Solution:

# The default configuration uses no timeout (recommended)
# If you added httpClientTimeout, remove it from edge/compose.yml

# Verify Docker daemon is healthy
docker info

# Check Docker daemon load
docker stats --no-stream

# Restart Docker daemon if needed (as last resort)
sudo systemctl restart docker
sleep 10
docker compose -f /opt/n8n-v2/shared/edge/compose.yml up -d

Service not accessible:

# Check if edge stack is running
./scripts/svc edge status

# Check service status
./scripts/svc info myservice

# Check Traefik dashboard
https://traefik.example.com

Traefik not detecting service (inspection commands below):

  • Ensure service is on edge network
  • Check Traefik labels syntax
  • Verify traefik.enable=true label
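
A quick way to verify both points on a running container (container name is a placeholder):

# Show the Traefik-related labels and the networks the container is attached to
docker inspect <container_name> --format '{{json .Config.Labels}}' | jq .
docker inspect <container_name> --format '{{json .NetworkSettings.Networks}}' | jq 'keys'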

CloudFlare tunnel issues:

# Check tunnel logs
./scripts/svc edge logs

# Verify tunnel configuration
cat edge/cloudflared/config.yml

N8N webhook issues:

  • Verify WEBHOOK_URL environment variable
  • Check if CloudFlare Access is blocking webhooks
  • Use separate webhook domain if needed

Debug Commands

# Docker network inspection
docker network ls
docker network inspect edge

# Service logs
docker compose -f services/myapp/compose.yml logs -f

# Container inspection
docker inspect <container_name>

📈 Scaling

Adding More Services

Services scale horizontally - just add more:

./scripts/svc new --name app-2 --domain app2.example.com
./scripts/svc deploy app-2

Resource Management

Configure in compose files:

deploy:
  resources:
    limits:
      memory: 1G
      cpus: '1.0'
    reservations:
      memory: 512M
      cpus: '0.5'
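
To confirm the limits took effect on the running container (a sketch; the container name is a placeholder, memory is reported in bytes):

# Inspect the applied memory and CPU limits
docker inspect <container_name> --format 'memory={{.HostConfig.Memory}} nano_cpus={{.HostConfig.NanoCpus}}'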

Load Balancing

Traefik can load balance multiple containers:

# Deploy multiple replicas
deploy:
  replicas: 3
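
Alternatively, replicas can be started with the Compose CLI (this requires that the service does not set a fixed container_name); Traefik round-robins across all containers sharing the same router labels:

# Scale an existing service to 3 replicas
docker compose -p myapp up -d --scale myapp=3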

🔧 Advanced Configuration

Custom Traefik Middlewares

labels:
  # Rate limiting
  - traefik.http.middlewares.myapp-ratelimit.ratelimit.average=100
  - traefik.http.middlewares.myapp-ratelimit.ratelimit.burst=200
  
  # Basic Auth
  - traefik.http.middlewares.myapp-auth.basicauth.users=user:$$2y$$10$$...
  
  # Apply middlewares
  - traefik.http.routers.myapp.middlewares=myapp-ratelimit,myapp-auth

Environment-specific Configurations

Use different compose files:

# Development
docker compose -f compose.yml -f compose.dev.yml up -d

# Production  
docker compose -f compose.yml -f compose.prod.yml up -d

📚 Additional Resources

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes and test
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

About

AI-powered N8N infrastructure stack with Docker services orchestration, Traefik reverse proxy, Cloudflare tunnel integration, and comprehensive monitoring. Template-based deployment system for scalable workflow automation environments.
