This walkthrough covers the AWS deployment architecture of OpsForge AI in two forms: a serverless Lambda deployment [1a] and a full EC2 stack deployment [2a]. It traces the integration with AWS Bedrock for LLM capabilities [3b], DynamoDB persistence [4a], and the agent orchestration flow [5a] that coordinates specialist agents for incident analysis. The serverless deployment uses API Gateway [6a] and EventBridge [6c] for event-driven processing; the EC2 deployment runs the complete stack behind an nginx reverse proxy [2b], with the backend managed as systemd services.
AWS Lambda functions handle incoming events and route them to the orchestrator for analysis
Defines the main Lambda function for event-driven processing
Resources:
  OpsForgeFunction:
    Type: AWS::Serverless::Function
Main Lambda handler that processes incoming events
def lambda_handler(event, context):
    """AWS Lambda entry point for OpsForge AI"""
Routes events to appropriate handlers based on type
if event_type == 'alerts':
    return handle_alerts(event.get('data', []))
elif event_type == 'metrics':
    return handle_metrics(event.get('data', []))
elif event_type == 'incident':
    return handle_incident(event.get('alerts', []), event.get('metrics', []))
Delegates incident analysis to the EnhancedOrchestrator
result = orchestrator.handle_incident(alerts, metrics=None)
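The routing above can be sketched end to end as follows. This is a minimal sketch, not the project's actual handler: the excerpt does not show how `event_type` is derived, so reading it from a top-level `type` field is an assumption, the three `handle_*` functions are hypothetical stubs, and the API Gateway style response shape is assumed as well.

```python
import json

# Hypothetical stubs: the real handlers are not shown in the excerpt.
def handle_alerts(alerts):
    return {"handled": "alerts", "count": len(alerts)}

def handle_metrics(metrics):
    return {"handled": "metrics", "count": len(metrics)}

def handle_incident(alerts, metrics):
    return {"handled": "incident", "alerts": len(alerts), "metrics": len(metrics)}

def lambda_handler(event, context):
    """Dispatch an incoming event to a handler based on its type."""
    # Assumption: the event type travels in a top-level 'type' field.
    event_type = event.get("type")
    if event_type == "alerts":
        body = handle_alerts(event.get("data", []))
    elif event_type == "metrics":
        body = handle_metrics(event.get("data", []))
    elif event_type == "incident":
        body = handle_incident(event.get("alerts", []), event.get("metrics", []))
    else:
        # Reject unknown types explicitly instead of silently returning None.
        error = {"error": f"unknown event type: {event_type}"}
        return {"statusCode": 400, "body": json.dumps(error)}
    return {"statusCode": 200, "body": json.dumps(body)}
```

An explicit fallback branch is worth having here: without it, an unrecognized event type would make the handler return `None`, which is hard to diagnose from the caller's side.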
Complete application stack running on EC2 with nginx reverse proxy and systemd services
Launches t3.medium Ubuntu instance for the full stack
aws ec2 run-instances \
  --image-id ami-0e001c9271cf7f3b9 \
  --instance-type t3.medium
Configures nginx to serve frontend and proxy API requests
server {
    listen 80;
    # Backend API proxy
    location /api/ {
        proxy_pass http://localhost:8000/api/;
Systemd service definition for FastAPI backend
ExecStart=/usr/bin/python3 backend_api.py
Systemd service for continuous incident simulation
ExecStart=/usr/bin/python3 live_data_generator.py
Integration with AWS Bedrock for LLM-powered reasoning and synthesis across all agents
Creates boto3 client for AWS Bedrock runtime
self.bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=self.region_name
)
Calls Bedrock API with rate limiting and retry logic
response = self.bedrock_runtime.invoke_model(
    modelId=model,
    body=json.dumps(request_body)
)
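The retry logic mentioned above is not shown in the excerpt. One common way to implement it is exponential backoff with jitter, sketched here under stated assumptions: `call_with_backoff` and all of its parameters are hypothetical names, and the sketch retries on any exception, whereas production code would likely narrow `retryable` to throttling errors.

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      retryable=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait with capped exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads out retries from concurrent callers.
            sleep(delay * random.uniform(0.5, 1.0))
```

In this setting `fn` would wrap the `invoke_model` call, for example `lambda: self.bedrock_runtime.invoke_model(modelId=model, body=json.dumps(request_body))`. The `sleep` parameter exists only so the waits can be replaced in tests.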
EnhancedOrchestrator synthesizes findings using Claude
response = client.messages.create(
    model=os.getenv("STRANDS_MODEL_ID", "claude-sonnet-4-20250514"),
    max_tokens=2048,
    system=ORCHESTRATOR_SYSTEM_PROMPT
Lambda IAM policy permissions for Bedrock access
- bedrock:InvokeModel
- bedrock:InvokeModelWithResponseStream
DynamoDB integration for persistent storage of incidents, patterns, and agent knowledge
DynamoDB table schema for incident storage
INCIDENT_MEMORY_TABLE = {
    'TableName': 'opsforge-incident-memory',
    'KeySchema': [
        {'AttributeName': 'incident_id', 'KeyType': 'HASH'}
    ]
Stores learned patterns for future reference
PATTERN_LIBRARY_TABLE = {
    'TableName': 'opsforge-pattern-library'
Stores agent-specific knowledge and learnings
AGENT_KNOWLEDGE_TABLE = {
    'TableName': 'opsforge-agent-knowledge',
    'KeySchema': [
        {'AttributeName': 'agent_name', 'KeyType': 'HASH'},
        {'AttributeName': 'knowledge_key', 'KeyType': 'RANGE'}
    ]
Initializes DynamoDB resource for table operations
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
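A table definition like the ones above needs more than `TableName` and `KeySchema` before DynamoDB will accept it: every key attribute must also appear in `AttributeDefinitions`, and a billing mode or provisioned throughput must be set. The excerpts may already carry these fields in the elided parts, so the following is only a sketch of how they could be filled in. `create_table_kwargs` is a hypothetical helper, and the string (`'S'`) key type and on-demand billing are assumptions.

```python
def create_table_kwargs(table_def, attribute_types=None):
    """Expand a {TableName, KeySchema} definition into create_table arguments."""
    attribute_types = attribute_types or {}
    key_schema = table_def["KeySchema"]
    return {
        "TableName": table_def["TableName"],
        "KeySchema": key_schema,
        # Every key attribute needs a declared type; 'S' (string) is assumed here.
        "AttributeDefinitions": [
            {
                "AttributeName": key["AttributeName"],
                "AttributeType": attribute_types.get(key["AttributeName"], "S"),
            }
            for key in key_schema
        ],
        # Assumption: on-demand billing, so no ProvisionedThroughput is required.
        "BillingMode": "PAY_PER_REQUEST",
    }

AGENT_KNOWLEDGE_TABLE = {
    "TableName": "opsforge-agent-knowledge",
    "KeySchema": [
        {"AttributeName": "agent_name", "KeyType": "HASH"},
        {"AttributeName": "knowledge_key", "KeyType": "RANGE"},
    ],
}

kwargs = create_table_kwargs(AGENT_KNOWLEDGE_TABLE)
```

With the `dynamodb` resource from the walkthrough, the table could then be created with `dynamodb.create_table(**kwargs)`.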
EnhancedOrchestrator coordinates specialist agents for incident analysis and response
AgentSelector chooses relevant agents based on incident content
agents_involved = agent_selector.select_agents(alert_dicts, metric_dicts, threshold=60)
Calls AlertOps for alert correlation analysis
if "AlertOps" in agents_involved:
    alert_analysis = analyze_alert_stream_with_memory(alerts)
Calls PredictiveOps for metric trend analysis
if "PredictiveOps" in agents_involved and metrics:
    prediction = analyze_metrics(metrics)
Combines all agent analyses into unified response
synthesis = self._synthesize(alert_analysis, prediction, learned)
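The select, analyze, combine flow above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the EnhancedOrchestrator itself: the real `AgentSelector` derives relevance from the alert and metric dicts, whereas this sketch takes precomputed `scores` on an assumed 0 to 100 scale, and `run_incident` and `analyzers` are hypothetical names.

```python
def select_agents(scores, threshold=60):
    """Keep agents whose relevance score meets the threshold (0-100 scale assumed)."""
    return [name for name, score in scores.items() if score >= threshold]

def run_incident(alerts, metrics, scores, analyzers):
    """Run only the selected agents and collect their results for synthesis."""
    agents_involved = select_agents(scores)
    # Default to None so the synthesis step always receives both arguments,
    # even when an agent was not selected.
    alert_analysis = None
    prediction = None
    if "AlertOps" in agents_involved:
        alert_analysis = analyzers["AlertOps"](alerts)
    # PredictiveOps is skipped when there are no metrics to analyze.
    if "PredictiveOps" in agents_involved and metrics:
        prediction = analyzers["PredictiveOps"](metrics)
    return {
        "agents": agents_involved,
        "alert_analysis": alert_analysis,
        "prediction": prediction,
    }
```

Initializing both results to `None` matters because each agent call is conditional; without the defaults, the later synthesis call would raise a `NameError` whenever an agent is skipped.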
Serverless event routing through API Gateway REST endpoints and EventBridge rules
Defines REST API for external access
OpsForgeApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: prod
POST endpoint for incident analysis requests
Path: /analyze
Method: POST
EventBridge rule for alert events
AlertEvent:
  Type: EventBridgeRule
  Properties:
    Pattern:
      source:
        - opsforge.alerts
EventBridge rule for metric events
MetricEvent:
  Type: EventBridgeRule
  Properties:
    Pattern:
      source:
        - opsforge.metrics
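For a rule like the ones above to fire, a producer has to publish an event whose `Source` matches the pattern. The sketch below builds such an entry in the shape that EventBridge's `PutEvents` API expects. `build_event_entry` is a hypothetical helper, and the detail type string and payload layout are assumptions, since the walkthrough does not show the producer side.

```python
import json

def build_event_entry(source, detail_type, detail, bus_name="default"):
    """Build one PutEvents entry; Detail must be a JSON string, not a dict."""
    return {
        "Source": source,            # matched against the rule's 'source' pattern
        "DetailType": detail_type,   # free-form label; the value used below is assumed
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }

entry = build_event_entry(
    "opsforge.alerts",
    "OpsForge Alert",
    {"alerts": [{"id": "a1", "severity": "critical"}]},
)
```

Publishing would then be `boto3.client("events").put_events(Entries=[entry])`. On the Lambda side the payload arrives under the event's `detail` key, already parsed back into a dict.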