Production-grade agentic AI system for autonomous issue detection and resolution during e-commerce platform migrations.
This system demonstrates proper agent behavior that goes far beyond a single LLM call:
- State Management: Persistent state across the observe-reason-decide-act loop
- Multi-Step Reasoning: Pattern detection → Root cause → Risk assessment → Action planning
- Tool Orchestration: 8+ specialized tools working together autonomously
- Feedback Loops: Learning from outcomes and adapting behavior
- Safety Controls: Multiple layers including safe mode, circuit breakers, and human oversight
Get the complete system running in under 10 minutes:
cd migrationguard-ai
REM Start infrastructure
setup.cmd
REM Run the demo
uv run python demo_agent_system.py
See it in action: The demo showcases authentication error detection → pattern analysis → root cause reasoning → automated ticket creation with full state tracking and feedback loops.
Detailed Guide: QUICKSTART.md
┌─────────────────────────────────────────────────────────────────┐
│                       AGENT ORCHESTRATOR                        │
│                (Observe-Reason-Decide-Act Loop)                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
   │   OBSERVE    │      │    REASON    │      │    DECIDE    │
   │              │      │              │      │              │
   │ • Signal     │      │ • Pattern    │      │ • Risk       │
   │   Ingestion  │─────▶│   Detection  │─────▶│   Assessment │
   │ • Normalize  │      │ • Root Cause │      │ • Action     │
   │ • Track      │      │   Analysis   │      │   Selection  │
   └──────────────┘      └──────────────┘      └──────────────┘
                                                       │
                                                       ▼
                                               ┌──────────────┐
                                               │     ACT      │
                                               │              │
                                               │ • Execute    │
                                               │ • Track      │
                                               │ • Learn      │──┐
                                               └──────────────┘  │
                                                       │         │
                                                       ▼         │
                                               ┌──────────────┐  │
                                               │   FEEDBACK   │  │
                                               │     LOOP     │──┘
                                               └──────────────┘
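The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `AgentState`, the stage functions, and the toy "two or more 401s" rule are hypothetical names and logic chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentState:
    """State carried through one pass of the loop (hypothetical shape)."""
    signals: list[dict[str, Any]] = field(default_factory=list)
    pattern: Optional[str] = None
    root_cause: Optional[str] = None
    risk: Optional[str] = None
    action: Optional[str] = None
    history: list[str] = field(default_factory=list)


def observe(state: AgentState, raw: list[dict[str, Any]]) -> AgentState:
    # Normalize: lowercase the source name and keep only the fields we use.
    state.signals = [
        {"source": s["source"].lower(), "status": s.get("status")} for s in raw
    ]
    state.history.append("observe")
    return state


def reason(state: AgentState) -> AgentState:
    # Toy rule (assumption): two or more 401 responses suggest an auth problem.
    unauthorized = sum(1 for s in state.signals if s["status"] == 401)
    if unauthorized >= 2:
        state.pattern = "auth_failure"
        state.root_cause = "authentication_error"
    state.history.append("reason")
    return state


def decide(state: AgentState) -> AgentState:
    if state.root_cause == "authentication_error":
        state.risk, state.action = "low", "create_support_ticket"
    state.history.append("decide")
    return state


def act(state: AgentState) -> AgentState:
    # A real system would call a tool here; the sketch only records the step.
    state.history.append(f"act:{state.action}" if state.action else "act:noop")
    return state


def run_once(raw: list[dict[str, Any]]) -> AgentState:
    state = observe(AgentState(), raw)
    for stage in (reason, decide, act):
        state = stage(state)
    return state


final = run_once([
    {"source": "API", "status": 401},
    {"source": "API", "status": 401},
    {"source": "Support", "status": None},
])
print(final.action, final.history)
```

Keeping every stage as a function from state to state means the same `AgentState` object can be persisted between stages or replayed later, which is one plausible way to get the "persistent state" property described above.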
- Multi-source signal ingestion (API errors, support tickets, webhooks)
- Real-time normalization and enrichment
- Time-series storage with TimescaleDB
- Pattern detection across signals using Elasticsearch
- Root cause analysis with Google Gemini 2.5 Flash (+ rule-based fallback)
- Evidence gathering and confidence scoring (75-92% confidence)
- Automated risk assessment (low/medium/high)
- Approval requirements for high-risk actions
- Safety controls (safe mode, circuit breakers)
- Rate limiting and retry logic
- Graceful degradation on failures
- Comprehensive audit trail
- Outcome tracking and analysis
- Confidence calibration from results
- Adaptive behavior based on feedback
- Safe Mode: Automatic activation on critical errors
- Circuit Breakers: Fault tolerance for external services
- Graceful Degradation: Fallback mechanisms (Gemini → rules, Elasticsearch → PostgreSQL, Kafka → Redis)
- Human Oversight: Approval workflows and manual controls
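To make the circuit-breaker and fallback ideas above concrete, here is a minimal sketch. It assumes a simple policy of "open after N consecutive failures, allow a trial call after a cool-down"; the `CircuitBreaker` class, its parameters, and the two root-cause functions are hypothetical and are not taken from the project's code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; retries after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has passed, let one trial call through (half-open).
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # skip the failing service entirely
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def llm_root_cause() -> str:
    raise TimeoutError("model unavailable")  # simulate an outage


def rule_based_root_cause() -> str:
    return "authentication_error"


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(llm_root_cause, rule_based_root_cause) for _ in range(3)]
print(results, breaker.is_open())
```

In this run the first two calls fail and fall back to the rule-based path; the third call never reaches the failing service because the breaker is already open.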
200+ Tests with 85%+ Coverage
- 150+ Unit Tests (core components, services, integrations)
- 50+ Property-Based Tests (RBAC, redaction, API, decisions, patterns)
- Integration Tests (error handling, end-to-end flows)
- All tests passing with comprehensive coverage
uv run pytest tests/unit/ -v
- Backend: Python 3.11+, FastAPI, Pydantic
- AI: Google Gemini 2.5 Flash (FREE tier, 15 req/min) with rule-based fallback
- Agent Framework: Custom orchestration with state management and feedback loops
- Database: PostgreSQL + TimescaleDB (time-series)
- Cache: Redis (caching, rate limiting, buffering)
- Search: Elasticsearch (pattern matching, full-text search)
- Streaming: Apache Kafka (event streaming, async processing)
- Metrics: Prometheus + Grafana
- Logs: Structured logging with ELK stack support
- Visualization: Kibana for log exploration
- Containers: Docker + Docker Compose
- Orchestration: Kubernetes-ready
- CI/CD: GitHub Actions ready
Input: 3 signals (2 API 401 errors + 1 support ticket)
Agent Behavior:
- Observe: Ingest and normalize signals
- Detect: Identify auth failure pattern (confidence: 0.85)
- Reason: Analyze root cause → "authentication_error"
- Decide: Select "create_support_ticket" (risk: low)
- Act: Create ticket with troubleshooting steps
- Learn: Track outcome, calibrate confidence
Output: Support ticket created with authentication guidance
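The Decide step in this scenario can be illustrated with a small gating function. This is a sketch under assumptions: the two constants mirror the `AGENT_CONFIDENCE_THRESHOLD` and `AGENT_HIGH_RISK_APPROVAL_REQUIRED` settings from the configuration section, while the `Decision` type, the `ACTIONS` table, and the `escalate_to_human` fallback action are hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7          # mirrors AGENT_CONFIDENCE_THRESHOLD
HIGH_RISK_APPROVAL_REQUIRED = True  # mirrors AGENT_HIGH_RISK_APPROVAL_REQUIRED

# Hypothetical mapping from root cause to automated action.
ACTIONS = {"authentication_error": "create_support_ticket"}
FALLBACK_ACTION = "escalate_to_human"  # hypothetical name


@dataclass(frozen=True)
class Decision:
    action: str
    risk: str
    requires_approval: bool


def decide(root_cause: str, confidence: float, risk: str) -> Decision:
    action = ACTIONS.get(root_cause, FALLBACK_ACTION)
    # Assumption: below the threshold the agent does not act on its own.
    if confidence < CONFIDENCE_THRESHOLD:
        action = FALLBACK_ACTION
    requires_approval = action == FALLBACK_ACTION or (
        risk == "high" and HIGH_RISK_APPROVAL_REQUIRED
    )
    return Decision(action, risk, requires_approval)


print(decide("authentication_error", 0.85, "low"))
```

With the scenario's values (confidence 0.85, low risk) the ticket is created without approval; lowering the confidence below 0.7 or raising the risk to high would route the action to a human instead.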
Trigger: Confidence drift detected (expected: 0.90, actual: 0.75)
Agent Behavior:
- Safe mode automatically activated
- All actions require human approval
- Actions queued for review
- Operator notified
- Manual deactivation by authorized operator
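The safe-mode behavior listed above could look roughly like the following. This is a sketch only: the `SafeMode` class and the `DRIFT_TOLERANCE` value are assumptions made for illustration, and the real trigger conditions may differ.

```python
from dataclasses import dataclass, field

DRIFT_TOLERANCE = 0.10  # hypothetical: allowed gap between expected and actual confidence


@dataclass
class SafeMode:
    active: bool = False
    queued: list[str] = field(default_factory=list)

    def check_drift(self, expected: float, actual: float) -> None:
        # Activation is one-way here: only an operator can clear it.
        if expected - actual > DRIFT_TOLERANCE:
            self.active = True

    def submit(self, action: str) -> str:
        if self.active:
            self.queued.append(action)  # held for human review
            return "queued_for_approval"
        return "executed"

    def deactivate(self, operator_authorized: bool) -> None:
        if not operator_authorized:
            raise PermissionError("only an authorized operator may leave safe mode")
        self.active = False


safe_mode = SafeMode()
safe_mode.check_drift(expected=0.90, actual=0.75)
status = safe_mode.submit("create_support_ticket")
print(safe_mode.active, status, safe_mode.queued)
```

The drift of 0.15 in the scenario exceeds the assumed tolerance, so the action is queued rather than executed until an operator calls `deactivate(operator_authorized=True)`.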
- QUICKSTART.md - Get started in 10 minutes
- INFRASTRUCTURE_SETUP.md - Detailed infrastructure guide
- README_DEMO.md - Demo explanation and agent behavior
- HACKATHON_SUBMISSION.md - Complete submission details
- DEVELOPMENT.md - Development guide
- API Docs: http://localhost:8000/docs (when running)
- Docker Desktop
- Python 3.11+ with uv
- Git (for cloning)
cd migrationguard-ai
setup.cmd
This will:
- Start all infrastructure services (PostgreSQL, Redis, Kafka, Elasticsearch)
- Run database migrations
- Create Kafka topics and Elasticsearch indices
- Verify connectivity
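For readers curious what a connectivity check involves, here is a minimal sketch using only the standard library. It is not the contents of scripts/check_infrastructure.py, which may work differently; the `SERVICES` table simply reuses the default local ports from the configuration section, and a successful TCP connection is treated as "reachable".

```python
import socket

# Default local ports, taken from the configuration values in this README.
SERVICES: dict[str, tuple[str, int]] = {
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),
    "kafka": ("localhost", 9092),
    "elasticsearch": ("localhost", 9200),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(services: dict[str, tuple[str, int]] = SERVICES) -> dict[str, bool]:
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name:<14} {'ok' if ok else 'UNREACHABLE'}")
```

A TCP check only shows that a port is accepting connections; it does not prove the service behind it is healthy, so a fuller check would also issue a real query to each service.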
REM 1. Start infrastructure
docker-compose up -d
REM 2. Wait for services (30 seconds)
timeout /t 30
REM 3. Check connectivity
uv run python scripts/check_infrastructure.py
REM 4. Run migrations
uv run alembic upgrade head
REM 5. Setup Kafka and Elasticsearch
uv run python scripts/setup_infrastructure.py
See the complete agent in action:
uv run python demo_agent_system.py
REM Run the unit tests
uv run pytest tests/unit/ -v
REM Start the API server
uv run uvicorn src.migrationguard_ai.api.app:app --reload
API available at: http://localhost:8000
API docs: http://localhost:8000/docs
cd frontend
npm install
npm run dev
Frontend available at: http://localhost:3000
| Service | URL | Credentials |
|---|---|---|
| API | http://localhost:8000 | - |
| API Docs | http://localhost:8000/docs | - |
| Grafana | http://localhost:3001 | admin/admin |
| Kibana | http://localhost:5601 | - |
| Prometheus | http://localhost:9090 | - |
| Elasticsearch | http://localhost:9200 | - |
migrationguard-ai/
├── src/migrationguard_ai/
│   ├── agent/              # Agent orchestration (state, graph)
│   ├── api/                # FastAPI REST API
│   ├── core/               # Core components (auth, config, safety)
│   ├── db/                 # Database models (SQLAlchemy)
│   ├── services/           # Business logic (decision, action, pattern)
│   ├── integrations/       # External integrations (support systems)
│   └── workers/            # Background workers (pattern detection)
├── tests/
│   ├── unit/               # 150+ unit tests
│   ├── integration/        # Integration tests
│   └── e2e/                # End-to-end tests
├── alembic/                # Database migrations
├── scripts/                # Setup and utility scripts
├── frontend/               # React dashboard (TypeScript)
├── docker-compose.yml      # Infrastructure setup
├── demo_agent_system.py    # Complete agent demo
└── setup.cmd               # Automated setup script
REM All tests
uv run pytest tests/unit/ -v
REM With coverage
uv run pytest tests/unit/ --cov=src --cov-report=html
REM Specific test file
uv run pytest tests/unit/test_decision_engine.py -v
REM Property-based tests
uv run pytest tests/unit/test_*_properties.py -v
REM Format code
uv run black src tests
REM Lint code
uv run ruff check src tests
REM Type checking
uv run mypy src
REM Create migration
uv run alembic revision --autogenerate -m "Description"
REM Apply migrations
uv run alembic upgrade head
REM Rollback
uv run alembic downgrade -1
All configuration is done via environment variables in the .env file:
# Google Gemini API (FREE tier - get key at https://aistudio.google.com/apikey)
GOOGLE_API_KEY=your-api-key-here
# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=migrationguard
POSTGRES_PASSWORD=changeme
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
# Kafka
KAFKA_BOOTSTRAP_SERVERS='["localhost:9092"]'
# Elasticsearch
ELASTICSEARCH_HOSTS='["http://localhost:9200"]'
# Agent Configuration
AGENT_CONFIDENCE_THRESHOLD=0.7
AGENT_HIGH_RISK_APPROVAL_REQUIRED=true
Exposed at /metrics:
- Signal ingestion rate
- Pattern detection latency
- Decision accuracy
- Action success rate
- System resource usage
Structured JSON logs for:
- Signal processing
- Pattern detection
- Root cause analysis
- Decision making
- Action execution
- Audit trail
Pre-configured dashboards:
- System health and performance
- Agent decision metrics
- Business impact (ticket deflection, resolution time)
- Infrastructure health
REM Stop all services
docker-compose down
REM Stop and remove all data
docker-compose down -v
REM Start Docker Desktop, then verify:
docker ps
REM Check logs:
docker-compose logs [service-name]
REM Restart services:
docker-compose restart
REM Reset database:
docker-compose down -v
docker-compose up -d postgres
timeout /t 10
uv run alembic upgrade head
REM Verify infrastructure:
uv run python scripts/check_infrastructure.py
REM Run with verbose output:
uv run pytest tests/unit/ -v -s
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Run code quality checks (black, ruff, mypy)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI
- AI powered by Google Gemini (FREE tier)
- Infrastructure by Docker
- Testing with pytest and Hypothesis
- Documentation: See the documentation files in the repository
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built for the Hackathon | Production-Ready | Fully Tested | Open Source