Project Scope: Lexecon v1.0 - Enterprise-grade AI Governance Decision Engine
Current Status: Phases 1-4 Complete (50%). Phases 5-8 In Progress.
Objective: Complete enterprise readiness for AI governance v1.0, enabling:
- SOC 2 Type II certification
- Production deployment at scale
- GDPR/EU AI Act compliance
- 99.9%+ uptime SLA
Out of Scope for v1.0:
- Multi-domain expansion (Finance, Healthcare, Defense, Supply Chain, Energy)
- Domain plugin system
- Universal governance protocol
Future: Multi-domain support will be Lexecon v2.0, implemented as a separate repository and architecture.
Current State: No automated deployment pipeline. Manual builds and deployments.
Target State: Fully automated CI/CD with feature flags, staged rollouts, and rollback capability.
Implementation Time: ~2 hours (Claude code)
Acceptance Criteria:
-
push to maintriggers test suite automatically- Unit tests pass (>80% coverage)
- Integration tests pass
- Linting passes (Black, isort, flake8)
- Type checking passes (mypy on core modules)
- Failed tests block merge to main (required branch protection)
-
push to mainautomatically builds Docker image and pushes to registry - Docker image tagged with commit SHA and
latest - Build pipeline succeeds on every valid commit
- Build logs accessible and searchable
- Build can be re-triggered if transient failure occurs
Implementation Scope:
.github/workflows/:
- test.yml (pytest, coverage, linting)
- build.yml (Docker build & push)
- deploy-staging.yml (deploy to staging on main)
- deploy-prod.yml (manual approval, deploy to prod)Key Decisions:
- Docker registry: GitHub Container Registry (ghcr.io) - no additional cost
- Test framework: pytest (already in use)
- Linting: Black + isort + flake8 (standard Python)
- No time-based scheduled deployments (manual deploys only)
Implementation Time: ~3 hours (Terraform + Helm files)
Acceptance Criteria:
- All infrastructure defined in code (no manual changes in cloud console)
- Staging and production environments defined identically
- Database schema version controlled and versioned
- Secrets injected at deployment time (never in code)
- Environment variables documented and validated
- Infrastructure changes reviewed before deployment (same as code review)
Implementation Scope:
infrastructure/:
terraform/
main.tf (provider, resources)
variables.tf (inputs)
outputs.tf (deployment details)
prod.tfvars (production variables)
staging.tfvars (staging variables)
helm/
Chart.yaml (app metadata)
values.yaml (default config)
templates/ (k8s manifests)
deployment.yaml
service.yaml
configmap.yaml
secret.yaml (template, values from env)
scripts/
deploy.sh (deploy to staging/prod)
rollback.sh (rollback previous version)
Key Decisions:
- Terraform for cloud infrastructure (Kubernetes cluster, databases)
- Helm for Kubernetes deployment (app configuration)
- Secrets stored in cloud secrets manager (AWS Secrets Manager or equivalent)
- No local state files (use remote state backend)
Acceptance Criteria:
- Staging deployment automatic on every
push to main- Deployment executes without manual intervention
- Health checks pass before marking success
- Rollback automatic if health checks fail
- Production deployment requires manual approval
- Approval gate exists in GitHub Actions
- Approval required before any prod changes
- Change log automatically generated from commits
- Rollback available at any time (previous version)
- Single command:
./scripts/rollback.sh prod - Rollback completes successfully
- Health checks pass before marking success
- Single command:
- Deployment logs captured and searchable
- All deployment events in audit trail
- Approver tracked
- Change details captured
Deployment Architecture:
push to main
↓
[GitHub Actions] → test.yml → build.yml
↓
ghcr.io/lexecon:abc123 (Docker image pushed)
↓
[Auto] Deploy to staging (Helm upgrade)
↓
Health checks: /health, /health/ready, /health/live
↓
Slack notification: "Deployed to staging: abc123"
↓
[Manual] Deploy to prod (requires approval)
↓
Slack notification: "Ready to deploy prod. Approve here: [link]"
↓
User clicks approve
↓
[Deploy to prod] (Helm upgrade)
↓
Health checks: /health, /health/ready, /health/live
↓
Slack notification: "Deployed to prod: abc123"
Key Decisions:
- Staging mirrors production (same resources, same config, different data)
- Prod deployments are blue-green (new version runs alongside old, switch at load balancer)
- Rollback is instantaneous (revert to previous Helm release)
- No canary deployments initially (complexity not justified yet)
Implementation Time: ~1.5 hours (integrate Unleash, wrap 3 features)
Acceptance Criteria:
- Feature flag system implemented (LaunchDarkly or Unleash)
- At minimum 3 flags configured:
enable_async_auth(toggle new async/await implementation)enable_observability_export(toggle OTLP export to Datadog)enable_strict_validation(toggle extended input validation)
- Flags queryable in code:
if flags.is_enabled('enable_async_auth'): - Flag changes take effect without redeployment
- Flag evaluation does not introduce measurable latency regression
- Audit trail: who changed what flag when
Implementation:
# src/lexecon/core/flags.py
from unleash import Unleash
flags = Unleash(
url="https://unleash.example.com",
app_name="lexecon",
instance_id="prod-1"
)
# Usage in code
if flags.is_enabled("enable_async_auth"):
use_async_auth_service()
else:
use_sync_auth_service()Key Decisions:
- Unleash (open-source) vs LaunchDarkly (commercial)
- Flag evaluation cached for 15 seconds (balance: freshness vs latency)
- Feature flags used for risk mitigation, not A/B testing
Current State: Observability code written but not integrated into operations (no dashboards, no alerts).
Target State: Production observability with automated alerting, SLA monitoring, and on-call runbooks.
Implementation Time: ~2 hours (configure Datadog export, verify traces)
Acceptance Criteria:
- All traces exported to centralized backend (Datadog or Jaeger)
- Trace latency histogram visible (p50, p95, p99)
- Trace errors automatically flagged
- Distributed trace visualization works (request → auth service → policy engine → ledger)
- All logs centralized (Datadog, ELK, or equivalent)
- Logs searchable by request_id (trace correlation)
- Log levels: ERROR, WARN, INFO (no DEBUG in prod)
- Logs retained and searchable
- Performance regressions identified automatically
- Latency spikes detected and alerted
- Trace comparison available (before/after)
- Error tracking with context
- Every error includes: trace_id, user_id, action, timestamp
- Errors grouped by type and location
- Error rate tracked per endpoint
Configuration:
# config/observability.yaml
traces:
enabled: true
export_to: "datadog"
sampling_rate: 1.0 # 100% in prod, adjust based on volume
datadog_endpoint: "https://trace.datadoghq.com"
logs:
enabled: true
export_to: "datadog"
level: "INFO"
json_format: true
retention_days: 90
metrics:
enabled: true
export_to: "datadog"
cardinality_limit: 10000Key Decisions:
- Datadog for unified APM/logs/metrics (reduces tool sprawl)
- 100% trace sampling in prod (within Lexecon's scale)
- Automatic trace correlation via request headers
- No personally identifiable information in traces (audit requirements)
Implementation Time: ~2.5 hours (define metrics, configure alerts, build dashboard)
Acceptance Criteria:
- Decision latency histogram tracked
- P50, P95, P99 visible in dashboard
- Regression detection (p95 latency increases >10% from baseline)
- Decision error rate tracked
- Count of permit/deny/escalate decisions
- Count of errors (policy not found, validation failed, etc.)
- Error rate visible in dashboard
- Authentication latency tracked
- Login latency p95
- Password validation latency p95
- Async auth compared to sync implementation (regression detection)
- Database performance tracked
- Query latency p95, p99
- Query count per type
- Slow query detection (queries regressing from baseline)
- Ledger write performance tracked
- Write latency p95
- Write performance baseline established
- Rate limiter performance
- Token bucket saturation %
- Requests rejected/allowed ratio
- Rejection rate baseline established
- Dependency health
- Database connection pool utilization
- Pool saturation threshold defined and alerted
- Circuit breaker status (open/closed)
Alert Rules:
alerts:
- name: decision_latency_regression
condition: p95_decision_latency > baseline + 10%
severity: WARNING
notification: "slack:#alerts"
runbook: "docs/runbooks/high_latency.md"
- name: error_rate_spike
condition: error_rate > baseline + 50% OR error_rate > threshold_defined
severity: CRITICAL
notification: "slack:#alerts, pagerduty:on-call"
runbook: "docs/runbooks/high_errors.md"
- name: database_connection_pool_saturation
condition: pool_utilization > saturation_threshold
severity: WARNING
notification: "slack:#alerts"
runbook: "docs/runbooks/db_pool.md"
- name: circuit_breaker_state_change
condition: circuit_breaker_status changes from "closed" to "open"
severity: CRITICAL
notification: "pagerduty:on-call"
runbook: "docs/runbooks/circuit_breaker.md"Dashboard:
Title: Lexecon Production Health
Row 1: Key Metrics
- Decision latency (p50, p95, p99)
- Error rate %
- Requests/sec
- Uptime %
Row 2: Decision Breakdown
- Permit vs Deny vs Escalate (pie chart)
- Error types (bar chart)
- Decision latency by endpoint (heatmap)
Row 3: Infrastructure
- Database latency (p95)
- Database pool utilization
- Memory usage
- CPU usage
Row 4: Dependency Health
- Database connection pool status
- Circuit breaker states
- External service health
Implementation Time: ~1.5 hours (implement /health endpoints, set up SLA tracking)
Acceptance Criteria:
-
/healthendpoint returns detailed status{ "status": "healthy", "components": { "database": {"status": "healthy", "latency_ms": 3}, "ledger": {"status": "healthy", "latency_ms": 5}, "auth_service": {"status": "healthy"}, "rate_limiter": {"status": "healthy"} }, "uptime_seconds": 345600, "timestamp": "2026-01-25T10:30:00Z" } -
/health/readyreturns 200 only if ready to serve traffic -
/health/livereturns 200 only if process alive - SLA tracked: uptime measured and trended
- Monthly uptime baseline established
- Regression from baseline detected and alerted
- Incident recovery measured
- Time from alert to resolution tracked
- Improvement over time visible
Health Check Implementation:
# src/lexecon/observability/health.py
class HealthCheck:
def __init__(self):
self.db = DatabaseHealthCheck()
self.ledger = LedgerHealthCheck()
self.auth = AuthServiceHealthCheck()
async def check_health(self):
"""Return detailed health status."""
return {
"status": "healthy" if all_healthy else "degraded",
"components": {
"database": await self.db.check(),
"ledger": await self.ledger.check(),
"auth_service": await self.auth.check(),
},
"uptime_seconds": time.time() - START_TIME,
"timestamp": datetime.utcnow().isoformat()
}
async def check_ready(self):
"""Return True if ready to serve traffic."""
return all([
await self.db.is_responsive(),
await self.ledger.is_responsive(),
await self.auth.is_responsive(),
])SLA Dashboard:
Title: SLA & Uptime
- Monthly uptime %: 99.95%
- Target uptime: 99.9%
- Status: EXCEEDING TARGET
- Hours downtime this month: 3.6
- Incidents: 2
- Jan 15: Database connection pool exhaustion (15 min)
- Jan 22: Rate limiter saturation attack (3.5 min)
Acceptance Criteria:
- Runbook exists for each alert type:
- High decision latency
- High error rate
- Database connection pool exhausted
- Circuit breaker open
- Rate limiter saturation
- Deployment failed
- Each runbook includes:
- Symptom (what the alert is)
- Immediate investigation steps
- Root cause candidates
- Resolution steps
- Escalation path (who to page if not resolved in 5 min)
- Runbooks tested quarterly (team goes through each one)
Example Runbook:
# High Error Rate Runbook
## Symptom
Alert: error_rate > 1%
Currently at 3.2% of requests returning errors
## Immediate Investigation
1. Check error logs:datadog logs where env:prod and status:error
What error type? (validation, database, auth, policy)
2. Check recent deployments:
- Did error rate spike after a deployment?
- Run: `kubectl rollout history deployment/lexecon-api`
3. Check external dependencies:
- Database: Run `/health` endpoint, check database latency
- Auth service: Check if MFA service is failing
- Policy engine: Check policy load errors
## Resolution
### If error_type == "validation_error"
- Likely: Recent code change introduced stricter validation
- Check: Last 3 commits to validation.py
- Action: Either fix the validation rule or rollback deployment
### If error_type == "database_error"
- Likely: Database connection pool exhausted
- Check: Monitor database pool utilization
- Action: Increase pool size OR identify slow queries causing backlog
- Immediate: Kill slow queries: `SELECT * FROM pg_stat_statements`
### If error_type == "auth_error"
- Likely: MFA service timeout or intermittent failure
- Check: Is MFA service responding? `/health/mfa`
- Action: Restart MFA service or failover to backup
- Immediate: Disable MFA requirement temporarily: `feature_flags.disable('require_mfa')`
## Escalation
If error rate not resolved in 5 minutes:
1. Page database team (if db errors)
2. Page on-call engineer (if code errors)
3. Prepare rollback plan (have commit hash ready)
## Post-Incident
1. Update this runbook with what you learned
2. Create ticket to prevent recurrence
Current State: GDPR/SOC 2 documentation incomplete. Disaster recovery untested.
Target State: SOC 2 Type II certification ready. GDPR compliance verified.
Acceptance Criteria:
- Security policy document exists and reviewed
- Password policy (complexity, expiration, rotation)
- MFA requirements (who must use, how)
- Access control policy (role definitions, approval process)
- Incident response procedure
- Change management process
- Access control audit
- Every user has documented reason for access
- Admin access limited to <3 people
- No shared credentials
- Quarterly access review (deactivate unused accounts)
- Data encryption audit
- Encryption at rest (database encrypted)
- Encryption in transit (TLS 1.2+ on all endpoints)
- Key rotation procedure documented
- Keys stored in secrets manager (never in code)
- Audit logging complete
- Every data access logged (who, what, when)
- All privilege changes logged
- All policy changes logged
- Logs retained 90 days minimum
- Logs tamper-protected (append-only)
- Vulnerability management
- Automated vulnerability scanning (Dependabot, Snyk)
- SLA for patching: Critical <24h, High <7d, Medium <30d
- Vulnerability disclosure policy published
- Business continuity
- Disaster recovery plan (RTO, RPO defined)
- Backup tested monthly
- Alternate data center identified
SOC 2 Questionnaire Answers:
CC1: Information Security Governance
- Security policy: See docs/policies/security_policy.md
- Security roles: See docs/governance/roles.md
- Risk assessment: Annual in January
CC2: Information Security Awareness
- Training: All employees trained on security policy
- Third-party: No contractors with data access
- Secure development: Reviewed in code review process
CC3: Risk Assessment
- Risk assessment: Annual + when major changes
- Threat modeling: Done for each API endpoint
- Monitoring: Continuous via APM/logs
A1: Confidentiality
- Encryption at rest: AES-256
- Encryption in transit: TLS 1.2+
- Access control: RBAC with approval process
Acceptance Criteria:
- Data processing agreement (DPA) with customers documented
- Right to access implemented
- User can request: "Give me all data you have about me"
- Response mechanism documented (timeline defined)
- Export format: JSON
- Right to deletion implemented
- User can request: "Delete all data about me"
- Deletion process documented (timeline defined)
- Backup retention policy defined
- Data minimization audit
- Only collect data needed for decision making
- Data retention policy documented
- PII encrypted if retained
- Consent management
- Explicit opt-in for data collection documented
- Consent documented in audit trail
- Opt-out mechanism implemented
- Breach notification plan
- Incident response procedure documented (who to call, how to escalate)
- Notification process to users defined
- Notification process to regulators defined
- Data protection impact assessment (DPIA)
- Process defined for new data processing
- Risk assessment methodology documented
- Mitigation process defined
GDPR Compliance Checklist:
## Data Processing
- [ ] Collect only required data for decision making
- [ ] Retention policy: Decision logs 30 days, override logs 1 year, audit logs 3 years
- [ ] Data minimization: No SSN, credit card numbers, medical records unless required
- [ ] PII encrypted in transit and at rest
## User Rights
- [ ] Right to access: /api/v1/users/{id}/data-export endpoint
- [ ] Right to deletion: /api/v1/users/{id}/delete-all-data endpoint
- [ ] Right to portability: /api/v1/users/{id}/data-export (JSON/CSV)
- [ ] Right to object: Can opt-out of all processing
## Transparency
- [ ] Privacy policy published on website
- [ ] Consent form before using service
- [ ] Data processing documented in privacy notice
- [ ] Processing algorithm explained to users
## Accountability
- [ ] Data Protection Officer designated
- [ ] Register of processing activities maintained
- [ ] Risk assessments for high-risk processing
- [ ] Incident response plan documentedAcceptance Criteria:
- Recovery objectives defined
- RTO (recovery time objective) - maximum acceptable downtime
- RPO (recovery point objective) - maximum acceptable data loss
- Document in disaster recovery plan
- Backup strategy documented
- Frequency: Regular (not ad-hoc)
- Location: Geographically redundant
- Retention: Documented and enforced
- Backup verification
- Test restore from backup (frequency documented)
- Restore to alternate environment
- Verify data integrity on every test
- Disaster recovery plan
- Steps to failover documented
- Communication chain defined
- Roles and responsibilities documented
- Failover procedures
- Step-by-step instructions documented
- Failover can be executed by trained team member
- Verification steps defined
Disaster Recovery Runbook:
# Disaster Recovery Runbook
## Scenario: Primary Database Corruption
### Detection
- Alert: Database corruption detected
- Health check fails: SELECT COUNT(*) FROM decisions fails
- Multiple users report "500 errors"
### Immediate Actions (0-5 min)
1. Page on-call database team
2. Alert all users: "Service degradation - investigating"
3. Take backup of corrupted database (for investigation)
4. Check backup status: Is latest backup valid?
### Recovery Actions (5-30 min)
1. Restore database from latest backupkubectl exec -it lexecon-db --
psql -U admin < /backups/lexecon-db-latest.sql
2. Wait for recovery to complete (estimated 10-15 min)
3. Run integrity checks:
SELECT COUNT() FROM decisions; SELECT COUNT() FROM policies;
4. Failover load balancer to restored database
### Verification
1. Health check passes: curl https://api.lexecon.io/health
2. Decision requests succeed: Test with sample request
3. Verify data consistency: Compare with audit log
4. Check for data loss: Compare row counts to baseline
### Post-Recovery
1. Investigation: Why did corruption happen?
2. Root cause mitigation: Prevent recurrence
3. Backup strategy review: Was backup frequency adequate?
4. Update this runbook with lessons learned
Acceptance Criteria:
- Policy documents written and reviewed
- Security policy
- Acceptable use policy
- Incident response policy
- Change management policy
- Evidence collected for SOC 2 audit
- Access logs (every login, privilege change)
- Change logs (every production deployment)
- Vulnerability scans (monthly results)
- Incident logs (all security incidents)
- Documentation audit trail
- Every policy change tracked
- Approved by security lead
- Reviewed annually
- Training documentation
- Security awareness training: All employees
- Completion records: Saved for audit
Compliance Documentation Checklist:
## Required Documents
- [x] Information Security Policy
- [x] Acceptable Use Policy
- [x] Password Policy
- [x] Access Control Policy
- [x] Incident Response Plan
- [x] Disaster Recovery Plan
- [x] Data Retention Policy
- [x] Privacy Policy (public-facing)
- [x] Vendor Risk Assessment (for third-party services)
## Evidence to Collect
- [x] Access control logs (monthly export)
- [x] Deployment change logs (monthly export)
- [x] Vulnerability scan results (monthly)
- [x] Security training completion records (annual)
- [x] Incident investigation reports (as they occur)
- [x] Access reviews documentation (quarterly)Current State: No performance optimization or cost tracking.
Target State: Optimized infrastructure with predictable costs and performance headroom.
Acceptance Criteria:
- Database query performance audited
- All queries have explain plans reviewed
- No N+1 queries identified
- Appropriate indexes created for hot queries
- Query performance baseline established
- Decision service latency optimized
- Baseline latency established (p95)
- Hot paths identified (profiling)
- Optimization candidates prioritized by impact
- Latency regression detected if optimization causes slowdown
- Auth service latency optimized
- Baseline latency established (p95)
- Async implementation compared to baseline
- No regression from baseline
- Memory usage optimized
- Memory usage per request measured
- Memory leaks detected and fixed
- Memory trending tracked over time
- Rate limiter efficiency
- Token bucket operations measured
- LRU eviction performance measured
- No memory leaks in eviction logic
Optimization Checklist:
-- Query Analysis
EXPLAIN ANALYZE SELECT * FROM decisions WHERE created_at > NOW() - INTERVAL '7 days';
-- Review plan, identify if indexes needed
-- Index Creation (based on analysis)
CREATE INDEX idx_decisions_created_at ON decisions(created_at DESC);
CREATE INDEX idx_decisions_actor ON decisions(actor);
CREATE INDEX idx_decisions_decision ON decisions(decision);
-- Memory Profiling
python -m memory_profiler decision_service.pyAcceptance Criteria:
- Web dashboard (if applicable) performance measured
- Page load time baseline established
- Core Web Vitals (LCP, FID, CLS) measured
- Regression detection if changes degrade performance
- API response caching implemented
- Cache strategy defined for each endpoint
- Cache invalidation logic implemented
- Cache hit rate measured
- JavaScript bundle size optimized
- Bundle size baseline established
- Lazy loading of non-critical code implemented
- Regression detection if bundle size increases
- Image optimization implemented
- Modern formats (WebP where supported)
- Appropriate sizing for viewport
- Compression applied
Acceptance Criteria:
- Baseline costs tracked and understood
- Current infrastructure costs documented by component
- Resource utilization measured (CPU, memory, storage)
- Unused resources identified and removed
- Cost optimization opportunities identified
- Instance sizing reviewed for oversizing
- Spot instances evaluated for non-critical workloads
- Reserved instances evaluated for baseline load
- Cost per unit of work tracked
- Cost trending & controls
- Monthly spend visible in dashboard
- Cost anomalies detected and investigated
- Cost trending over time visible
- Cost-efficiency improvements
- Optimization candidates prioritized by impact
- Trade-offs documented (performance vs cost)
- Changes tested before production deployment
Cost Analysis Template:
Baseline Costs (monthly):
Kubernetes cluster:
- Compute: [measured cost]
- Storage: [measured cost]
Database:
- Primary: [measured cost]
- Backups: [measured cost]
Monitoring:
- [measured cost]
Storage:
- [measured cost]
Total: [sum of above]
Optimization Opportunities:
1. [Specific inefficiency]
- Potential savings: [quantified]
- Risk: [if any]
- Mitigation: [if applicable]
- Effort: [estimated]
2. [Another inefficiency]
- Potential savings: [quantified]
- Risk: [if any]
- Mitigation: [if applicable]
- Effort: [estimated]
Implementation: Prioritize by impact/effort ratio- GitHub Actions pipeline: test, build, deploy-staging, deploy-prod
- Infrastructure as Code: Terraform + Helm
- Automated staging deployments
- Manual prod deployments with approval
- Feature flags: 3 configured flags
- Rollback capability: automated and tested
- APM + Logging centralized (Datadog)
- Metrics dashboard with p50/p95/p99 latencies
- Alert rules configured (latency, errors, dependencies)
- Health checks: /health, /health/ready, /health/live
- SLA tracking: uptime measured and trended
- On-call runbooks: documented procedures
- SOC 2 Type II documentation complete
- GDPR compliance verified
- Disaster recovery: RTO/RPO defined and documented
- Backup restore procedures documented and tested
- All policies reviewed and approved
- Database queries audited (N+1 detection, indexes created)
- Decision service latency baseline established
- Auth service latency baseline established
- Memory usage measured per request
- Frontend performance baseline established
- Cost tracking implemented and baselined
| Phase | Component | Implementation Time | Notes |
|---|---|---|---|
| Phase 5 | GitHub Actions Pipeline | 2h | YAML + secrets config |
| Phase 5 | Infrastructure as Code | 3h | Terraform + Helm |
| Phase 5 | Automated Deployments | 1.5h | Scripts + config |
| Phase 5 | Feature Flags | 1.5h | Unleash integration |
| Phase 6 | APM & Logging | 2h | Datadog export config |
| Phase 6 | Metrics & Alerting | 2.5h | Dashboard + alert rules |
| Phase 6 | Health Checks | 1.5h | Endpoints + tracking |
| Phase 6 | On-Call Runbooks | Documentation | 5+ runbooks |
| Phase 7 | SOC 2 Documentation | Documentation | Policy templates |
| Phase 7 | GDPR Compliance | Documentation | Right-to-access/deletion |
| Phase 7 | Disaster Recovery | Documentation + Testing | RTO/RPO definition |
| Phase 8 | Backend Optimization | Profiling | Query analysis + indexing |
| Phase 8 | Cost Tracking | 1h | Dashboard setup |
Total Implementation (Code): ~15 hours Total Documentation: ~20 hours Total (Code + Docs): ~35 hours
Phase 5.1: GitHub Actions Pipeline (~2 hours)
Start with .github/workflows/test.yml:
- Trigger on push to main
- Run pytest with coverage
- Run linting (Black, isort, flake8)
- Fail if coverage <80% or linting fails
- Report results in PR
Acceptance: Single workflow file passes 10+ test runs without failures.