HelixFlow AI inference platform - enterprise-grade microservices providing OpenAI-compatible API access to AI models.
HelixFlow is a comprehensive AI inference platform built as a set of independent Go microservices with Python-based testing and SDKs. It exposes an OpenAI-compatible REST API for chat completions and model management, with gRPC-based internal service communication.
Core Services:
- API Gateway (`api-gateway/`) - HTTPS/WebSocket entry point, OpenAI-compatible REST API, rate limiting, request routing
- Auth Service (`auth-service/`) - JWT authentication, user management, API keys, token revocation
- Inference Pool (`inference-pool/`) - GPU resource management, model caching, inference execution (mock engine)
- Monitoring Service (`monitoring/`) - System metrics, alerting, predictive scaling recommendations
Technology Stack:
- Go 1.22.2 - All microservices
- Python 3.11+ - Testing framework, client SDK
- gRPC + Protocol Buffers - Internal service communication
- SQLite (development) / PostgreSQL (production) - Primary database
- Redis - Caching and rate limiting
- Prometheus + Grafana - Metrics and dashboards
- Kubernetes + Helm + Terraform - Deployment infrastructure
Protocols:
- HTTP/HTTPS (external), WebSocket (streaming)
- gRPC (internal service-to-service)
- TLS 1.3 for external, mTLS between internal services
```
HelixFlow/
├── api-gateway/              # API Gateway service (Go)
│   ├── src/                  # Go source files (package main)
│   ├── bin/                  # Compiled binary output
│   └── requirements.txt      # Python deps (FastAPI placeholder)
├── auth-service/             # Authentication service (Go)
│   ├── src/                  # Go source files
│   ├── bin/                  # Compiled binary output
│   └── requirements.txt      # Python deps (FastAPI placeholder)
├── inference-pool/           # AI inference service (Go)
│   ├── src/                  # Go source files
│   ├── bin/                  # Compiled binary output
│   └── requirements.txt      # Python deps (FastAI/ML placeholder)
├── monitoring/               # Monitoring service (Go)
│   ├── src/                  # Go source files
│   ├── bin/                  # Compiled binary output
│   └── requirements.txt      # Python deps (FastAPI placeholder)
├── internal/                 # Shared internal libraries
│   └── database/             # Database abstraction (SQLite + PostgreSQL)
├── helixflow/                # Generated gRPC protobuf Go modules
│   ├── auth/                 # Auth service protobuf stubs
│   ├── inference/            # Inference service protobuf stubs
│   └── monitoring/           # Monitoring service protobuf stubs
├── proto/                    # Protocol buffer definitions (.proto files)
├── tests/                    # Comprehensive test suite
│   ├── conftest.py           # Pytest configuration and fixtures
│   ├── complete_testing_framework.py
│   ├── final_validation_suite.py
│   ├── integration/          # End-to-end integration tests
│   ├── contract/             # API contract and compliance tests
│   ├── security/             # Penetration testing
│   ├── performance/          # Load testing
│   └── unit/                 # Unit tests
├── scripts/                  # Deployment and management shell scripts
├── sdks/                     # Client SDKs
│   └── python/               # Python SDK (HTTP client)
├── k8s/                      # Kubernetes manifests
├── helm/                     # Helm charts
├── terraform/                # Infrastructure as code (AWS, Azure, GCP)
├── schemas/                  # Database schemas and configs
├── certs/                    # TLS certificates and JWT keys
├── data/                     # SQLite database files
├── logs/                     # Service logs
├── nginx/                    # Nginx reverse proxy configuration
├── docs/                     # Project documentation
├── docker-compose.yml        # Local development stack
├── requirements-master.txt   # Python testing dependencies
└── .github/workflows/        # GitHub Actions CI/CD
```
Each service is an independent Go module with local replace directives for shared libraries.
```shell
# Build all services (run from the repository root; subshells keep the cwd stable)
(cd api-gateway/src && go build -o ../bin/api-gateway .)
(cd auth-service/src && go build -o ../bin/auth-service .)
(cd inference-pool/src && go build -o ../bin/inference-pool .)
(cd monitoring/src && go build -o ../bin/monitoring .)

# Build shared libraries
(cd internal/database && go build .)
(cd helixflow/auth && go build .)
(cd helixflow/inference && go build .)
(cd helixflow/monitoring && go build .)

# Clean builds
find . -name "bin" -type d -exec rm -rf {} + 2>/dev/null
find . -name "*.out" -delete
```

Go Module Structure:
| Module | Path | Key Dependencies |
|---|---|---|
| `helixflow/api-gateway` | `api-gateway/src/go.mod` | gorilla/mux, gorilla/websocket, go-redis/v9, grpc |
| `helixflow/auth-service` | `auth-service/src/go.mod` | golang-jwt/jwt/v5, google/uuid, golang.org/x/crypto, grpc |
| `helixflow/inference-pool` | `inference-pool/src/go.mod` | grpc only |
| `helixflow/monitoring-service` | `monitoring/src/go.mod` | go-redis/v9, grpc |
| `helixflow/database` | `internal/database/go.mod` | lib/pq, go-sqlite3, go-redis/v9 |
| `helixflow/auth` | `helixflow/auth/go.mod` | grpc, protobuf |
| `helixflow/inference` | `helixflow/inference/go.mod` | minimal |
| `helixflow/monitoring` | `helixflow/monitoring/go.mod` | minimal |
All services use replace directives to reference local helixflow/ and internal/database modules.
```shell
# Install all test dependencies
pip install -r requirements-master.txt

# Build Python SDK (subshell keeps the cwd stable)
(cd sdks/python && python setup.py build)

# Create virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements-master.txt
```

```shell
# Generate Go code from .proto files
protoc --go_out=. --go_opt=paths=source_relative \
    --go-grpc_out=. --go-grpc_opt=paths=source_relative \
    proto/*.proto
```

Generated code is committed to `helixflow/{auth,inference,monitoring}/`.
| Category | Directory | Description |
|---|---|---|
| Unit | `tests/unit/` | API Gateway mock-based unit tests |
| Integration | `tests/integration/` | End-to-end service integration, auth, OpenAI compat, compliance, scalability, service mesh, multicloud |
| Contract | `tests/contract/` | API compliance, performance SLAs, security contracts, infrastructure validation |
| Security | `tests/security/` | Penetration testing (SQLi, XSS, CSRF, SSRF, brute force, headers) |
| Performance | `tests/performance/` | Async load testing with aiohttp |
```shell
# Run all integration tests via bash script (starts services automatically)
./scripts/test_integration.sh

# Run pytest test categories
python -m pytest tests/unit/ -v
python -m pytest tests/integration/ -v
python -m pytest tests/contract/ -v
python -m pytest tests/security/ -v
python -m pytest tests/performance/ -v

# Run specific test files
python -m pytest tests/integration/test_auth.py -v
python -m pytest tests/security/test_security_pentest.py -v

# Root-level Python test scripts
python3 test_auth_api.py
python3 test_revocation_now.py
python3 final_integration_test.py

# Quick smoke test (starts monitoring + gateway only)
./scripts/quick_test.sh

# Suppress TLS verification warnings in tests
export PYTHONWARNINGS="ignore:Unverified HTTPS request"
```

Important Notes:
- Integration tests target `localhost` (not Kubernetes service URLs)
- All HTTPS tests use `verify=False` due to self-signed certificates
- The `test_rate_limiting_integration` test may fail without Redis
- Services must be pre-built in `./{service}/bin/` directories for bash-based test scripts
tests/conftest.py defines custom markers (integration, contract, security, performance) and fixtures:
- `test_config`: Service URLs (`https://localhost:8443`, `http://localhost:8082`, etc.)
- `sample_chat_request`: GPT-3.5-turbo request template
- `sample_auth_credentials`: testuser / testpass123
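A minimal sketch of the data these fixtures return may help when writing new tests. The exact field names and payload shapes below are assumptions; `tests/conftest.py` is the authoritative source.

```python
# Illustrative sketch of the values the conftest.py fixtures provide.
# Field names are assumptions; consult tests/conftest.py for the real ones.

def test_config():
    """Service URLs used by integration tests (localhost, not k8s DNS names)."""
    return {
        "gateway_url": "https://localhost:8443",
        "auth_url": "http://localhost:8082",
        "verify_tls": False,  # self-signed certificates in local dev
    }

def sample_chat_request():
    """OpenAI-compatible chat completion request template."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
    }

def sample_auth_credentials():
    return {"username": "testuser", "password": "testpass123"}
```

In the real suite these are registered as `@pytest.fixture` functions, so tests receive them by naming them as parameters.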
```
Clients → API Gateway (HTTPS 8443)
              │
              ├──gRPC──→ Auth Service (gRPC 8081 / HTTP 8082)
              │               │
              │               └──→ Database (SQLite/PostgreSQL)
              │
              ├──gRPC──→ Inference Pool (gRPC 50051)
              │
              └──gRPC──→ Monitoring (gRPC / HTTP 8083)
```
| Service | Protocol | Port | Notes |
|---|---|---|---|
| API Gateway | HTTPS | 8443 | TLS 1.3; falls back to HTTP if no certs |
| API Gateway | HTTP | 8080 | HTTP fallback / Docker Compose mode |
| Auth Service | gRPC | 8081 | Internal auth validation |
| Auth Service | HTTP | 8082 | REST endpoints for login/register |
| Inference Pool | gRPC | 50051 | Model inference and streaming |
| Monitoring | HTTP/gRPC | 8083 | Metrics and health |
| PostgreSQL | SQL | 5432 | Production database |
| Redis | Redis | 6379 | Caching and rate limiting |
| Prometheus | HTTP | 9091 | Metrics collection |
| Grafana | HTTP | 3000 | Dashboards |
```shell
# Database
DATABASE_TYPE=sqlite|postgres
DATABASE_PATH=../data/helixflow.db
DATABASE_URL=postgres://user:pass@host:5432/db

# TLS Configuration
TLS_CERT="../certs/api-gateway.crt"
TLS_KEY="../certs/api-gateway-key.pem"

# Service URLs
INFERENCE_POOL_URL="localhost:50051"
AUTH_SERVICE_GRPC="localhost:8081"
AUTH_SERVICE_URL="localhost:8081"

# Ports
PORT="8443"      # API Gateway HTTPS
HTTP_PORT="8082" # Auth Service HTTP

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
```

```shell
# Start all services with proper configuration
./start_all_services.sh

# Start service groups
./start_phase1_services.sh  # Core services
./start_phase2_services.sh  # Advanced services

# Development environment startup (kills existing, starts all, saves PIDs)
./scripts/start_development.sh

# Stop all services
kill $(cat logs/service_pids.txt) 2>/dev/null
```

```shell
# Run each command below from the repository root in its own shell

# Auth service (gRPC 8081, HTTP 8082)
cd auth-service && HTTP_PORT=8082 PORT=8081 DATABASE_TYPE=sqlite DATABASE_PATH=../data/helixflow.db ./bin/auth-service

# Inference pool (gRPC 50051)
cd inference-pool && PORT=50051 ./bin/inference-pool

# API gateway (TLS 8443)
cd api-gateway && TLS_CERT="../certs/api-gateway.crt" TLS_KEY="../certs/api-gateway-key.pem" INFERENCE_POOL_URL=localhost:50051 AUTH_SERVICE_GRPC=localhost:8081 PORT=8443 ./bin/api-gateway

# Monitoring service (gRPC + HTTP 8083)
cd monitoring && PORT=8083 ./bin/monitoring
```

```shell
# Start database and infrastructure services only
docker-compose up -d postgres redis prometheus grafana nginx

# Generate TLS certificates
./scripts/generate_certificates.sh

# Setup databases
./scripts/setup_sqlite_database.sh
./scripts/setup_postgresql.sh
```

Note: docker-compose.yml references Dockerfiles in service directories (`api-gateway/Dockerfile`, etc.), but these Dockerfiles do not currently exist in the repository.
Each Go service uses package main with all source files in a single src/ directory. There is no strict subdirectory separation into handlers/, services/, repositories/.
API Gateway (api-gateway/src/):
- `main.go` - Primary HTTP gateway: gorilla/mux router, OpenAI-compatible REST API (`/v1/chat/completions`, `/v1/models`), health checks, auth via gRPC, rate limiting, TLS 1.3 with HTTP fallback, mock inference fallback
- `main_grpc.go` - Alternative enhanced gRPC gateway (`APIGatewayGRPC`): mTLS connections to all backend services, uses gRPC for inference calls
- `inference_handler.go` - `InferenceHandler`: wraps gRPC calls to the inference pool, converts OpenAI format to/from protobuf
- `websocket_handler.go` - WebSocket manager with connection pooling, ping/pong, broadcast, simulated streaming
- `rate_limiter_advanced.go` - Token bucket, sliding window, and fixed window rate limiting with Redis
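The token bucket variant can be sketched in a few lines. This in-memory Python version is illustrative only; the real `rate_limiter_advanced.go` implements the same idea in Go with its bucket state in Redis.

```python
# In-memory token bucket sketch (illustrative; the service keeps state in Redis).
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)
        self.refill_rate = refill_rate  # tokens added per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1.0)
results = [bucket.allow() for _ in range(4)]  # burst of 4 against capacity 3
```

The first three calls in the burst drain the bucket; the fourth is rejected until enough refill time has elapsed.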
Auth Service (auth-service/src/):
- `main.go` - Dual-server entry point: HTTP REST (port 8082) in a goroutine, gRPC (port 8081) in the main thread
- `auth_service.go` - `AuthServiceServer`: RSA key pair generation, bcrypt password hashing, JWT access/refresh tokens (RS256, UUID v4 JTI), token revocation blacklist with expiry cleanup, rate limiting, API key management
- `http_handler.go` - `AuthHTTPServer`: wraps gRPC methods into HTTP endpoints (`/login`, `/refresh`, `/revoke`, `/register`, `/health`)
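The revocation blacklist with expiry cleanup can be sketched as follows. This is an illustrative Python model of the idea in `auth_service.go`, not the service's actual Go code; method names are assumptions. A revoked JTI only needs to stay blacklisted until the token it belongs to would have expired anyway.

```python
# Sketch of a token revocation blacklist with expiry-based cleanup
# (illustrative; the real implementation lives in auth_service.go).
import time
import uuid

class RevocationList:
    def __init__(self):
        self._revoked = {}  # jti -> token expiry (unix seconds)

    def revoke(self, jti, expires_at):
        self._revoked[jti] = expires_at

    def is_revoked(self, jti, now=None):
        now = time.time() if now is None else now
        self.cleanup(now)
        return jti in self._revoked

    def cleanup(self, now):
        # Drop entries whose tokens have expired; they can no longer be replayed.
        for j, exp in list(self._revoked.items()):
            if exp <= now:
                del self._revoked[j]

jti = str(uuid.uuid4())  # JTI claims are UUID v4 strings
rl = RevocationList()
rl.revoke(jti, expires_at=time.time() + 3600)  # token valid for another hour
```

Once the token's own expiry passes, `cleanup` purges the entry, keeping the in-memory blacklist bounded by the number of live revoked tokens.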
Inference Pool (inference-pool/src/):
- `main.go` - gRPC server: `GPUManager` (4 mock GPUs), `ModelCache` (LRU), `InferenceEngine` (mock responses), job queue with 10 workers, streaming inference
- `inference_engine.go` - Mock inference with keyword-based contextual responses for `gpt-3.5-turbo`, `gpt-4`, `claude-v1`, `llama-2-70b`
- `gpu_optimizer.go` - GPU scheduling (best-fit, first-fit, round-robin, least-loaded), memory defragmentation, model eviction
- `quantization.go` - Model quantization simulator (4-bit, 8-bit, 16-bit)
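The best-fit placement strategy can be sketched briefly: choose the GPU whose free memory is the smallest that still fits the model, leaving the largest free blocks available for bigger models. The data shapes below are illustrative, not the service's actual Go structs.

```python
# Best-fit GPU selection sketch (illustrative data shapes, not the real structs).

def best_fit(gpus, required_mb):
    """gpus: list of (gpu_id, free_mb) tuples. Returns a gpu_id or None."""
    candidates = [(free, gid) for gid, free in gpus if free >= required_mb]
    if not candidates:
        return None  # caller would queue the job or evict a cached model
    return min(candidates)[1]  # the tightest fit wins

# Four GPUs with differing free memory, mirroring the 4 mock GPUs above.
gpus = [("gpu0", 16000), ("gpu1", 8000), ("gpu2", 24000), ("gpu3", 4000)]
```

First-fit would instead take the first GPU that fits, and least-loaded the one with the most free memory; best-fit trades a little scan cost for less fragmentation.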
Monitoring Service (monitoring/src/):
- `main.go` - Entry point: gRPC server + HTTP server (port 8083)
- `main_grpc.go` - `MonitoringServiceServer`: mock system metrics, GPU metrics, alert rule CRUD, alert acknowledgment, predictive scaling recommendations, gRPC health checking
- `predictive_scaling.go` - `PredictiveScaler`: linear regression on CPU history in Redis, recommends replica counts (min 3, max 20)
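The `PredictiveScaler` idea can be sketched as: fit a least-squares line through recent CPU samples, project it a few steps ahead, and scale the replica count so projected per-replica load stays near a target, clamped to [3, 20]. The target utilisation, horizon, and exact mapping below are assumptions; the real implementation reads its history from Redis.

```python
# Predictive scaling sketch: linear regression over CPU history, clamped
# replica recommendation. Constants (target_cpu, horizon) are assumptions.

def recommend_replicas(cpu_history, current_replicas, horizon=5,
                       target_cpu=70.0, min_replicas=3, max_replicas=20):
    n = len(cpu_history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(cpu_history) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = 0.0
    if denom:
        slope = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(xs, cpu_history)) / denom
    # Project CPU utilisation `horizon` steps past the last sample.
    projected = mean_y + slope * ((n - 1) - mean_x + horizon)
    # Scale replicas so projected per-replica CPU lands near the target.
    wanted = round(current_replicas * projected / target_cpu)
    return max(min_replicas, min(max_replicas, int(wanted)))
```

Rising load yields a scale-up recommendation, idle load bottoms out at 3 replicas, and a steep spike saturates at the 20-replica ceiling.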
internal/database/ - Database abstraction layer:
- `interfaces.go` - `DatabaseManager` interface
- `sqlite_manager.go` - SQLite implementation (development default)
- `postgres_manager.go` / `postgres_advanced.go` - PostgreSQL implementation
- `config.go` - Database configuration
helixflow/{auth,inference,monitoring}/ - Generated gRPC/protobuf Go code:
- `*.pb.go` - Message structs
- `*_grpc.pb.go` - gRPC client/server interfaces
sdks/python/helixflow/:
- `client.py` - `HelixFlow` HTTP client class with Bearer auth: `chat_completion()`, `chat_completion_stream()`, `list_models()`, `get_model()`
- `exceptions.py` - `HelixFlowError`, `AuthenticationError`, `RateLimitError`, `APIError`
- `__init__.py` - Package exports
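Under the hood, `chat_completion()` posts an OpenAI-compatible JSON body with a Bearer token to the gateway. The helper below is a self-contained sketch of that wire format, not the SDK's actual internal API; the function name and return shape are illustrative.

```python
# Sketch of the request chat_completion() ultimately sends (the wire format is
# the OpenAI-compatible one the gateway exposes; helper name is illustrative).
import json

def build_chat_completion_request(api_key, model, messages, stream=False):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": messages, "stream": stream}
    url = "https://localhost:8443/v1/chat/completions"
    return url, headers, json.dumps(body)

url, headers, body = build_chat_completion_request(
    "hf_test_key", "gpt-3.5-turbo",
    [{"role": "user", "content": "Hello"}],
)
```

Setting `stream=True` corresponds to the SDK's `chat_completion_stream()` path, where the gateway returns incremental chunks instead of a single response.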
.github/workflows/ci-cd.yml defines a pipeline triggered on push to main/develop and PRs to main:
- test - Go tests with coverage (uses Redis + PostgreSQL services), uploads to Codecov
- security-scan - Trivy vulnerability scanner, uploads SARIF to GitHub CodeQL
- lint - golangci-lint with 5m timeout
- build - Builds all 4 Go binaries + Docker images (needs test+scan+lint)
- deploy-dev - Triggered on `develop` branch (placeholder ArgoCD sync)
- deploy-staging - Triggered on `main` branch (placeholder)
- deploy-prod - Triggered on `main` push after staging (placeholder)
Note: CI uses Go 1.21 but the project requires Go 1.22.2. The pipeline is Go-centric and does not run Python pytest tests.
k8s/ contains 18 manifests for production infrastructure:
- `api-gateway.yaml` - Deployment + Service for API Gateway
- `argocd-config.yaml` - ArgoCD GitOps setup with Application CRDs
- `cert-manager.yaml` - Let's Encrypt certificate provisioning
- `consul-config.yaml` - Service mesh configuration
- `elasticsearch-config.yaml` / `logstash-config.yaml` - Logging stack
- `external-secrets.yaml` - AWS Secrets Manager integration
- `gpu-operator.yaml` - NVIDIA GPU Operator + AMD ROCm
- `grafana-dashboards.yaml` - Dashboard ConfigMaps
- `istio-config.yaml` / `istio-policies.yaml` - Istio service mesh, STRICT mTLS
- `kustomization.yaml` - Kustomize manifest
- `mtls-config.yaml` - TLS secrets, PeerAuthentication, AuthorizationPolicy
- `prometheus-config.yaml` / `prometheus-alert-rules.yaml` - Monitoring and alerting
- `sentry-config.yaml` - Sentry error tracking
Note: Only api-gateway.yaml exists as a direct service manifest. K8s manifests for auth-service, inference-pool, and monitoring deployments are referenced by ArgoCD but do not exist as separate files.
helm/helixflow/ contains Chart.yaml (v1.0.0) and values.yaml but no template files in templates/.
Multi-cloud infrastructure in terraform/{aws,azure,gcp}/:
- AWS: EKS (v1.28), RDS PostgreSQL 15.4, ElastiCache Redis, GPU node groups (g4dn)
- Azure: AKS (v1.28), PostgreSQL Flexible Server, Azure Cache for Redis, GPU nodes (NC6s_v3)
- GCP: GKE, Cloud SQL PostgreSQL 15, Memorystore Redis, GPU nodes (n1 + K80)
Note: Terraform configs reference local modules (../modules/vpc, etc.) that do not exist in the repository.
| Script | Purpose |
|---|---|
| `scripts/production-deployment.sh` | Full production deployment orchestration |
| `scripts/production-validation.sh` | Pre-production validation |
| `scripts/final-validation.sh` | Comprehensive validation suite |
| `scripts/quality-gates.sh` | CI-like quality gate checks |
| `scripts/setup_production_infrastructure.sh` | Complete production infrastructure setup |
| `scripts/generate_certificates.sh` | Full PKI generation (CA, per-service certs, JWT RSA keys) |
- TLS 1.3 for all external communications
- mTLS between internal services using certificates in `certs/`
- JWT Authentication with RS256 signing (2048-bit RSA keys generated at runtime)
- Token Revocation via in-memory blacklist with expiration cleanup
- Rate Limiting on auth validation (configurable windows)
- UUID v4 validation for JWT JTI claims
- Password Hashing with bcrypt
- API Keys with hashed storage and prefix identification
- Istio STRICT mTLS enforced in Kubernetes
- Self-signed certificates used for local development
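The UUID v4 check on JTI claims can be sketched as below. The helper name is illustrative (the real validation lives in the Go auth service); the key point is that parsing alone is not enough, because the version bits must also be checked.

```python
# Sketch of UUID v4 validation for JWT "jti" claims. uuid.UUID(s) accepts
# any UUID version, so the version is verified explicitly.
import uuid

def is_valid_jti(jti):
    try:
        return uuid.UUID(jti).version == 4
    except (ValueError, TypeError):
        return False
```

Rejecting non-v4 JTIs guards against tokens minted with predictable (e.g. time-based v1) identifiers.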
Certificates are stored in certs/:
- CA certificates (`ca.pem`, `helixflow-ca.pem`)
- Per-service server certificates (`api-gateway.crt`, `api-gateway-key.pem`, etc.)
- Per-service client certificates for mTLS
- PKCS12 files (password: `helixflow123`)
- JWT signing keys (`jwt-private.pem`, `jwt-public.pem`)
Generation: ./scripts/generate_certificates.sh or ./certs/generate-certificates.sh
```shell
export DATABASE_TYPE=sqlite
export DATABASE_PATH=../data/helixflow.db
./scripts/setup_sqlite_database.sh
```

```shell
export DATABASE_TYPE=postgres
export DATABASE_URL="postgres://user:pass@localhost:5432/helixflow"
./scripts/setup_postgresql.sh
```

| File | Description |
|---|---|
| `schemas/postgresql-helixflow.sql` | Original schema with 9 core tables |
| `schemas/postgresql-helixflow-complete.sql` | Expanded schema with RBAC, API keys, inference logs, usage tracking |
| `schemas/postgresql-helixflow-updated.sql` | Clean DDL with sample data |
| `schemas/migrate.sh` | Migration script with generate/run/rollback/status commands |
| `schemas/redis-cluster.conf` | Redis cluster configuration |
| `schemas/neo4j-config.properties` | Neo4j knowledge graph config |
| `schemas/qdrant-config.yaml` | Qdrant vector DB config |
- Missing Dockerfiles - `docker-compose.yml` and CI/CD reference Dockerfiles in service directories, but none exist.
- Missing Helm Templates - `helm/helixflow/` has `Chart.yaml` and `values.yaml` but no `templates/` directory.
- Missing K8s Service Manifests - Only `k8s/api-gateway.yaml` exists; manifests for auth-service, inference-pool, and monitoring are missing.
- Missing Terraform Modules - Referenced local modules (`../modules/vpc`, etc.) do not exist.
- CI/CD Go Version Mismatch - `.github/workflows/ci-cd.yml` uses Go 1.21, but `go.mod` files require 1.22.2.
- Mock Inference Engine - The inference pool generates keyword-based mock responses; it does not connect to real LLM backends.
- Database Abstraction Gap - The auth service may default to SQLite regardless of `DATABASE_TYPE` in some code paths.
- No Node.js/Rust - The project is purely Go + Python. The VS Code extension in `extensions/vscode/` may contain `package.json` but is not part of the core platform.
- Ensure the API gateway uses gRPC authentication (enhanced `main_grpc.go` version)
- Verify the auth service runs on port 8081 (gRPC) and 8082 (HTTP)
- Check that `AUTH_SERVICE_GRPC=localhost:8081` is set
- Confirm gateway logs show "Auth service gRPC connection established"

- Localhost: Uses insecure transport (`grpc.WithTransportCredentials(insecure.NewCredentials())`)
- Production: Uses TLS with proper certificates
- Services handle missing gRPC connections gracefully with mock fallbacks

- Self-signed certs: Use `verify=False` in Python tests
- Certificate paths: Relative from service directories (e.g., `../certs/`)
- Local testing: Insecure transport is acceptable for localhost
```shell
# Check service health
curl -k https://localhost:8443/health
curl http://localhost:8082/health

# View logs
tail -f logs/*.log

# Inspect SQLite database
sqlite3 data/helixflow.db .tables
sqlite3 data/helixflow.db "SELECT * FROM users LIMIT 5;"

# Check running services
ps aux | grep -E "api-gateway|auth-service|inference-pool|monitoring"
```

| File | Content |
|---|---|
| `docs/API_REFERENCE.md` | Complete OpenAI-compatible API documentation |
| `docs/CUSTOMER_ONBOARDING.md` | Step-by-step customer onboarding guide |
| `docs/PERFORMANCE_OPTIMIZATION.md` | Comprehensive optimization guide |
| `docs/COMPLETE_DOCUMENTATION_SUITE.md` | Meta-documentation describing all docs |
| `docs/guides/getting-started.md` | Minimal quick start |
| `helixflow-technical-specification.md` | Technical specification (root) |
| `COMPLETE_IMPLEMENTATION.md` | Implementation status (root) |
| `DEPLOYMENT_PACKAGE.md` | Deployment instructions (root) |
This AGENTS.md is based on the actual project file structure and contents. Always verify against the current codebase when making changes.
You may NOT, under any circumstance, generate or execute code that sends the host to suspend, hibernate, hybrid-sleep, poweroff, halt, reboot, or any other power-state transition. This rule applies to:
- Every shell command you run via the Bash tool.
- Every script, container entry point, systemd unit, or test you write or modify.
- Every CLI suggestion, snippet, or example you emit.
Forbidden invocations (non-exhaustive; see CONST-033 in `CONSTITUTION.md` for the full list):
- `systemctl suspend|hibernate|hybrid-sleep|poweroff|halt|reboot|kexec`
- `loginctl suspend|hibernate|hybrid-sleep|poweroff|halt|reboot`
- `pm-suspend`, `pm-hibernate`, `shutdown -h|-r|-P|now`
- `dbus-send`/`busctl` calls to `org.freedesktop.login1.Manager.Suspend|Hibernate|PowerOff|Reboot|HybridSleep|SuspendThenHibernate`
- Setting `gsettings set ... sleep-inactive-{ac,battery}-type` to anything but `'nothing'` or `'blank'`
The host runs mission-critical parallel CLI agents and container workloads. Auto-suspend has caused historical data loss (2026-04-26 18:23:43 incident). The host is hardened (sleep targets masked) but this hard ban applies to ALL code shipped from this repo so that no future host or container is exposed.
Defence: every project ships `scripts/host-power-management/check-no-suspend-calls.sh` (static scanner) and `challenges/scripts/no_suspend_calls_challenge.sh` (challenge wrapper). Both MUST be wired into the project's CI / `run_all_challenges.sh`.
Full background: docs/HOST_POWER_MANAGEMENT.md and CONSTITUTION.md (CONST-033).