BeacnAI is an open-source Site Reliability Engineering (SRE) agent framework.
It is designed to run continuously on a server, ingest incoming alerts (e.g. from Prometheus or Alertmanager), and autonomously investigate incidents using the exact same tools your human engineers use. What makes BeacnAI unique is its Learning Loop: after every incident, it reflects on what it found and updates its local memory files and runbooks, so it gets smarter and faster every time.
- Two-Tier Capability Model:
  - Skills (Tier 1): Markdown-based runbooks that map to specific alerts using a fuzzy-matching context engine.
  - Tools (Tier 2): Deterministic Python scripts that execute commands (e.g. querying Prometheus, searching Splunk, fetching K8s pods, reading recent GitHub commits, and running shell commands).
- Persistent Memory:
  - Uses an async SQLite database (`~/.beacnai/state.db`) in WAL mode with FTS5 search to store all incidents, reflections, and full-text session histories.
  - Injects global context (`INFRA.md`) and per-service context directly into the agent prompt.
- Provider-Agnostic: Out-of-the-box support for OpenRouter (default), Anthropic, and local Ollama inference.
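
To make the Persistent Memory bullet concrete, here is a minimal sketch of a SQLite store in WAL mode with an FTS5 full-text index. It uses the synchronous standard-library `sqlite3` module rather than an async driver, and the `incidents` table, its columns, and the function names are hypothetical, not BeacnAI's actual schema. It also assumes your Python build ships SQLite with FTS5 enabled.

```python
import os
import sqlite3
import tempfile


def open_state_db(path: str) -> sqlite3.Connection:
    """Open (or create) a file-backed SQLite database in WAL mode."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep working while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one full-text-indexed row per incident.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS incidents "
        "USING fts5(alert_name, summary)"
    )
    return conn


def record_incident(conn: sqlite3.Connection, alert_name: str, summary: str) -> None:
    """Store one incident; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO incidents (alert_name, summary) VALUES (?, ?)",
            (alert_name, summary),
        )


def search_incidents(conn: sqlite3.Connection, query: str) -> list:
    """Full-text search over past incidents, best match first."""
    rows = conn.execute(
        "SELECT alert_name, summary FROM incidents "
        "WHERE incidents MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_state_db(os.path.join(tmp, "state.db"))
        record_incident(db, "PaymentLatencySpike", "fraud-service connection pool exhausted")
        record_incident(db, "HighCpu", "web node pinned by a runaway job")
        print(search_incidents(db, "fraud"))
        db.close()
```

A full-text index of this kind is what would let an agent look up earlier incidents by symptom words instead of exact alert names.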
- Python 3.11+
- `uv` (recommended) or `pip`
Clone the repository and install the dependencies:
uv sync
Or using standard pip:
pip install -e .
The project installs official provider SDKs for OpenRouter, Anthropic, OpenAI, and Ollama through the package dependencies.
Create a .env file in the root directory. You can use the provided .env.example as a template.
cp .env.example .env
Ensure you set your desired LLM provider:
BEACNAI_PROVIDER=openrouter
BEACNAI_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_API_KEY=your_key_here
You can also set a custom database location:
BEACNAI_DB_PATH=/path/to/state.db
Configure your tools by adding your Prometheus URLs, GitHub tokens, etc. BeacnAI auto-discovers which tools are available based on the variables you provide.
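
Environment-based tool discovery like that described above can be sketched as follows. This is an illustration under stated assumptions, not BeacnAI's actual loader: the `REQUIRED_ENV` registry, the `discover_tools` function, and the variable names `PROMETHEUS_URL` and `GITHUB_TOKEN` are hypothetical (only `SLACK_BOT_TOKEN` appears in this README).

```python
import os
from typing import Mapping

# Hypothetical registry: tool name -> environment variables it needs.
REQUIRED_ENV = {
    "prometheus": ("PROMETHEUS_URL",),
    "github": ("GITHUB_TOKEN",),
    "slack": ("SLACK_BOT_TOKEN",),
}


def discover_tools(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the tools whose required variables are all set and non-empty."""
    return sorted(
        name
        for name, required in REQUIRED_ENV.items()
        if all(env.get(var, "").strip() for var in required)
    )


if __name__ == "__main__":
    # Only Prometheus is fully configured here; the empty token is ignored.
    print(discover_tools({"PROMETHEUS_URL": "http://localhost:9090", "GITHUB_TOKEN": ""}))
```

Treating an empty value as "not configured" avoids loading a tool that would fail on its first call.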
Ensure your provider is connected and see which tools successfully loaded:
python -m beacnai.main status
You can kick off a manual investigation via the CLI. The agent will execute a ReAct loop, call tools, output the Root Cause Analysis, and run its learning loop to save insights.
python -m beacnai.main investigate "We have a payment latency spike in production."
Start BeacnAI as a persistent webhook ingestion service. Use `--enable-cron` to enable scheduled jobs, or set `BEACNAI_CRON_ENABLED=true`.
python -m beacnai.main serve --host 0.0.0.0 --port 3001 --enable-cron --cron-schedule "0 9 * * MON-FRI"
If you want BeacnAI to accept generic alert payloads behind an API gateway, enable gateway mode. Then POST to `/gateway` with `X-Alert-Source` set to the alert source, or include a `source` field in the JSON body.
BEACNAI_GATEWAY_MODE=true python -m beacnai.main serve --enable-cron
Example gateway request:
curl -X POST http://localhost:3001/gateway \
  -H "Content-Type: application/json" \
  -H "X-Alert-Source: prometheus" \
  -d '{"alerts":[{"labels":{"alertname":"HighCpu","severity":"critical","job":"web"}}]}'
You can teach the agent new skills by dropping a Markdown file into the `skills/` directory. BeacnAI will use fuzzy matching to automatically select the right skill when an alert fires.
Example: skills/sre/latency/SKILL.md
---
name: payment-latency
type: skill
description: Investigates high latency on the payment service.
triggers:
  alert_names:
    - HighLatency
    - PaymentLatencySpike
  keywords:
    - payment latency
---
# Payment Latency Runbook
Check the `fraud-service` first as it is a common bottleneck.

- Ingestion: An `aiohttp` webhook listener handles Prometheus alerts.
- Context Engine: Assembles `INFRA.md` and per-service `.md` files, and matches the alert to a specific `SKILL.md`.
- ReAct Loop: The agent uses an LLM (via OpenRouter/Ollama) to autonomously select tools and investigate the issue.
- Learning Loop: A post-incident review is executed and recorded into SQLite, with insights appended to memory.
- Output: The final RCA is routed to an `#incidents` Slack channel via `SLACK_BOT_TOKEN`. Invite the bot to the target channel before using it.
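
The README does not say how the Context Engine scores an alert against a skill's `triggers`. As one plausible sketch, the example below uses the standard-library `difflib.SequenceMatcher` for fuzzy alert-name matching plus a plain substring check for keywords. The `Skill` class, the function names, and the `0.8` threshold are assumptions made for illustration, not BeacnAI's implementation.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Skill:
    """Hypothetical in-memory view of a SKILL.md frontmatter block."""
    name: str
    alert_names: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def _similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_skill(skill: Skill, alert_name: str, text: str = "") -> float:
    """Best fuzzy alert-name score, or 1.0 if any keyword occurs in the text."""
    name_score = max(
        (_similarity(alert_name, candidate) for candidate in skill.alert_names),
        default=0.0,
    )
    keyword_hit = any(k.lower() in text.lower() for k in skill.keywords)
    return max(name_score, 1.0 if keyword_hit else 0.0)


def match_skill(
    skills: list[Skill], alert_name: str, text: str = "", threshold: float = 0.8
) -> Optional[Skill]:
    """Return the highest-scoring skill, or None if nothing clears the threshold."""
    best = max(skills, key=lambda s: score_skill(s, alert_name, text), default=None)
    if best is None or score_skill(best, alert_name, text) < threshold:
        return None
    return best


if __name__ == "__main__":
    skills = [
        Skill("payment-latency", ["HighLatency", "PaymentLatencySpike"], ["payment latency"]),
        Skill("high-cpu", ["HighCpu"], ["cpu usage"]),
    ]
    print(match_skill(skills, "PaymentLatencySpikes"))
```

Returning `None` below the threshold lets the caller fall back to a generic investigation rather than following an unrelated runbook.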
MIT