BeacnAI

BeacnAI is an open-source Site Reliability Engineering (SRE) agent framework.

BeacnAI runs continuously on a server, ingests incoming alerts (e.g. from Prometheus or Alertmanager), and autonomously investigates incidents using the same tools your engineers use. Its distinguishing feature is the Learning Loop: after every incident, it reflects on what it found and updates its local memory files and runbooks, so that later investigations can draw on earlier findings.

Core Features

Two-Tier Capability Model:
- Skills (Tier 1): Markdown-based runbooks that map to specific alerts using a fuzzy-matching context engine.
- Tools (Tier 2): Deterministic Python scripts that execute commands (e.g. querying Prometheus, searching Splunk, fetching K8s pods, reading recent GitHub commits, and running shell commands).
Persistent Memory:
- Uses an async SQLite database (~/.beacnai/state.db) in WAL mode with FTS5 search to store all incidents, reflections, and full-text session histories.
- Injects global context (INFRA.md) and per-service context directly into the agent prompt.
Provider-Agnostic: Out-of-the-box support for OpenRouter (default), Anthropic, and local Ollama inference.
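
To illustrate the Persistent Memory idea, here is a minimal sketch of a SQLite file opened in WAL mode with an FTS5 full-text index. It uses the synchronous standard-library sqlite3 module for brevity, whereas BeacnAI is described as using an async database layer. The table name incidents_fts, its columns, and the helper names are assumptions made for this example, not BeacnAI's actual schema, and FTS5 must be compiled into your SQLite build.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_state_db(path: Path) -> sqlite3.Connection:
    """Open (or create) a SQLite file in WAL mode with an FTS5 index."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep working while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one full-text row per recorded incident.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS incidents_fts "
        "USING fts5(alert_name, summary)"
    )
    return conn


def record_incident(conn: sqlite3.Connection, alert_name: str, summary: str) -> None:
    """Store one incident so later investigations can search it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO incidents_fts (alert_name, summary) VALUES (?, ?)",
            (alert_name, summary),
        )


def search_incidents(conn: sqlite3.Connection, query: str) -> list[tuple[str, str]]:
    """Full-text search over past incidents, best match first."""
    rows = conn.execute(
        "SELECT alert_name, summary FROM incidents_fts "
        "WHERE incidents_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_state_db(Path(tmp) / "state.db")
        record_incident(conn, "HighLatency", "fraud service was the bottleneck")
        print(search_incidents(conn, "fraud"))
        conn.close()
```

The demo writes to a temporary directory rather than ~/.beacnai/state.db so it does not touch a real installation.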

Setup and Installation

1. Requirements

Python 3.11+
uv (recommended) or pip

2. Install

Clone the repository and install the dependencies:

uv sync

Or using standard pip:

pip install -e .

The project installs official provider SDKs for OpenRouter, Anthropic, OpenAI, and Ollama through the package dependencies.

3. Configuration

Create a .env file in the root directory. You can use the provided .env.example as a template.

cp .env.example .env

Ensure you set your desired LLM Provider:

BEACNAI_PROVIDER=openrouter
BEACNAI_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_API_KEY=your_key_here

You can also set a custom database location:

BEACNAI_DB_PATH=/path/to/state.db

Configure your tools by adding your Prometheus URLs, GitHub tokens, etc. Tools are auto-discovered: a tool becomes available only when the environment variables it needs are set.
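
The auto-discovery behaviour can be pictured with a short sketch like the one below. The variable names (PROMETHEUS_URL, GITHUB_TOKEN, SPLUNK_URL, SPLUNK_TOKEN) and the TOOL_REQUIREMENTS table are hypothetical and chosen only for illustration; check .env.example for the names BeacnAI actually reads.

```python
import os
from typing import Mapping

# Hypothetical mapping of tool name -> environment variables it requires.
TOOL_REQUIREMENTS: dict[str, tuple[str, ...]] = {
    "prometheus": ("PROMETHEUS_URL",),
    "github": ("GITHUB_TOKEN",),
    "splunk": ("SPLUNK_URL", "SPLUNK_TOKEN"),
    "shell": (),  # needs no credentials in this sketch
}


def discover_tools(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the tools whose required variables are all set and non-empty."""
    return [
        name
        for name, required in TOOL_REQUIREMENTS.items()
        if all(env.get(var) for var in required)
    ]


if __name__ == "__main__":
    # Only Prometheus is configured, so only it and the shell tool load.
    print(discover_tools({"PROMETHEUS_URL": "http://localhost:9090"}))
    # prints ['prometheus', 'shell']
```

Passing the environment as an argument keeps the function easy to test without mutating os.environ.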

Usage

Check System Status

Ensure your provider is connected and see which tools successfully loaded:

python -m beacnai.main status

Run an Investigation Manually

You can kick off a manual investigation via the CLI. The agent will execute a ReAct loop, call tools, output the Root Cause Analysis (RCA), and run its learning loop to save insights.

python -m beacnai.main investigate "We have a payment latency spike in production."
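
For readers unfamiliar with the pattern, the following is a minimal sketch of a ReAct-style loop: the model alternates between choosing a tool call and reading the resulting observation until it produces a final answer. It is not BeacnAI's implementation. The step format, the scripted_model stand-in for an LLM, and the query_latency tool are all assumptions made for this example.

```python
from typing import Callable

# A step is either ("call", tool_name, argument) or ("final", answer).
Model = Callable[[list[str]], tuple]
Tool = Callable[[str], str]


def react_loop(model: Model, tools: dict[str, Tool], question: str, max_steps: int = 5) -> str:
    """Alternate between model decisions and tool observations."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        if step[0] == "final":
            return step[1]
        _, tool_name, argument = step
        tool = tools.get(tool_name)
        observation = tool(argument) if tool else f"unknown tool: {tool_name}"
        # The growing transcript is what the model sees on the next turn.
        transcript.append(f"Action: {tool_name}({argument})")
        transcript.append(f"Observation: {observation}")
    return "No conclusion reached within the step budget."


def scripted_model(transcript: list[str]) -> tuple:
    """Stand-in for an LLM: query latency once, then conclude."""
    if not any(line.startswith("Observation:") for line in transcript):
        return ("call", "query_latency", "payment")
    return ("final", f"Root cause candidate based on: {transcript[-1]}")


if __name__ == "__main__":
    tools = {"query_latency": lambda service: f"{service} p99 latency is 2.3s"}
    print(react_loop(scripted_model, tools, "Why is payment latency high?"))
```

The max_steps budget is a common safeguard in such loops so that a model that never concludes cannot run indefinitely.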

Run the Webhook Server and Optional Cron Scheduler

Start BeacnAI as a persistent webhook ingestion service. Use --enable-cron to enable scheduled jobs, or set BEACNAI_CRON_ENABLED=true.

python -m beacnai.main serve --host 0.0.0.0 --port 3001 --enable-cron --cron-schedule "0 9 * * MON-FRI"

Enable Gateway Mode

If you want BeacnAI to accept generic alert payloads behind an API gateway, enable gateway mode. Then POST to /gateway, identifying the alert source either with the X-Alert-Source header or with a source field in the JSON body.

BEACNAI_GATEWAY_MODE=true python -m beacnai.main serve --enable-cron

Example gateway request:

curl -X POST http://localhost:3001/gateway \
  -H "Content-Type: application/json" \
  -H "X-Alert-Source: prometheus" \
  -d '{"alerts":[{"labels":{"alertname":"HighCpu","severity":"critical","job":"web"}}]}'
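
The two ways of identifying the alert source can be sketched as a small resolver function. This is an illustration only: the precedence (header first, then the JSON source field) and the lowercasing of the result are assumptions, since the README does not say how BeacnAI resolves a conflict between the two.

```python
import json
from typing import Mapping, Optional


def resolve_alert_source(headers: Mapping[str, str], body: bytes) -> Optional[str]:
    """Pick the alert source from the header, falling back to the JSON body."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    header_value = lowered.get("x-alert-source", "").strip()
    if header_value:
        return header_value.lower()  # assumption: the header wins over the body
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if isinstance(payload, dict) and isinstance(payload.get("source"), str):
        return payload["source"].strip().lower() or None
    return None


if __name__ == "__main__":
    body = b'{"source": "grafana", "alerts": []}'
    print(resolve_alert_source({"X-Alert-Source": "prometheus"}, body))  # prints prometheus
    print(resolve_alert_source({}, body))  # prints grafana
```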

Add Custom Runbooks (Skills)

You can teach the agent new skills by dropping a Markdown file into the skills/ directory. BeacnAI will use fuzzy matching to automatically select the right skill when an alert fires.

Example: skills/sre/latency/SKILL.md

---
name: payment-latency
type: skill
description: Investigates high latency on the payment service.
triggers:
  alert_names:
    - HighLatency
    - PaymentLatencySpike
  keywords:
    - payment latency
---
# Payment Latency Runbook

Check the `fraud-service` first as it is a common bottleneck.
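
One way to picture how triggers like these could select a skill is the sketch below, which scores alert names with difflib from the standard library and treats a keyword hit in the alert text as a full match. The scoring rule, the 0.8 threshold, and the second disk-pressure skill are assumptions for illustration; BeacnAI's context engine may match differently.

```python
from difflib import SequenceMatcher
from typing import Optional

# Triggers as they might look after parsing each SKILL.md front matter.
SKILLS: dict[str, dict[str, list[str]]] = {
    "payment-latency": {
        "alert_names": ["HighLatency", "PaymentLatencySpike"],
        "keywords": ["payment latency"],
    },
    "disk-pressure": {  # hypothetical second skill for contrast
        "alert_names": ["DiskAlmostFull"],
        "keywords": ["disk usage"],
    },
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_skill(alert_name: str, description: str = "", threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching skill name, or None if nothing is close enough."""
    best_name, best_score = None, 0.0
    text = description.lower()
    for name, triggers in SKILLS.items():
        score = max(
            (similarity(alert_name, trigger) for trigger in triggers["alert_names"]),
            default=0.0,
        )
        # A keyword found in the alert text counts as a full match.
        if any(keyword.lower() in text for keyword in triggers["keywords"]):
            score = 1.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    print(match_skill("PaymentLatencySpikes"))  # near-miss spelling still matches
    print(match_skill("Unknown", "We have a payment latency spike in production."))
    print(match_skill("KafkaLag"))  # no skill is close enough
```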

Architecture Summary

Ingestion: An aiohttp webhook listener handles Prometheus alerts.
Context Engine: Assembles INFRA.md, per-service .md, and matches the alert to a specific SKILL.md.
ReAct Loop: The agent uses an LLM (via the configured provider, e.g. OpenRouter or Ollama) to autonomously select tools and investigate the issue.
Learning Loop: A post-incident review is executed and recorded into SQLite, with insights appended to memory.
Output: The final RCA is routed to an #incidents Slack channel via SLACK_BOT_TOKEN. Invite the bot to the target channel before using it.
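
As a rough illustration of the Context Engine step, the sketch below concatenates a global INFRA.md, an optional per-service file, and an optional matched skill into one prompt section. The services/<name>.md layout, the section headings, and the function name build_context are assumptions for this example and may not match how BeacnAI stores or labels its memory files.

```python
import tempfile
from pathlib import Path
from typing import Optional


def build_context(memory_dir: Path, service: str, skill_path: Optional[Path] = None) -> str:
    """Join whichever context files exist into one prompt-ready string."""
    candidates = [
        ("Global infrastructure", memory_dir / "INFRA.md"),
        # Hypothetical layout: one Markdown file per service.
        (f"Service: {service}", memory_dir / "services" / f"{service}.md"),
    ]
    if skill_path is not None:
        candidates.append(("Matched skill", skill_path))

    sections = []
    for title, path in candidates:
        if path.is_file():  # missing files are skipped rather than treated as errors
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {title}\n{body}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "INFRA.md").write_text("Cluster: prod-eu\n", encoding="utf-8")
        (root / "services").mkdir()
        (root / "services" / "payment.md").write_text(
            "Depends on fraud-service.\n", encoding="utf-8"
        )
        print(build_context(root, "payment"))
```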

License

MIT
