AutoRCA-Core (ADAPT-RCA)

Agentic Root Cause Analysis engine for AI-powered autonomous reliability, SRE, and support.

AutoRCA-Core is a graph-based RCA engine that analyzes logs, metrics, traces, configs, and documentation to automatically identify root causes and recommend remediation steps. It's designed as a reference architecture for building autonomous operations and reliability agents.

What This Is

AutoRCA-Core provides:

Multi-signal ingestion: Logs, metrics, distributed traces, and config changes
Graph-based topology: Builds service dependency graphs and causal relationships
Rule-based reasoning: Deterministic heuristics for identifying root causes
LLM integration (optional): Enhance analysis with natural language insights
Autonomous-first design: Built to be called by AI agents, UIs, and automation workflows

Key differentiators:

Graph-based causal analysis rather than temporal event correlation alone
Works offline with rules-only mode (no LLM required)
Designed for integration into larger autonomous ops stacks

Who This Is For

SRE teams investigating production incidents
DevOps engineers correlating failures across services
Platform teams building autonomous reliability agents
Architects designing AI-powered troubleshooting workflows

AutoRCA-Core is part of a broader autonomous operations ecosystem including:

awesome-autonomous-ops – Curated list of AI ops tools
Secure-MCP-Gateway – Security-first MCP gateway for ops tools
Ops-Agent-Desktop – Visual mission control for autonomous ops agents
ADAPT-Agents – Agent orchestration layer (companion repo)

Architecture Overview

AutoRCA-Core follows a layered architecture for clarity and extensibility:

┌─────────────────────────────────────────────────────────────┐
│                      CLI / API Layer                        │
│            (autorca CLI, Python API, MCP server)            │
└─────────────────────────────────────────────────────────────┘
                             │
┌─────────────────────────────────────────────────────────────┐
│                     Reasoning Layer                         │
│   ┌────────────┐  ┌────────────┐  ┌──────────────────┐    │
│   │   Rules    │  │  LLM (opt) │  │  Reasoning Loop  │    │
│   │ Heuristics │  │ Interface  │  │  Orchestration   │    │
│   └────────────┘  └────────────┘  └──────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                             │
┌─────────────────────────────────────────────────────────────┐
│                    Graph Engine Layer                       │
│   ┌─────────────────────┐  ┌─────────────────────────┐    │
│   │   Graph Builder     │  │   Graph Queries         │    │
│   │ (topology + events) │  │ (causal chains, RCA)    │    │
│   └─────────────────────┘  └─────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                             │
┌─────────────────────────────────────────────────────────────┐
│                    Ingestion Layer                          │
│   ┌──────┐  ┌─────────┐  ┌────────┐  ┌─────────────┐     │
│   │ Logs │  │ Metrics │  │ Traces │  │ Configs     │     │
│   └──────┘  └─────────┘  └────────┘  └─────────────┘     │
└─────────────────────────────────────────────────────────────┘
                             │
┌─────────────────────────────────────────────────────────────┐
│                Data Sources (files, APIs, streams)          │
└─────────────────────────────────────────────────────────────┘

Key concepts:

Service Graph: Topology of services and dependencies inferred from traces
Incident Nodes: Anomalies detected (error spikes, latency, resource exhaustion)
Causal Chains: Dependency paths showing how failures propagate
Root Cause Candidates: Ranked list with confidence scores and evidence
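
The concepts above can be illustrated with a small, self-contained sketch. It is not AutoRCA-Core's implementation: the `DEPENDS_ON` graph, the `ANOMALOUS` set, the two helper functions, and the scoring formula are all hypothetical, chosen only to show how a causal chain and a ranked candidate list relate to a service graph.

```python
from __future__ import annotations

# Hypothetical service graph: each service maps to the services it calls.
DEPENDS_ON = {
    "frontend": ["api-gateway"],
    "api-gateway": ["user-service", "cart-service"],
    "user-service": ["postgres"],
    "cart-service": ["redis"],
    "postgres": [],
    "redis": [],
}

# Hypothetical incident nodes: services where an anomaly was detected.
ANOMALOUS = {"frontend", "api-gateway", "user-service", "postgres"}


def causal_chains(symptom: str) -> list[list[str]]:
    """Follow anomalous dependencies from the symptom service.

    A chain ends at a service that has no anomalous dependency of its own.
    """
    chains = []
    stack = [[symptom]]
    while stack:
        path = stack.pop()
        bad_deps = [
            dep for dep in DEPENDS_ON[path[-1]]
            if dep in ANOMALOUS and dep not in path
        ]
        if not bad_deps:
            chains.append(path)
        for dep in bad_deps:
            stack.append(path + [dep])
    return chains


def rank_candidates(chains: list[list[str]]) -> list[tuple[str, float]]:
    """Toy scoring: the share of anomalous services a chain explains."""
    scores: dict[str, float] = {}
    for chain in chains:
        root = chain[-1]
        score = len(chain) / len(ANOMALOUS)
        scores[root] = max(scores.get(root, 0.0), score)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for chain in causal_chains("frontend"):
        # Print from root cause to symptom.
        print(" → ".join(reversed(chain)))
    print(rank_candidates(causal_chains("frontend")))
```

In this toy graph, `cart-service` and `redis` are healthy, so the only chain runs from `frontend` down to `postgres`, which becomes the single candidate.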

Quickstart

Prerequisites

Python 3.10+
(Optional) OpenAI or Anthropic API key for LLM-enhanced summaries

Installation

# Clone the repository
git clone https://github.com/nik-kale/AutoRCA-Core.git
cd AutoRCA-Core

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install the package
pip install -e .

# Or install with LLM support
pip install -e ".[llm]"

Run the Quickstart Example

autorca quickstart

This runs RCA on synthetic data simulating a database connection pool exhaustion incident. You'll see:

Root cause identified: PostgreSQL connection saturation
Causal chain: postgres → user-service → api-gateway → frontend
Remediation: Scale connection pool, check for leaks

Run on Your Own Data

autorca run \
  --logs /path/to/logs \
  --metrics /path/to/metrics \
  --symptom "Checkout API returning 500 errors" \
  --output report.md

Supported formats:

Logs: JSON Lines, plain text (auto-parsed)
Metrics: CSV, JSON Lines
Traces: OpenTelemetry JSON, Jaeger JSON
Configs: JSON, YAML (deployment/config change events)
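
As a rough illustration of what handling both JSON Lines and plain-text logs can look like, the sketch below tries JSON first and falls back to raw text. The function name `parse_log_line` and the `message` key are assumptions for this example; the actual loaders and field names in AutoRCA-Core may differ.

```python
import json


def parse_log_line(line: str) -> dict:
    """Return a dict for one log line: parsed JSON object, or raw text."""
    line = line.strip()
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        record = None
    # Only JSON objects count as structured records; bare numbers or
    # strings that happen to be valid JSON are treated as plain text.
    if isinstance(record, dict):
        return record
    return {"message": line}


if __name__ == "__main__":
    print(parse_log_line('{"service": "postgres", "level": "ERROR"}'))
    print(parse_log_line("ERROR too many connections"))
```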

Usage as a Library

from datetime import datetime
from autorca_core import run_rca, DataSourcesConfig

# Define the incident time window
window = (
    datetime(2025, 11, 10, 10, 0, 0),
    datetime(2025, 11, 10, 10, 5, 0),
)

# Configure data sources
sources = DataSourcesConfig(
    logs_dir="./logs",
    metrics_dir="./metrics",
    traces_dir="./traces",
)

# Run RCA
result = run_rca(
    incident_window=window,
    primary_symptom="API 500 errors",
    data_sources=sources,
)

# Access results
print(f"Top root cause: {result.root_cause_candidates[0].service}")
print(f"Confidence: {result.root_cause_candidates[0].confidence:.0%}")
print(result.summary)
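
The incident window in the example above is hard-coded. When the trigger is an alert timestamp, the window can be derived from it instead. The helper below is a hypothetical convenience, not part of the `autorca_core` API; it only builds the `(start, end)` tuple that `run_rca` is shown accepting.

```python
from datetime import datetime, timedelta


def window_around(
    alert_time: datetime,
    before: timedelta = timedelta(minutes=5),
    after: timedelta = timedelta(minutes=5),
) -> tuple:
    """Build a (start, end) window surrounding an alert timestamp."""
    if before < timedelta(0) or after < timedelta(0):
        raise ValueError("before and after must be non-negative")
    return (alert_time - before, alert_time + after)


if __name__ == "__main__":
    alert = datetime(2025, 11, 10, 10, 3, 0)
    window = window_around(
        alert, before=timedelta(minutes=3), after=timedelta(minutes=2)
    )
    print(window)
```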

With LLM Enhancement (Anthropic Claude)

import os
from autorca_core import run_rca, DataSourcesConfig, AnthropicLLM

# Initialize Anthropic LLM (requires ANTHROPIC_API_KEY env var)
llm = AnthropicLLM(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
)

# Run RCA with LLM enhancement
# (`window` and `sources` are defined as in the previous example)
result = run_rca(
    incident_window=window,
    primary_symptom="API 500 errors",
    data_sources=sources,
    llm=llm,  # Add LLM for enhanced summaries
)

# Get comprehensive AI-generated analysis
print(result.summary)  # Structured RCA with executive summary, impact assessment, and remediation

# Check token usage and costs
stats = llm.get_usage_stats()
print(f"Tokens used: {stats['total_tokens']}, Cost: ${stats['total_cost_usd']:.4f}")

How This Fits an Autonomous Ops Stack

AutoRCA-Core is designed to be a composable building block in AI-powered operations workflows:

Integration Patterns

Agent-driven troubleshooting
- Autonomous agents (e.g., from ADAPT-Agents) call AutoRCA-Core to investigate incidents
- RCA results guide next actions: gather more data, escalate, or remediate

MCP exposure via Secure-MCP-Gateway
- Expose AutoRCA-Core as an MCP tool for Claude Desktop, Ops-Agent-Desktop, or other MCP clients
- Enable AI assistants to perform RCA with policy controls and human-in-the-loop approvals

Visual investigation in Ops-Agent-Desktop
- Ops-Agent-Desktop calls AutoRCA-Core and visualizes causal graphs in real-time
- Shows live incident timelines and reasoning steps

Runbook automation
- Use AutoRCA-Core to detect root causes, then trigger automated remediation via Ansible, Terraform, or K8s operators
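
The agent-driven pattern ("gather more data, escalate, or remediate") can be sketched as a small routing function. Everything here is illustrative: the `Candidate` class is a stand-in carrying only the `service` and `confidence` fields used in the library example, and the thresholds are arbitrary assumptions that a real deployment would tune.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Stand-in for one entry of result.root_cause_candidates."""
    service: str
    confidence: float


def next_action(candidates, remediate_at=0.8, escalate_at=0.5) -> str:
    """Pick the agent's next step from the top-ranked candidate."""
    if not candidates:
        return "gather_more_data"
    top = max(candidates, key=lambda c: c.confidence)
    if top.confidence >= remediate_at:
        return f"remediate:{top.service}"
    if top.confidence >= escalate_at:
        return f"escalate:{top.service}"
    return "gather_more_data"


if __name__ == "__main__":
    found = [Candidate("postgres", 0.86), Candidate("user-service", 0.41)]
    print(next_action(found))
```

Keeping the decision in one pure function makes it easy to put a human approval step in front of the remediation branch.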

Project Structure

AutoRCA-Core/
├── autorca_core/              # Main package
│   ├── ingestion/             # Data loaders (logs, metrics, traces, configs)
│   ├── model/                 # Data models (events, graphs)
│   ├── graph_engine/          # Graph construction and querying
│   ├── reasoning/             # RCA logic (rules, LLM, loop)
│   ├── outputs/               # Report generation (markdown, JSON, HTML)
│   └── cli/                   # CLI interface
├── examples/                  # Example data and scenarios
│   └── quickstart_local_logs/ # Quickstart synthetic data
├── tests/                     # Test suite
├── docs/                      # Architecture and usage docs
├── pyproject.toml             # Package configuration
├── README.md                  # This file
└── LICENSE                    # MIT license

Extending AutoRCA-Core

AutoRCA-Core is designed for extensibility:

Add Custom Parsers

Implement custom log/metric parsers by extending ingestion modules:

# autorca_core/ingestion/custom_parser.py
from autorca_core.model.events import LogEvent

def parse_custom_format(line: str) -> LogEvent:
    # Your parsing logic
    ...
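
To make the stub above concrete, here is one way such a parser could look for a made-up line format (`<ISO timestamp> [<service>] <LEVEL> <message>`). The `LogEvent` dataclass below is a local stand-in so the sketch runs on its own; the real class in `autorca_core.model.events` may have different fields.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEvent:
    """Stand-in for autorca_core.model.events.LogEvent (fields assumed)."""
    timestamp: datetime
    service: str
    level: str
    message: str


# Hypothetical format: "2025-11-10T10:02:13 [user-service] ERROR some text"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[(?P<service>[^\]]+)\] (?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_custom_format(line: str) -> LogEvent:
    """Parse one line of the hypothetical format into a LogEvent."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return LogEvent(
        timestamp=datetime.fromisoformat(match["ts"]),
        service=match["service"],
        level=match["level"],
        message=match["message"],
    )


if __name__ == "__main__":
    sample = "2025-11-10T10:02:13 [user-service] ERROR connection pool exhausted"
    print(parse_custom_format(sample))
```

Raising on unrecognized lines, rather than returning a partial event, lets the caller decide whether to skip or log malformed input.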

Add Custom Rules

Add domain-specific heuristics:

# autorca_core/reasoning/custom_rules.py
from autorca_core.reasoning.rules import RootCauseCandidate

def rule_custom_pattern(graph):
    # Detect custom incident patterns
    ...
    return [RootCauseCandidate(...)]
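
A filled-in version of such a rule might look like the sketch below. It is written against stand-ins: `RootCauseCandidate` is redefined locally with assumed fields, and the graph is modelled as a plain dict of service name to collected error messages, which is not necessarily how AutoRCA-Core represents it. The confidence formula is likewise an arbitrary example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RootCauseCandidate:
    """Stand-in for autorca_core.reasoning.rules.RootCauseCandidate."""
    service: str
    confidence: float
    evidence: list[str] = field(default_factory=list)


def rule_connection_pool_exhaustion(graph: dict) -> list[RootCauseCandidate]:
    """Flag services whose errors mention connection pool problems."""
    candidates = []
    for service, info in graph.items():
        hits = [
            msg for msg in info.get("errors", [])
            if "connection pool" in msg.lower()
        ]
        if hits:
            # More matching messages raise confidence, capped at 0.9.
            confidence = min(0.5 + 0.1 * len(hits), 0.9)
            candidates.append(RootCauseCandidate(service, confidence, hits))
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


if __name__ == "__main__":
    toy_graph = {
        "postgres": {"errors": ["Connection pool exhausted", "connection pool timeout"]},
        "frontend": {"errors": ["HTTP 500 from api-gateway"]},
    }
    for candidate in rule_connection_pool_exhaustion(toy_graph):
        print(candidate.service, candidate.evidence)
```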

Integrate Custom LLMs

Implement the LLMInterface protocol:

from autorca_core.reasoning.llm import LLMInterface

class MyCustomLLM:
    def summarize_rca(self, graph, candidates, symptom):
        # Call your LLM
        ...
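
The snippet below sketches how a protocol-style interface allows a custom backend without inheritance. It assumes `LLMInterface` behaves like a `typing.Protocol` with a single `summarize_rca` method returning a string; the protocol is redefined locally as a stand-in, and `TemplateLLM` is a hypothetical offline backend that calls no model at all.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMInterface(Protocol):
    """Stand-in for autorca_core.reasoning.llm.LLMInterface (shape assumed)."""

    def summarize_rca(self, graph, candidates, symptom) -> str:
        ...


class TemplateLLM:
    """Offline backend: formats a summary instead of calling a model."""

    def summarize_rca(self, graph, candidates, symptom) -> str:
        if not candidates:
            return f"No root cause found for: {symptom}"
        return f"Symptom '{symptom}' is most likely caused by {candidates[0]}"


if __name__ == "__main__":
    llm = TemplateLLM()
    # Structural check: TemplateLLM never subclasses LLMInterface.
    print(isinstance(llm, LLMInterface))
    print(llm.summarize_rca(graph=None, candidates=["postgres"], symptom="API 500 errors"))
```

A template backend like this can also serve as a test double, so pipelines that expect an LLM object can run without network access.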

Roadmap

Contributing

Contributions are welcome! This project aims to be a reference architecture for autonomous ops tools.

How to contribute:

Open issues for bugs or feature requests
Submit PRs for parsers, heuristics, or integrations
Share anonymized incident examples for testing
Suggest improvements to the reasoning engine

See CONTRIBUTING.md for guidelines.

Security and Safety

AutoRCA-Core performs read-only analysis by default. It does not execute commands or modify systems.

For production use:

Validate data sources: Ensure logs/metrics are from trusted sources
Sanitize sensitive data: Remove PII, secrets, and credentials before analysis
Use Secure-MCP-Gateway: When exposing AutoRCA-Core as a tool, use policy controls and human approvals
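
For the sanitization step, a minimal redaction pass over log lines could look like the following. The patterns are illustrative assumptions that cover only email addresses and simple `key=value` secrets; they are not a complete PII or secret scanner, and AutoRCA-Core is not stated to ship such a helper.

```python
import re

# (pattern, replacement) pairs applied in order to every line.
PATTERNS = [
    # Email addresses such as alice@example.com
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<email>"),
    # key=value or key: value pairs for common secret names
    (
        re.compile(r"(?i)\b(password|token|api[_-]?key|secret)\s*[=:]\s*\S+"),
        r"\1=<redacted>",
    ),
]


def redact(line: str) -> str:
    """Replace emails and simple secrets in one log line."""
    for pattern, replacement in PATTERNS:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(redact("login failed for alice@example.com password=hunter2"))
```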

License

MIT License - see LICENSE for details.

Acknowledgments

AutoRCA-Core draws inspiration from:

Academic research in fault localization and causal inference
Production RCA workflows at large-scale SaaS and cloud providers
The growing ecosystem of AI-powered operations tools

Built by Nik Kale as part of an open-source initiative to advance autonomous operations and reliability engineering.

Support

If you find AutoRCA-Core useful:

⭐ Star the repo to help others discover it
📢 Share it with your SRE, DevOps, and platform teams
🐛 Open issues with real-world scenarios (sanitized) to help improve the engine
🤝 Contribute parsers, rules, or integrations

For questions and discussions, open a GitHub issue.

AutoRCA-Core: Foundation for autonomous reliability agents. Graph-based RCA over logs, metrics, and traces.
