Agentic Operations Intelligence Platform

Enterprise-style AI backend for operational ticket triage, incident investigation, evidence-grounded recommendations, and human-review routing.

This project demonstrates how an agentic AI system can decide whether an operational request requires SQL analysis, log search, document retrieval, rule-based validation, or human review. Unlike a basic RAG chatbot, the system performs a controlled decision workflow with tool routing, evidence verification, confidence scoring, trace logging, and benchmark evaluation.

Problem

Companies handle operational tickets, incident reports, service metrics, logs, policies, and runbooks every day.

A standard RAG chatbot can retrieve documents, but real operational decisions often require combining evidence from multiple sources.

Example ticket:

EU customers are reporting payment failures after checkout. Should this be escalated?

A useful AI system should check:

- service metrics
- operational logs
- SLA policies
- escalation rules
- confidence level
- need for human review

What the System Does

Given a ticket or operational query, the system:

1. Classifies the request type
2. Plans which tools are required
3. Routes the request to the selected tools
4. Collects evidence from SQL, logs, and documents
5. Applies escalation rules
6. Calculates confidence
7. Sends uncertain cases to human review
8. Stores a full trace of the decision
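The steps above can be sketched as a plain-Python pipeline in which each step reads and extends a shared state dict and appends a timing entry to the trace. This is not the project's LangGraph code: the step functions, the state keys, the single keyword check in `classify`, and the 0.7 human-review threshold are all hypothetical, and escalation rules are left out for brevity.

```python
import time
import uuid
from typing import Any, Callable

State = dict[str, Any]

# Hypothetical tool plans per task type (tool names taken from the example response).
PLANS: dict[str, list[str]] = {
    "escalation_decision": ["sql_tool", "log_search_tool", "rag_tool", "rule_validator"],
    "ambiguous": [],
}


def classify(state: State) -> State:
    # Placeholder classifier: a single keyword check.
    query = state["query"].lower()
    state["task_type"] = "escalation_decision" if "escalat" in query else "ambiguous"
    return state


def plan(state: State) -> State:
    state["tools_used"] = PLANS[state["task_type"]]
    return state


def collect_evidence(state: State) -> State:
    # Placeholder: a real implementation would call each planned tool here.
    state["evidence"] = {tool: f"result from {tool}" for tool in state["tools_used"]}
    return state


def score_confidence(state: State) -> State:
    # Placeholder heuristic: share of planned tools that returned evidence.
    planned = state["tools_used"]
    state["confidence"] = len(state["evidence"]) / len(planned) if planned else 0.0
    return state


def flag_for_review(state: State, threshold: float = 0.7) -> State:
    state["human_review_required"] = state["confidence"] < threshold
    return state


PIPELINE: list[Callable[[State], State]] = [classify, plan, collect_evidence, score_confidence, flag_for_review]


def run(query: str) -> State:
    state: State = {"query": query, "trace_id": f"trace_{uuid.uuid4().hex[:8]}", "trace": []}
    for step in PIPELINE:
        started = time.perf_counter()
        state = step(state)
        # Record each step and its duration so the decision can be audited later.
        state["trace"].append({"step": step.__name__, "ms": (time.perf_counter() - started) * 1000})
    return state
```

Running the example ticket through `run` yields `task_type == "escalation_decision"` with no human review, while "Something seems wrong." falls through to `ambiguous`, plans no tools, scores 0.0, and is flagged for review. In a LangGraph version, each step would presumably become a graph node operating on the same kind of shared state.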

Architecture

```text
FastAPI API
   |
   v
LangGraph Agent Workflow
   |
   +--> Task Classifier
   +--> Planner
   +--> Tool Router
          |
          +--> SQL Tool
          +--> Log Search Tool
          +--> RAG Tool
          +--> Rule Validator
          +--> Human Review Tool
   |
   +--> Evidence Verifier
   +--> Confidence Scorer
   +--> Response Generator
   +--> Trace Logger
```
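A minimal sketch of what the Log Search Tool in the diagram might do, assuming plain-text logs in a hypothetical `<timestamp> <LEVEL> <service> <message>` format. `search_logs` and `LogHit` are illustrative names, not the project's API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical log line format: "<timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$")


@dataclass
class LogHit:
    timestamp: str
    level: str
    service: str
    message: str


def search_logs(
    lines: list[str], service: str, level: str = "ERROR", keyword: Optional[str] = None
) -> list[LogHit]:
    """Return entries for one service at the given level, optionally filtered by a keyword."""
    hits: list[LogHit] = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the expected format
        if match["service"] != service or match["level"] != level:
            continue
        if keyword is not None and keyword.lower() not in match["message"].lower():
            continue  # case-insensitive keyword filter
        hits.append(LogHit(match["ts"], match["level"], match["service"], match["message"]))
    return hits
```

A query such as "Find timeout errors in payment logs." would map to something like `search_logs(lines, "payment-service", keyword="timeout")`. A production tool would more likely query an indexed log store than scan lines in memory.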

Example Output

Request:

```json
{
  "query": "EU customers are reporting payment failures after checkout. Should this be escalated?",
  "ticket_id": "TCK-1001"
}
```

Response:

```json
{
  "ticket_id": "TCK-1001",
  "task_type": "escalation_decision",
  "priority": "P1",
  "recommendation": "Escalate this incident as P1. Payment-service error rate is 18.7%, above the 10% P1 threshold. Error-level logs were found for the affected service.",
  "confidence": 0.95,
  "human_review_required": false,
  "tools_used": [
    "sql_tool",
    "log_search_tool",
    "rag_tool",
    "rule_validator"
  ],
  "matched_rules": [
    "payment_failure_rate_above_10_percent",
    "affected_customers_above_50",
    "enterprise_customers_above_10",
    "error_logs_present"
  ],
  "trace_id": "trace_..."
}
```
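The request and response bodies above map naturally onto typed models. The project lists Pydantic in its stack; this sketch uses standard-library dataclasses instead so it runs without extra dependencies. The class names and the confidence range check are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TicketRequest:
    query: str
    ticket_id: str


@dataclass
class TicketResponse:
    ticket_id: str
    task_type: str
    priority: str
    recommendation: str
    confidence: float
    human_review_required: bool
    tools_used: list[str] = field(default_factory=list)
    matched_rules: list[str] = field(default_factory=list)
    trace_id: str = ""

    def __post_init__(self) -> None:
        # Dataclasses do not validate field types; this check covers only the confidence range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
```

For example, `TicketResponse(**json.loads(body))` accepts the response shown above. Unlike a Pydantic model, a dataclass neither checks nor coerces field types, which is one reason to keep Pydantic for the real API schema.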

Tool Routing

| Query Type | Example | Selected Tools |
|---|---|---|
| Escalation decision | “Should EU payment failures be escalated?” | SQL, logs, RAG, rules |
| Policy lookup | “What does the SLA policy say?” | RAG |
| Metrics lookup | “What is the payment failure rate in EU?” | SQL |
| Log analysis | “Find timeout errors in payment logs.” | Log search |
| Ambiguous request | “Something seems wrong.” | Human review |
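The routing table can be expressed as a lookup from query type to tools. The keyword-based `classify` below is a hypothetical stand-in for the project's task classifier, and `human_review_tool` is an assumed identifier for the Human Review Tool; the other tool names come from the example response.

```python
import re

# Tools per query type, following the routing table above.
ROUTES: dict[str, list[str]] = {
    "escalation_decision": ["sql_tool", "log_search_tool", "rag_tool", "rule_validator"],
    "policy_lookup": ["rag_tool"],
    "metrics_lookup": ["sql_tool"],
    "log_analysis": ["log_search_tool"],
    "ambiguous": ["human_review_tool"],  # hypothetical tool name
}

# Hypothetical keyword cues, checked in order so escalation wins over narrower types.
KEYWORDS: list[tuple[str, tuple[str, ...]]] = [
    ("escalation_decision", ("escalate", "escalated", "escalation")),
    ("log_analysis", ("log", "logs", "timeout", "exception")),
    ("policy_lookup", ("policy", "sla", "runbook")),
    ("metrics_lookup", ("rate", "metric", "metrics", "latency")),
]


def classify(query: str) -> str:
    # Match whole words so that, for example, "catalog" does not trigger "log".
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    for task_type, cues in KEYWORDS:
        if tokens & set(cues):
            return task_type
    return "ambiguous"  # no cue matched: send to a human


def route(query: str) -> list[str]:
    return ROUTES[classify(query)]
```

Falling back to human review when nothing matches mirrors the last row of the table: the system declines to guess rather than picking a tool at random.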

Key Features

- FastAPI backend
- LangGraph-based workflow orchestration
- Dynamic tool routing
- SQL tool for structured service metrics
- Log search tool for operational errors
- RAG tool for policies and runbooks
- Rule-based escalation validator
- Evidence quality verification
- Confidence scoring
- Human-review fallback
- Trace logging for auditability
- Benchmark evaluation
- Pytest test suite
- Dockerized local deployment
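A sketch of the kind of query the SQL tool might run, using Python's built-in `sqlite3` module. The `service_metrics` table, its columns, and the sample rows are invented for illustration and do not reproduce the project's seed data.

```python
import sqlite3
from typing import Optional


def failure_rate(conn: sqlite3.Connection, service: str, region: str) -> Optional[float]:
    """Return failures / requests for one service and region, or None if there is no data."""
    row = conn.execute(
        "SELECT SUM(failures), SUM(requests) FROM service_metrics WHERE service = ? AND region = ?",
        (service, region),
    ).fetchone()
    failures, requests = row
    if not requests:
        return None  # no matching rows (SUM is NULL) or zero traffic
    return failures / requests


def demo_connection() -> sqlite3.Connection:
    # Hypothetical schema and made-up rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE service_metrics (service TEXT, region TEXT, requests INTEGER, failures INTEGER)"
    )
    conn.executemany(
        "INSERT INTO service_metrics VALUES (?, ?, ?, ?)",
        [
            ("payment-service", "EU", 1000, 150),
            ("payment-service", "EU", 1000, 50),
            ("payment-service", "US", 2000, 20),
        ],
    )
    return conn
```

With the sample rows, `failure_rate(demo_connection(), "payment-service", "EU")` returns 0.1. Passing `service` and `region` as `?` parameters, rather than formatting them into the SQL string, keeps user- or model-supplied text from being interpreted as SQL.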

Tech Stack

- Python
- FastAPI
- LangGraph
- SQLAlchemy
- SQLite
- Pydantic
- Sentence Transformers
- FAISS
- Pandas
- Pytest
- Docker

Run Locally

Create and activate a virtual environment:

python -m venv .venv

Windows PowerShell:

.\.venv\Scripts\Activate.ps1

Linux/macOS:

source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Seed the database:

python -m app.db.seed

Build the retrieval index:

python -m app.retrieval.build_index

Run the API:

uvicorn app.main:app --reload

Open Swagger UI:

http://127.0.0.1:8000/docs

Run with Docker

docker compose up --build

Then open:

http://127.0.0.1:8000/docs

Evaluation

Run:

python -m evaluation.run_evaluation

The evaluation measures:

- task classification accuracy
- tool routing accuracy
- human-review accuracy
- priority accuracy
- decision accuracy
- average latency
- p95 latency
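A sketch of how some of the listed metrics could be computed from per-case benchmark results. `CaseResult` and `summarize` are hypothetical names; the exact-set match for tool routing and the nearest-rank p95 are assumptions, and `evaluation.run_evaluation` may define these metrics differently.

```python
import math
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class CaseResult:
    # One benchmark case: expected vs. predicted labels plus measured latency.
    expected_task: str
    predicted_task: str
    expected_tools: frozenset[str]
    predicted_tools: frozenset[str]
    latency_ms: float


def accuracy(pairs: Iterable[tuple[object, object]]) -> float:
    pairs = list(pairs)
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(results: list[CaseResult]) -> dict[str, float]:
    latencies = [r.latency_ms for r in results]
    return {
        "task_classification_accuracy": accuracy((r.expected_task, r.predicted_task) for r in results),
        # A routing decision counts as correct only if the tool set matches exactly.
        "tool_routing_accuracy": accuracy((r.expected_tools, r.predicted_tools) for r in results),
        "average_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": percentile(latencies, 95),
    }
```

Comparing tool sets as `frozenset`s ignores the order in which tools were called; if order matters for a given workflow, comparing tuples would be the stricter choice.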

Generated files:

```text
evaluation/results.csv
evaluation/metrics_summary.json
evaluation/error_analysis.md
```

Testing

Run:

pytest

Test coverage includes:

- classifier
- planner
- SQL tool
- RAG tool
- rule validator
- API endpoints
- human-review behavior
- trace generation

Expected result:

28 passed
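As an illustration of what a human-review test might look like, here is a self-contained pytest-style module. `needs_human_review` and its 0.7 threshold are hypothetical; the project's real logic and test names are not shown in this README.

```python
# Hypothetical function under test; the project's real human-review logic may differ.
def needs_human_review(task_type: str, confidence: float, threshold: float = 0.7) -> bool:
    """Route ambiguous requests and low-confidence answers to a human."""
    return task_type == "ambiguous" or confidence < threshold


# pytest collects functions named test_* and reports plain assert failures.
def test_ambiguous_request_goes_to_human_review():
    assert needs_human_review("ambiguous", confidence=0.99)


def test_low_confidence_goes_to_human_review():
    assert needs_human_review("escalation_decision", confidence=0.4)


def test_confident_escalation_is_automated():
    assert not needs_human_review("escalation_decision", confidence=0.95)
```

Saved as, for example, `tests/test_human_review_example.py`, these would be picked up by the `pytest` command above alongside the existing suite.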

Screenshots


---

## Design Decisions

Key design choices:

- SQL is used for structured metrics.
- Log search is used for operational events.
- RAG is used for policies and runbooks.
- Escalation logic is handled by deterministic rules.
- Low-confidence or ambiguous cases go to human review.
- Every request is stored as a trace for auditability.
- Evaluation measures workflow behavior, not only final text output.
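A sketch of deterministic escalation rules. The rule names are taken from `matched_rules` in the example response and the thresholds are read off those names; the `Evidence` fields and the mapping from matched rules to a priority are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Evidence:
    # Hypothetical evidence fields gathered by the SQL and log tools.
    payment_failure_rate: float  # fraction, e.g. 0.187 for 18.7%
    affected_customers: int
    enterprise_customers: int
    error_log_count: int


# Rule names match the example response; thresholds follow from the names.
RULES: dict[str, Callable[[Evidence], bool]] = {
    "payment_failure_rate_above_10_percent": lambda e: e.payment_failure_rate > 0.10,
    "affected_customers_above_50": lambda e: e.affected_customers > 50,
    "enterprise_customers_above_10": lambda e: e.enterprise_customers > 10,
    "error_logs_present": lambda e: e.error_log_count > 0,
}


def validate(evidence: Evidence) -> tuple[list[str], Optional[str]]:
    """Return the matched rule names and a priority (None means no escalation)."""
    matched = [name for name, rule in RULES.items() if rule(evidence)]
    # Assumed priority mapping, for illustration only.
    if "payment_failure_rate_above_10_percent" in matched:
        return matched, "P1"
    return matched, ("P2" if matched else None)
```

Because each rule is a pure function of the evidence, the same input always yields the same matched rules, which is what lets the trace explain a decision after the fact.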

Detailed docs:

```text
docs/architecture.md
docs/design_decisions.md
docs/agent_workflow.md
docs/evaluation_methodology.md
docs/failure_cases.md
docs/production_considerations.md
```

Future Improvements

Potential production extensions:

- PostgreSQL instead of SQLite
- Qdrant or pgvector instead of FAISS
- OpenTelemetry tracing
- authentication and RBAC
- GitHub Actions CI
- larger benchmark dataset
- hybrid search and reranking
- human-review dashboard
- integration with Jira, ServiceNow, Datadog, or Grafana
