

Huzefa Husain edited this page Dec 7, 2025 · 2 revisions

Project 2 – T-RAG: Trace-Native RAG for Root Cause

1. Overview

Trace-Native RAG (T-RAG) is an AI-powered root cause analysis system for complex distributed environments. It leverages telemetry traces collected from services (via OpenTelemetry and CAAT’s eBPF runtime) and augments a large language model (LLM) with a vector-based memory of past spans. By grounding the LLM in actual trace data and similar historical contexts, T-RAG generates structured explanations of “what failed and why,” closing the loop between observability and cognitive reasoning.

2. Architecture

```mermaid
flowchart LR
    %% Classes
    classDef source fill:#dae8fc,stroke:#6c8ebf,stroke-width:1px,color:#000;
    classDef proc   fill:#fff2cc,stroke:#d6b656,stroke-width:1px,color:#000;
    classDef store  fill:#d5e8d4,stroke:#82b366,stroke-width:1px,color:#000;
    classDef llm    fill:#f8cecc,stroke:#b85450,stroke-width:1px,color:#000;
    classDef output fill:#e1d5e7,stroke:#9673a6,stroke-width:1px,color:#000;
    classDef caat   fill:#fde9d9,stroke:#d79b00,stroke-width:1px,color:#000;

    %% Nodes
    subgraph CAAT["CAAT Layer (Project 1)"]
        RL["RL Telemetry Optimizer\n(Budget Engine)"]
        TP["Telemetry Pipeline\n(eBPF / OpenTelemetry)"]
    end

    TL["Trace Loader\n(Span Parsing & Summaries)"]
    VM["Vector Memory Store\n(Embeddings & Similarity)"]
    LLM["LLM Reasoner\n(Trace-Native RAG)"]
    RC["Root Cause Report\n(JSON + Narrative RCA)"]

    %% Apply classes
    class TP,RL caat;
    class TL proc;
    class VM store;
    class LLM llm;
    class RC output;

    %% Main flow
    TP --> TL
    TL --> VM
    VM --> LLM
    LLM --> RC

    %% Control / feedback influence
    RL -. "adjusts sampling / policies" .-> TP
```

Architecture Description

  • Telemetry Pipeline (CAAT): Provides trace data from eBPF, OpenTelemetry, and service instrumentation.
  • Trace Loader: Converts raw spans into structured span summaries.
  • Vector Memory Store: Embeds spans using Sentence-Transformers; supports similarity search.
  • LLM Reasoner: Augmented prompt + context retrieved from vector memory → produces structured RCA.
  • Root Cause Report: JSON output with fields such as root_cause, service_chain, reasoning.

3. Implementation Overview

  • Code stored in projects/t_rag/src/t_rag/.
  • Includes components:
    • trace_loader.py – span parsing and summarization
    • vector_memory.py – embedding storage and similarity search
    • llm_reasoner.py – RAG prompt assembly and LLM calls
    • service.py – CLI entry point
    • config.py – configuration
  • Example trace: projects/t_rag/examples/sample_trace.json
  • Dependencies: requirements.txt
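
The LLM Reasoner assembles a prompt from the failing trace and the retrieved context, then expects a structured JSON reply carrying the report fields from section 2. A hedged sketch of that flow follows; the prompt wording, helper names, and canned reply are hypothetical and not the project's actual `llm_reasoner.py` code.

```python
import json


def build_rca_prompt(failing_summary: str, retrieved: list[str]) -> str:
    # Hypothetical prompt layout; the real reasoner may phrase this differently.
    context = "\n".join(f"- {s}" for s in retrieved)
    return (
        "You are a root-cause analyst for distributed traces.\n"
        f"Failing trace:\n{failing_summary}\n"
        f"Similar past spans:\n{context}\n"
        'Reply as JSON with keys "root_cause", "service_chain", "reasoning".'
    )


def parse_rca(llm_reply: str) -> dict:
    # Validate that the reply carries the report fields listed in section 2.
    report = json.loads(llm_reply)
    missing = {"root_cause", "service_chain", "reasoning"} - report.keys()
    if missing:
        raise ValueError(f"LLM reply missing fields: {missing}")
    return report


# Canned reply in place of a real OpenAI call:
reply = ('{"root_cause": "payment-service connection timeout", '
         '"service_chain": ["checkout-service", "payment-service"], '
         '"reasoning": "Checkout span shows a 500 caused by a downstream timeout."}')
report = parse_rca(reply)
```

Validating the reply before emitting the report keeps downstream consumers of the JSON output safe from malformed LLM responses.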

4. Quickstart

```shell
cd MindOps
pip install -r projects/t_rag/requirements.txt
export OPENAI_API_KEY="sk-xxxx"
python -m t_rag.service --trace projects/t_rag/examples/sample_trace.json
```

5. Integration with CAAT (Project 1)

T-RAG consumes CAAT-optimized trace data: CAAT decides what to collect (cost-aware sampling), and T-RAG decides what it means (LLM-based reasoning).
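
To make that division of labor concrete, here is a hypothetical CAAT-style sampling policy sketched in Python: error spans (the ones T-RAG reasons about) are always retained, while healthy spans are deterministically down-sampled to stay within budget. The policy shape and field names are illustrative, not CAAT's actual budget engine.

```python
def keep_span(span: dict, ok_sample_rate: float = 0.10) -> bool:
    # Always keep error spans: T-RAG needs them for root cause analysis.
    if span.get("status") == "ERROR":
        return True
    # Deterministically down-sample healthy spans by hashing the trace id,
    # so every span of a sampled trace is kept or dropped together.
    bucket = int(span["trace_id"], 16) % 100
    return bucket < ok_sample_rate * 100


spans = [
    {"trace_id": "0a", "status": "ERROR"},  # kept: error span
    {"trace_id": "ff", "status": "OK"},     # 0xff % 100 = 55 -> dropped
    {"trace_id": "03", "status": "OK"},     # 0x03 % 100 = 3  -> kept
]
kept = [s for s in spans if keep_span(s)]
```

Hashing on the trace id (rather than sampling spans independently) mirrors head-based trace sampling: the trace either survives intact for T-RAG or is dropped whole.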


6. Roadmap

  • persistent vector database (Weaviate, Pinecone)
  • multi-signal ingestion (logs + metrics)
  • multi-agent RCA
