# Project 2: T-RAG
Trace-Native RAG (T-RAG) is an AI-powered root cause analysis system for complex distributed environments. It leverages telemetry traces collected from services (via OpenTelemetry and CAAT’s eBPF runtime) and augments a large language model (LLM) with a vector-based memory of past spans. By grounding the LLM in actual trace data and similar historical contexts, T-RAG generates structured explanations of “what failed and why,” closing the loop between observability and cognitive reasoning.
```mermaid
flowchart LR
    %% Classes
    classDef source fill:#dae8fc,stroke:#6c8ebf,stroke-width:1px,color:#000;
    classDef proc fill:#fff2cc,stroke:#d6b656,stroke-width:1px,color:#000;
    classDef store fill:#d5e8d4,stroke:#82b366,stroke-width:1px,color:#000;
    classDef llm fill:#f8cecc,stroke:#b85450,stroke-width:1px,color:#000;
    classDef output fill:#e1d5e7,stroke:#9673a6,stroke-width:1px,color:#000;
    classDef caat fill:#fde9d9,stroke:#d79b00,stroke-width:1px,color:#000;

    %% Nodes
    subgraph CAAT["CAAT Layer (Project 1)"]
        RL["RL Telemetry Optimizer\n(Budget Engine)"]
        TP["Telemetry Pipeline\n(eBPF / OpenTelemetry)"]
    end
    TL["Trace Loader\n(Span Parsing & Summaries)"]
    VM["Vector Memory Store\n(Embeddings & Similarity)"]
    LLM["LLM Reasoner\n(Trace-Native RAG)"]
    RC["Root Cause Report\n(JSON + Narrative RCA)"]

    %% Apply classes
    class TP,RL caat;
    class TL proc;
    class VM store;
    class LLM llm;
    class RC output;

    %% Main flow
    TP --> TL
    TL --> VM
    VM --> LLM
    LLM --> RC

    %% Control / feedback influence
    RL -. "adjusts sampling / policies" .-> TP
```
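The main flow above (Telemetry Pipeline → Trace Loader → Vector Memory → LLM Reasoner → Root Cause Report) can be sketched as a minimal pipeline. All function names and field values here are illustrative placeholders, not the actual `t_rag` API:

```python
# Minimal end-to-end sketch of the T-RAG flow shown in the diagram.
# Function names and span fields are illustrative, not the real t_rag API.

def load_spans(trace: dict) -> list[str]:
    """Trace Loader: turn raw spans into one-line summaries."""
    return [
        f"{s['service']} {s['operation']} {s['duration_ms']}ms status={s['status']}"
        for s in trace["spans"]
    ]

def retrieve_similar(summaries: list[str], memory: list[str], k: int = 3) -> list[str]:
    """Vector Memory: stand-in for embedding-based similarity search."""
    return memory[:k]  # placeholder; the real store ranks by embedding similarity

def reason(summaries: list[str], context: list[str]) -> dict:
    """LLM Reasoner: stand-in for the augmented LLM call."""
    return {
        "root_cause": "<filled in by the LLM>",
        "service_chain": [s.split()[0] for s in summaries],
        "reasoning": "<narrative RCA>",
    }

trace = {"spans": [
    {"service": "checkout", "operation": "POST /pay", "duration_ms": 1890, "status": "ERROR"},
    {"service": "payments", "operation": "charge", "duration_ms": 1850, "status": "TIMEOUT"},
]}
summaries = load_spans(trace)
report = reason(summaries, retrieve_similar(summaries, memory=[]))
print(report["service_chain"])  # -> ['checkout', 'payments']
```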
- Telemetry Pipeline (CAAT): Provides trace data from eBPF, OpenTelemetry, and service instrumentation.
- Trace Loader: Converts raw spans into structured span summaries.
- Vector Memory Store: Embeds spans using Sentence-Transformers; supports similarity search.
- LLM Reasoner: Augmented prompt + context retrieved from vector memory → produces structured RCA.
- Root Cause Report: JSON output with fields such as `root_cause`, `service_chain`, `reasoning`.
- Code stored in `projects/t_rag/src/t_rag/`. Includes components: `trace_loader.py`, `vector_memory.py`, `llm_reasoner.py`, `service.py`; configurable via `config.py`.
- Example trace: `projects/t_rag/examples/sample_trace.json`
- Dependencies: `requirements.txt`
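A minimal sketch of the similarity search behind the vector memory store. The real `vector_memory.py` embeds spans with Sentence-Transformers; the toy hashing embedder below is only a stand-in so the example is self-contained:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic bag-of-words embedder (stand-in for Sentence-Transformers)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorMemory:
    """In-memory store: add span summaries, retrieve the most similar ones."""

    def __init__(self):
        self.texts, self.vecs = [], []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vecs.append(embed(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scores = [float(q @ v) for v in self.vecs]  # cosine: vectors are unit-norm
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.texts[i] for i in top]

mem = VectorMemory()
mem.add("payments charge TIMEOUT after 1800ms")
mem.add("frontend render OK 120ms")
mem.add("payments charge ERROR card processor unreachable")
print(mem.search("payments charge TIMEOUT", k=2)[0])
# -> payments charge TIMEOUT after 1800ms
```

Swapping `embed` for a Sentence-Transformers model changes only the vector quality, not the store's interface.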
```shell
cd MindOps
pip install -r projects/t_rag/requirements.txt
export OPENAI_API_KEY="sk-xxxx"
python -m t_rag.service --trace projects/t_rag/examples/sample_trace.json
```

T-RAG consumes CAAT-optimized trace data.
CAAT decides what to collect (cost-aware).
T-RAG decides what it means (LLM reasoning).
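How the augmented prompt and the structured report might fit together. This is a sketch, not the actual `llm_reasoner.py` interface; `call_llm` is a placeholder for the real model call, and all incident values are invented:

```python
import json

def build_prompt(span_summaries: list[str], similar_past: list[str]) -> str:
    """Ground the LLM in the current trace plus retrieved historical context."""
    return (
        "You are a root-cause analyst for distributed systems.\n"
        "Current trace spans:\n" + "\n".join(f"- {s}" for s in span_summaries) +
        "\n\nSimilar past incidents:\n" + "\n".join(f"- {s}" for s in similar_past) +
        "\n\nReturn JSON with keys: root_cause, service_chain, reasoning."
    )

def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM call; returns a canned illustrative answer."""
    return json.dumps({
        "root_cause": "payments timed out on the card processor",
        "service_chain": ["checkout", "payments"],
        "reasoning": "Checkout failed shortly after payments exceeded its timeout; "
                     "retrieved incidents show the same pattern.",
    })

prompt = build_prompt(
    ["checkout POST /pay 1890ms status=ERROR", "payments charge 1850ms status=TIMEOUT"],
    ["payments charge TIMEOUT after 1800ms"],
)
report = json.loads(call_llm(prompt))
print(sorted(report))  # -> ['reasoning', 'root_cause', 'service_chain']
```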
Future directions:

- persistent vector database (Weaviate, Pinecone)
- multi-signal ingestion (logs + metrics)
- multi-agent RCA
MindOps — Closed‑loop observability for faster RCA, lower cost, and safer telemetry.