- Added Overview section with problem statement and solution
- Added How It Works explanation of the agentic RAG flow
- Added Component Selection table explaining technology choices
- Fixed visual vertical bar alignment in architecture diagram
- Added note for users wanting to use their own dataset
Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>
File: examples/mcp_rag_demo/README.md
Lines changed: 44 additions & 2 deletions
@@ -17,11 +17,52 @@ limitations under the License.
 
 # MCP RAG Demo with NVIDIA NIMs
 
-This example demonstrates how to expose custom tools via the Model Context Protocol (MCP) using NVIDIA NeMo Agent toolkit with NVIDIA NIM integration. It showcases semantic search, filtering, and reranking of support tickets using NVIDIA NIMs for embedding, LLM reasoning, and reranking.
+## Overview
+
+### The Problem
+
+Enterprise AI applications need to connect Large Language Models (LLMs) to external data sources and tools. However, each integration typically requires custom code, leading to:
+
+- **Fragmented tool ecosystems** - Different frameworks require different integration patterns
+- **Vendor lock-in** - Tools built for one AI platform don't work with others
+- **Security complexity** - Each integration needs its own authentication handling
+- **Maintenance burden** - Updates to tools require changes across multiple integrations
+
+### The Solution
+
+This example demonstrates how to solve these challenges using the **Model Context Protocol (MCP)** - an open standard that enables AI applications to securely connect to external tools through a unified interface. By exposing tools via MCP, they become instantly accessible to any MCP-compatible client including Claude Desktop, Cursor IDE, and custom agents.
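
As a rough illustration of what "a unified interface" means on the wire: MCP clients invoke tools with JSON-RPC 2.0 messages, and a `tools/call` request carries the tool name and its arguments. The sketch below builds such a request using only the standard library. The tool name `search_tickets` and its arguments are hypothetical placeholders, not names taken from this demo.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request carries its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("search_tickets", {"query": "GPU driver crash", "top_k": 5})
payload = json.dumps(request)  # the serialized message a client would send
```

Because the message shape is the same for every tool, a client only needs a tool's name and input schema (discoverable through `tools/list`) to call it, which is what lets one server serve many different clients.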
+
+### How It Works
+
+The demo implements an **Agentic RAG (Retrieval-Augmented Generation)** system for searching support tickets:
+
+1. **User asks a question** via the chat UI (e.g., "Find critical GPU driver issues")
+2. **ReAct Agent reasons** about which tools to use and in what order
+3. **MCP Tools execute** - semantic search, filtering, and reranking operations
+4. **NVIDIA NIMs process** the requests using GPU-accelerated AI models
+5. **Agent synthesizes** the results into a coherent response
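
The search, filter, and rerank steps above can be sketched as a toy pipeline. This is a minimal sketch under loud assumptions: a bag-of-words vector stands in for the embedding NIM, a term-overlap score stands in for the reranker NIM, a fixed call order stands in for the ReAct agent's tool selection, and the ticket fields and function names are invented for illustration rather than taken from the demo's code.

```python
import math
from collections import Counter

# Synthetic tickets; field names are assumptions for this sketch.
TICKETS = [
    {"id": 1, "priority": "critical", "text": "GPU driver crash after kernel update"},
    {"id": 2, "priority": "low", "text": "GPU driver install guide request"},
    {"id": 3, "priority": "critical", "text": "Login page timeout for all users"},
    {"id": 4, "priority": "critical", "text": "Driver issues cause GPU memory errors"},
]


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(query: str, tickets: list, top_k: int = 3) -> list:
    """Rank tickets by similarity to the query and keep the top_k."""
    q = embed(query)
    return sorted(tickets, key=lambda t: cosine(q, embed(t["text"])), reverse=True)[:top_k]


def filter_by_priority(tickets: list, priority: str) -> list:
    return [t for t in tickets if t["priority"] == priority]


def rerank(query: str, tickets: list) -> list:
    """Stand-in for a reranker: order by the fraction of query terms present."""
    terms = set(query.lower().split())

    def score(ticket: dict) -> float:
        return len(terms & set(ticket["text"].lower().split())) / len(terms)

    return sorted(tickets, key=score, reverse=True)


def answer(query: str, priority: str) -> list:
    """Run search, then filter, then rerank; return ticket ids in final order."""
    candidates = semantic_search(query, TICKETS)
    candidates = filter_by_priority(candidates, priority)
    return [t["id"] for t in rerank(query, candidates)]


result = answer("GPU driver issues", "critical")  # -> [4, 1]
```

In the demo itself the agent decides which tools to call and in what order; the fixed sequence here only shows how the three operations narrow and reorder the candidate set.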
+
+### Component Selection
+
+| Component | Technology | Why This Choice |
+|-----------|------------|-----------------|
+| **Protocol** | MCP (Streamable HTTP) | Open standard with auth support, works with any MCP client |
+| **Vector Database** | Milvus | GPU-accelerated with cuVS, scales to billions of vectors |
+| **Embeddings** | `nvidia/nv-embedqa-e5-v5` | High-quality 1024-dim embeddings optimized for Q&A retrieval |
+| **LLM** | `meta/llama-3.1-70b-instruct` | Strong reasoning for agent orchestration and response generation |
+| **Reranker** | `nvidia/llama-3.2-nv-rerankqa-1b-v2` | Improves retrieval precision by reordering results by relevance |
+
+---
 
 ## Table of Contents
 
 - [MCP RAG Demo with NVIDIA NIMs](#mcp-rag-demo-with-nvidia-nims)
+  - [Overview](#overview)
+    - [The Problem](#the-problem)
+    - [The Solution](#the-solution)
+    - [How It Works](#how-it-works)
+    - [Component Selection](#component-selection)
   - [Table of Contents](#table-of-contents)
   - [Key Features](#key-features)
   - [Architecture](#architecture)
@@ -67,7 +108,7 @@ This demo uses a 3-terminal architecture:
 
 ```
 ┌─────────────┐ REST ┌─────────────────┐
-│ NAT UI │ ◄──────────────────► │ NAT UI Server │
+│ NAT UI │ ◄──────────────────► │ NAT UI Server │
 │ (Browser) │ │ (MCP Client) │
 └─────────────┘ └────────┬────────┘
 │
@@ -127,6 +168,7 @@ docker-compose up -d
 ```
 
 ### Load Sample Data
+**Note:** The sample dataset is synthetic. To use your own data, modify `load_support_tickets.py` with your Milvus connection and data schema, then update the tool queries in `register.py` to match your fields.
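
One way to prepare your own data before touching the loader is to map your column names onto the field names you intend to store, and to reject incomplete rows early. The following is a minimal sketch using only the standard library; every column and field name in it is hypothetical, and it does not reflect the actual schema in `load_support_tickets.py` or the queries in `register.py`. Embedding and insertion into Milvus are deliberately left out.

```python
import csv
import io

# Hypothetical mapping: your CSV column name -> stored field name.
# Neither side is taken from the demo's actual schema.
FIELD_MAP = {
    "Ticket ID": "ticket_id",
    "Summary": "title",
    "Details": "description",
    "Severity": "priority",
}


def to_records(csv_text: str, field_map: dict) -> list:
    """Rename columns per field_map; raise if a mapped column is empty or absent."""
    records = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        missing = [src for src in field_map if not (row.get(src) or "").strip()]
        if missing:
            raise ValueError(f"data row {row_no}: missing values for {missing}")
        records.append({dst: row[src].strip() for src, dst in field_map.items()})
    return records


sample = (
    "Ticket ID,Summary,Details,Severity\n"
    "T-1,Driver crash,GPU driver crashes on boot,critical\n"
)
records = to_records(sample, FIELD_MAP)
```

Keeping the mapping in one dictionary means the stored field names are defined in a single place, which makes it easier to keep the loader and the tool queries in agreement.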