kaito benchmark blog #5497
base: master
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,392 @@ | ||||||
| --- | ||||||
| title: "Benchmarking KAITO RAG: Measuring Performance Gains for Document and Code Q&A" | ||||||
| date: "2025-12-08" | ||||||
| description: "Comprehensive benchmark results comparing RAG vs baseline LLM performance across document question answering and code modification tasks with KAITO on AKS." | ||||||
| authors: ["bangqi-zhu"] | ||||||
| tags: | ||||||
| - ai | ||||||
| - kaito | ||||||
| - rag | ||||||
| - benchmarking | ||||||
| - performance | ||||||
| --- | ||||||
|
|
||||||
| Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for enhancing LLM accuracy by grounding responses in relevant context. But how much does RAG actually improve performance? We developed comprehensive benchmarking tools to quantify RAG effectiveness across two critical use cases: document question answering and code issue resolution. | ||||||
|
|
||||||
| In this post, we share our methodology, results, and insights from benchmarking [KAITO's RAG service](https://kaito-project.github.io/kaito/docs/rag/) on AKS. The findings reveal where RAG excels and where challenges remain. | ||||||
|
|
||||||
| <!-- truncate --> | ||||||
|
|
||||||
| ## Why Benchmark RAG? | ||||||
|
|
||||||
|
|
||||||
| When evaluating RAG systems, subjective impressions aren't enough. You need quantitative metrics to answer critical questions: | ||||||
|
|
||||||
| - **How much does RAG improve answer quality?** Traditional LLMs rely solely on pre-trained knowledge, which can be outdated or incomplete for domain-specific queries. | ||||||
| - **Is RAG cost-effective?** Token usage directly impacts operational costs at scale. | ||||||
| - **Where does RAG struggle?** Understanding failure modes guides system improvements. | ||||||
|
|
||||||
|
|
||||||
| To address these questions, we built two specialized benchmarking suites that test RAG in fundamentally different scenarios. | ||||||
|
|
||||||
| ## Two Distinct Testing Scenarios | ||||||
|
|
||||||
| RAG performance varies significantly based on the task. We designed benchmarks for two key use cases: | ||||||
|
|
||||||
| | Scenario | Focus | Validation Method | Key Metric | | ||||||
| |----------|-------|-------------------|------------| | ||||||
| | **Document Q&A** | Factual recall and comprehension | LLM-as-judge scoring | Answer accuracy (0-10) | | ||||||
| | **Code Modification** | Practical implementation changes | Unit test execution | Success rate (pass/fail) | | ||||||
|
|
||||||
| Let's dive into each benchmark and its results. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
|
|
||||||
| ## Document Q&A Benchmark | ||||||
|
|
||||||
| ### Methodology | ||||||
|
|
||||||
| The document benchmark evaluates how well RAG answers questions based on indexed content: | ||||||
|
|
||||||
| 1. **Index Documents**: Pre-index your documents (PDFs, reports, manuals) in the RAG system | ||||||
| 2. **Generate Test Questions**: Automatically create 20 questions from indexed content: | ||||||
| - 10 closed questions (factual, specific answers) | ||||||
| - 10 open questions (analysis, comprehension) | ||||||
| 3. **Compare Responses**: Both RAG and pure LLM answer the same questions | ||||||
| 4. **LLM Judge Evaluation**: A separate LLM scores each answer (0-10 scale) | ||||||
| 5. **Analyze Results**: Compare scores, token usage, and performance improvement | ||||||
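The final "Analyze Results" step can be sketched as a small aggregation over the judge's scores. This is an illustrative sketch under stated assumptions, not the code used in the benchmark: the `JudgedAnswer` record, its field names, and the relative-improvement formula are hypothetical, and no values here are benchmark results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JudgedAnswer:
    """One benchmark question with judge scores for both systems (hypothetical schema)."""
    question_type: str      # "closed" or "open"
    rag_score: float        # judge score for the RAG answer, 0-10
    baseline_score: float   # judge score for the baseline LLM answer, 0-10
    rag_tokens: int         # tokens consumed by the RAG request
    baseline_tokens: int    # tokens consumed by the baseline request


def summarize(results):
    """Average the judge scores and total the token usage for both systems."""
    results = list(results)
    rag_avg = mean(r.rag_score for r in results)
    baseline_avg = mean(r.baseline_score for r in results)
    # Relative improvement is undefined when the baseline average is zero.
    if baseline_avg == 0:
        improvement_pct = None
    else:
        improvement_pct = (rag_avg - baseline_avg) / baseline_avg * 100
    return {
        "rag_avg": rag_avg,
        "baseline_avg": baseline_avg,
        "improvement_pct": improvement_pct,
        "rag_tokens": sum(r.rag_tokens for r in results),
        "baseline_tokens": sum(r.baseline_tokens for r in results),
    }
```

Filtering `results` by `question_type` before calling `summarize` would give separate figures for closed and open questions.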
|
|
||||||
| **Architecture Flow:** | ||||||
|
|
||||||
|  | ||||||
|
||||||
|  | |
|  |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
Can you expand on the context of these inputs and how they're used for benchmarking against the chosen LLM?
| --- |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
Copilot
AI
Dec 10, 2025
The image reference is missing descriptive alt text. According to the blog post guidelines, all images MUST have descriptive alt text for accessibility and SEO. The alt text should describe what the image shows, not just repeat the caption.
|  | |
|  |
| RAG's 60% success validates the TOP-4 filtering approach! This proves that: | |
| RAG's 60% success validates the TOP-4 filtering approach, demonstrating that: |
| --- |
Why is this referencing OSS installation instructions and not the KAITO add-on?
Copilot
AI
Dec 10, 2025
The numbered list formatting is inconsistent. Item 2 "RAG Engine Deployed" should be numbered as "2." to continue the sequence. However, there's a list item "1. Index Your Content" starting at line 297, which appears to be a third item but is numbered as 1. Please renumber this as "3." to maintain proper list sequence.
| 1. **Index Your Content**: | |
| 3. **Index Your Content**: |
Please remove the emojis. This screams "AI generated" at me :)
| ✅ **RAG Excels At:** | |
| **RAG Excels At:** |
| 💡 **Optimization Insights:** | |
| **Optimization Insights:** |
These are not really descriptive link text.
Missing hero image after the truncate marker. According to the blog post guidelines, a hero image should be placed immediately after `<!-- truncate -->` using the pattern. The required structure is:

Consider adding a hero image that visually represents the blog post's main topic (benchmarking KAITO RAG).