🌲 CodeTree

Vectorless RAG for Code Repositories

Navigate your codebase like a human expert, using LLM reasoning instead of vector similarity.

Python 3.10+ | License: MIT


🤔 The Problem

Traditional RAG (Retrieval-Augmented Generation) for code has fundamental limitations:

| Problem | Description |
| --- | --- |
| ❌ Vector similarity ≠ code relevance | "login" and "logout" have similar embeddings, but they're completely different! |
| ❌ Chunking destroys structure | Splitting a class across chunks loses critical context |
| ❌ Can't follow call chains | "Who calls this function?" is nearly impossible with vectors |
| ❌ No architecture understanding | Vectors don't know that auth/ is for authentication |
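
To make the chunking problem concrete, here is a small sketch that compares naive fixed-size chunking with chunking on syntax boundaries. It uses only Python's standard `ast` module; the `Session` class and both helper functions are hypothetical and are not part of CodeTree.

```python
import ast

SOURCE = '''\
class Session:
    def login(self, user):
        self.user = user

    def logout(self):
        self.user = None
'''

def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring syntax."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def ast_chunks(text: str) -> list[str]:
    """Structure-aware chunking: one chunk per top-level class or function."""
    tree = ast.parse(text)
    return [
        ast.get_source_segment(text, node)
        for node in tree.body
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]

naive = fixed_size_chunks(SOURCE, 60)
structured = ast_chunks(SOURCE)

print(len(naive))       # more than one fragment; the class is cut mid-body
print(len(structured))  # 1 (the whole Session class stays together)
```

With fixed-size cuts, `login` and `logout` can land in different fragments with no record that they belong to the same class. The syntax-aware version keeps them in one unit.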

💡 The Solution

CodeTree takes a different approach. It builds a hierarchical tree index of your codebase and uses LLM reasoning to navigate it, much as a human developer would:

  • ✅ AST-based parsing preserves code structure
  • ✅ LLM reasons about which files are relevant
  • ✅ Understands module relationships and dependencies
  • ✅ Can trace function calls across files
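
As a rough illustration of what AST-based parsing can recover, the sketch below uses Python's standard `ast` module to list the functions, classes, and imports in a source string. The `outline` helper and the sample module are assumptions made for this example; CodeTree's own parser may work differently.

```python
import ast

def outline(source: str) -> dict[str, list[str]]:
    """Collect function names, class names, and imported modules from source."""
    result = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            result["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            result["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # `from . import x` has no module name, so record it as "."
            result["imports"].append(node.module or ".")
    return result

example = '''\
import hashlib
from db.users import get_user_by_email

class Authenticator:
    def authenticate(self, email, password):
        return get_user_by_email(email) is not None
'''
print(outline(example))
```

An outline like this is small enough to show an LLM in place of the full file, which is the premise behind navigating by structure.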

✨ Features

| Feature | Description |
| --- | --- |
| 🚫 No Vector Database | Uses code structure + LLM reasoning instead of embedding similarity |
| 🌳 AST-Based Indexing | Parses actual code structure: functions, classes, imports, dependencies |
| 🔗 Cross-File Intelligence | Tracks imports, function calls, and dependencies across your entire codebase |
| 🧠 Reasoning-Based Retrieval | LLM navigates the code tree like a human expert |
| 💬 Natural Language Queries | Ask questions in plain English |
| 🔒 Privacy-First | Works with local models (Ollama). Your code never leaves your machine |

📊 Comparison: Vector RAG vs CodeTree

| Feature | Vector RAG | CodeTree |
| --- | --- | --- |
| Understands code structure | ❌ | ✅ |
| Cross-file references | ❌ | ✅ |
| "Who calls this function?" | ❌ | ✅ |
| No chunking headaches | ❌ | ✅ |
| Explainable retrieval | ❌ | ✅ |
| Works offline | ⚠️ | ✅ |
| No vector DB needed | ❌ | ✅ |
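
The "Who calls this function?" row is the kind of question a syntax tree answers directly. The sketch below is a minimal, name-based caller search built on Python's standard `ast` module. The `find_callers` helper and the two in-memory files are hypothetical, and matching by name alone will confuse unrelated functions that share a name.

```python
import ast

def find_callers(sources: dict[str, str], target: str) -> list[tuple[str, str, int]]:
    """Return (file, enclosing function, line) for each call to `target`."""
    hits = []
    for path, source in sources.items():
        for func in ast.walk(ast.parse(source)):
            if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            for node in ast.walk(func):
                if not isinstance(node, ast.Call):
                    continue
                callee = node.func
                # Handles both `validate_user(...)` and `obj.validate_user(...)`.
                if isinstance(callee, ast.Name):
                    name = callee.id
                else:
                    name = getattr(callee, "attr", None)
                if name == target:
                    hits.append((path, func.name, node.lineno))
    return hits

repo = {
    "src/api/routes.py": "def login(req):\n    return validate_user(req)\n",
    "src/auth/service.py": "def validate_user(req):\n    return bool(req)\n",
}
print(find_callers(repo, "validate_user"))
# [('src/api/routes.py', 'login', 2)]
```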

🚀 Quick Start

Installation

pip install codetree-rag

Or from source:

git clone https://github.com/toller892/Oh-Code-Rag.git
cd Oh-Code-Rag
pip install -e .

Configuration

Set your LLM API key:

export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
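
Before indexing, it can help to confirm that one of these variables is actually set. The following pre-flight check is a sketch and is not part of CodeTree. Only the two variable names come from the snippet above; the `KEY_VARS` mapping and the `detect_providers` function are made up for illustration.

```python
import os

# Variable names come from the configuration snippet; the mapping is illustrative.
KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}

def detect_providers(env=os.environ) -> list[str]:
    """Return the providers whose API key variable is set to a non-empty value."""
    return [name for name, var in KEY_VARS.items() if env.get(var)]

print(detect_providers({"ANTHROPIC_API_KEY": "sk-ant-..."}))  # ['anthropic']
```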

Basic Usage

from codetree import CodeTree

# Index your repository
tree = CodeTree("/path/to/your/repo")
tree.build_index()

# Ask questions about the code
answer = tree.query("How does the authentication system work?")
print(answer)

CLI Usage

# Index a repository
codetree index /path/to/repo

# Query the codebase  
codetree query "Where is database connection handled?"

# Interactive chat mode
codetree chat

# Show code structure
codetree tree

# Find symbol references
codetree find "UserService"

🎯 Use Cases

👨‍💻 For Developers

Onboarding to New Codebases:

  • "What's the overall architecture of this project?"
  • "How do requests flow from API to database?"
  • "Where should I add a new payment method?"

Code Review & Understanding:

  • "What does the processOrder function do?"
  • "Who calls the validateUser method?"
  • "What happens if authentication fails?"

🏒 Industry Applications

Industry Use Case Example Query
FinTech Audit & Compliance "How is user data encrypted?"
Healthcare Security Review "Where is patient data accessed?"
E-commerce Feature Development "How does the cart system work?"
DevOps Incident Response "What services depend on Redis?"
Education Code Learning "Explain the MVC pattern in this app"

🔬 Research & Analysis

  • Legacy Code Migration: Understand old systems before rewriting
  • Security Auditing: Find all database queries, API endpoints
  • Documentation Generation: Auto-generate architecture docs
  • Dependency Analysis: Map out service dependencies

🔬 Real-World Examples

Example 1: Understanding Project Architecture

Query:

from codetree import CodeTree

tree = CodeTree("./my-project")
tree.build_index()

answer = tree.query("What's the overall architecture? What are the core modules?")
print(answer)

Output:

## Project Architecture

This project follows a modular architecture with these core components:

1. **CodeTree (core.py)** - Main entry point
   - `build_index()`: Builds the code tree
   - `query()`: Natural language queries
   - `find()`: Symbol search

2. **CodeIndexer (indexer.py)** - Index construction
   - Recursively parses directories
   - Builds TreeNode hierarchy
   
3. **CodeParser (parser.py)** - AST parsing
   - Supports Python, JS, Go, Rust, Java
   - Extracts functions, classes, imports

4. **CodeRetriever (retriever.py)** - LLM-based retrieval
   - Two-stage: retrieve → answer
   - Uses reasoning prompts

## Data Flow
User Query → CodeTree → Retriever → LLM Reasoning → File Selection → Answer

Example 2: Finding Function Usage

Query:

refs = tree.find("authenticate")
print(refs)

Output:

📍 Found 5 references to 'authenticate':

  [function]  src/auth/login.py:45 → authenticate
  [function]  src/auth/oauth.py:78 → authenticate_oauth
  [import]    src/api/middleware.py → from auth import authenticate
  [import]    src/api/routes.py → from auth.login import authenticate
  [class]     src/auth/base.py:12 → Authenticator

Example 3: Tracing Code Flow

Query:

answer = tree.query("How does a user login request flow through the system?")
print(answer)

Output:

## Login Request Flow

1. **Entry Point**: `src/api/routes.py`
   - @app.post("/login") routes to auth_service.authenticate()

2. **Authentication**: `src/auth/service.py`
   - Validates credentials against database
   - Generates JWT token on success
   
3. **Database**: `src/db/users.py`
   - get_user_by_email() fetches user record
   - verify_password() checks hash

4. **Response**: Returns JWT token or 401 error

🏗️ How It Works

Architecture Overview

┌──────────────────────────────────────────────────────────────┐
│                        CodeTree                              │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│   CodeParser ──────▶ CodeIndexer ──────▶ CodeIndex (JSON)    │
│   (AST Parse)        (Build Tree)        (Store)             │
│                                              │               │
│                                              ▼               │
│   Answer ◀────────── Retrieve ◀────────── CodeRetriever      │
│   (Markdown)         (Read Files)         (LLM Reasoning)    │
│                                                              │
└──────────────────────────────────────────────────────────────┘
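
The top row of the diagram (parse, build tree, store as JSON) can be sketched in a few lines. This is an assumption-laden illustration and does not show CodeTree's actual index format. The node fields (`type`, `name`, `symbols`, `children`) and the `index_file` / `index_dir` helpers are invented here, and only Python files are handled.

```python
import ast
import json
import os
import tempfile

def index_file(path: str) -> dict:
    """Parse one Python file into a leaf node listing its top-level symbols."""
    with open(path, encoding="utf-8") as fh:
        tree = ast.parse(fh.read())
    symbols = [
        n.name for n in tree.body
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    return {"type": "file", "name": os.path.basename(path), "symbols": symbols}

def index_dir(path: str) -> dict:
    """Recursively build a directory node whose children mirror the file tree."""
    children = []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            children.append(index_dir(full))
        elif entry.endswith(".py"):
            children.append(index_file(full))
    return {"type": "dir", "name": os.path.basename(path), "children": children}

# Demo on a throwaway repository so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "auth"))
    with open(os.path.join(root, "auth", "login.py"), "w", encoding="utf-8") as fh:
        fh.write("def authenticate(user):\n    return True\n")
    index = index_dir(root)
    index_json = json.dumps(index, indent=2)  # stands in for the "CodeIndex (JSON)" box

print(index_json)
```

Because the index is plain JSON, it can be inspected by hand and handed to an LLM as text without an embedding step.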

Two-Stage Retrieval Process

Stage 1: Reasoning-Based Navigation

User: "How does authentication work?"
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│ LLM analyzes code tree structure:                           │
│                                                             │
│ "Authentication relates to auth module...                   │
│  Let me check src/auth/ directory...                        │
│  login.py and oauth.py look relevant...                     │
│  Also need to check who imports these..."                   │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
Selected Files: [src/auth/login.py, src/auth/oauth.py, ...]

Stage 2: Answer Generation

Read selected files → Generate comprehensive answer with code snippets
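
The two stages can be sketched as a single function that takes the LLM as a plain callable. Everything here is hypothetical: the prompt wording, the JSON reply format, and the `two_stage_query` name are assumptions made for illustration, and a stub stands in for the model so the example runs without an API key.

```python
import json
from typing import Callable

def two_stage_query(
    question: str,
    outline: str,
    read_file: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    # Stage 1: show the model the tree outline only and ask which files to open.
    selection_prompt = (
        "Code tree:\n" + outline + "\n\n"
        "Question: " + question + "\n"
        "Reply with a JSON list of the file paths worth reading."
    )
    selected = json.loads(llm(selection_prompt))

    # Stage 2: read just those files and ask for the final answer.
    context = "\n\n".join(f"# {path}\n{read_file(path)}" for path in selected)
    return llm(f"{context}\n\nQuestion: {question}\nAnswer using the code above.")

# Stub LLM and in-memory files so the sketch is self-contained.
files = {"src/auth/login.py": "def authenticate(user): ..."}

def fake_llm(prompt: str) -> str:
    if prompt.startswith("Code tree:"):
        return json.dumps(["src/auth/login.py"])
    return "authenticate() in src/auth/login.py handles login."

answer = two_stage_query(
    "How does authentication work?",
    "src/\n  auth/\n    login.py: authenticate",
    files.__getitem__,
    fake_llm,
)
print(answer)
```

Keeping the file list as an explicit intermediate result is what makes retrieval explainable: the selected paths can be logged or shown to the user before any answer is generated.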

🗣️ Supported Languages

| Language | Extensions | Status |
| --- | --- | --- |
| Python | .py, .pyi | ✅ Full |
| JavaScript | .js, .jsx, .mjs | ✅ Full |
| TypeScript | .ts, .tsx | ✅ Full |
| Go | .go | ✅ Full |
| Rust | .rs | ✅ Full |
| Java | .java | ✅ Full |
| C/C++ | .c, .cpp, .h | 🚧 Coming Soon |
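
A table like this usually turns into an extension lookup when files are routed to a parser. The sketch below mirrors the rows marked as fully supported. The `LANGUAGE_BY_EXTENSION` mapping and the `detect_language` function are illustrative names and do not describe CodeTree's internals.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table; C/C++ is omitted because it is listed as coming soon.
LANGUAGE_BY_EXTENSION = {
    ".py": "python", ".pyi": "python",
    ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".java": "java",
}

def detect_language(path: str) -> Optional[str]:
    """Return the language for a path, or None if the extension is unsupported."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())

print(detect_language("src/auth/login.py"))  # python
print(detect_language("web/App.tsx"))        # typescript
print(detect_language("native/core.cpp"))    # None
```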

βš™οΈ Configuration

Create .codetree.yaml in your project:

# LLM Configuration
llm:
  provider: openai          # openai, anthropic, ollama
  model: gpt-4o
  temperature: 0.0
  max_tokens: 4096

# For local/private deployment
# llm:
#   provider: ollama
#   model: llama3
#   base_url: http://localhost:11434

# Index Settings  
index:
  languages:
    - python
    - javascript
    - typescript
    - go
  exclude:
    - node_modules
    - __pycache__
    - .git
    - venv
    - dist
  max_file_size: 100000    # Skip files larger than 100KB
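
To show how the `exclude` and `max_file_size` settings might be applied during indexing, here is a sketch of a file walker that honors both. The `iter_indexable_files` function is hypothetical; only the setting names and sample values come from the configuration above.

```python
import os
import tempfile

def iter_indexable_files(root: str, exclude: set[str], max_file_size: int):
    """Yield paths under `root`, skipping excluded directory names and oversized files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) <= max_file_size:
                yield os.path.relpath(path, root)

# Demo on a throwaway directory with one excluded folder and one oversized file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "node_modules"))
    for rel, size in [("app.py", 10), ("big.py", 200), ("node_modules/lib.js", 10)]:
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x" * size)
    kept = sorted(iter_indexable_files(root, {"node_modules", ".git"}, max_file_size=100))

print(kept)  # ['app.py']
```

Pruning excluded directories before descending matters on real repositories, where `node_modules` alone can hold far more files than the project itself.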

📈 Performance

| Metric | Small Repo (<100 files) | Medium Repo (<1000 files) | Large Repo (<10000 files) |
| --- | --- | --- | --- |
| Index Time | < 5s | < 30s | < 5min |
| Index Size | < 100KB | < 1MB | < 10MB |
| Query Time | 2-5s | 3-8s | 5-15s |

Times depend on LLM provider latency


🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas to contribute:

  • 🌍 Add language parsers (C++, Ruby, PHP, etc.)
  • 🧪 Improve test coverage
  • 📖 Documentation and examples
  • 🚀 Performance optimizations
  • 🎨 CLI improvements

🔌 MCP Server (Claude Desktop & More)

CodeTree works as an MCP (Model Context Protocol) server, compatible with Claude Desktop, Cline, Continue, and other MCP clients.

Installation

pip install codetree-mcp

Setup for Claude Desktop

Add to your Claude Desktop config:

{
  "mcpServers": {
    "codetree": {
      "command": "python",
      "args": ["/path/to/Oh-Code-Rag/mcp/server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here"
      }
    }
  }
}
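
If you already have other MCP servers configured, the entry above needs to be merged in, not pasted over the file. The sketch below does that merge on a plain dictionary. The `add_codetree_server` helper is hypothetical, and reading or writing the actual config file is left out because its location depends on your client.

```python
import json

def add_codetree_server(config: dict, server_path: str, api_key: str) -> dict:
    """Return a copy of `config` with a `codetree` entry under `mcpServers`."""
    servers = dict(config.get("mcpServers", {}))
    servers["codetree"] = {
        "command": "python",
        "args": [server_path],
        "env": {"OPENAI_API_KEY": api_key},
    }
    return {**config, "mcpServers": servers}

existing = {"mcpServers": {"other": {"command": "node", "args": ["other.js"]}}}
updated = add_codetree_server(
    existing, "/path/to/Oh-Code-Rag/mcp/server.py", "sk-your-key-here"
)
print(json.dumps(updated, indent=2))
```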

MCP Tools

| Tool | Description |
| --- | --- |
| codetree_index | Index a repository |
| codetree_query | Ask questions about code |
| codetree_tree | Show code structure |
| codetree_find | Find symbol references |
| codetree_stats | Get repo statistics |

See mcp/README.md for full documentation.


🤖 Clawdbot Skill

CodeTree also comes as a Clawdbot skill for AI assistant integration.

Installation

pip install codetree-skill

Or copy the skill/ folder to your Clawdbot skills directory:

cp -r skill/ ~/.clawdbot/skills/codetree/

Skill Commands

# Index a repo
./scripts/codetree.sh index /path/to/repo

# Query code
./scripts/codetree.sh query /path/to/repo "How does auth work?"

# Show structure
./scripts/codetree.sh tree /path/to/repo

# Find symbol
./scripts/codetree.sh find /path/to/repo "UserService"

See skill/SKILL.md for full documentation.


📄 License

MIT License - see LICENSE for details.


🙏 Acknowledgments

Inspired by PageIndex, vectorless RAG for documents.



If you find CodeTree useful, please give us a ⭐!
