knot is a high-performance codebase indexer that extracts structural and semantic information from source code, enabling AI agents to understand, analyze, and navigate large code repositories. It currently supports Java, Kotlin, TypeScript, JavaScript/Node.js, Rust, Python, Groovy, C/C++, HTML, and CSS/SCSS, plus build systems (Maven pom.xml, Gradle build.gradle, Jenkins pipeline, Cargo.toml), optional configuration files (YAML, JSON, .properties), optional Kubernetes + Helm manifests, and cross-repo dependency linking with full cross-language linking.
For recent release notes see CHANGELOG.md.
The indexer automatically builds:
- Vector Search Database (Qdrant) — semantic understanding via embeddings
- Graph Database (Neo4j) — architectural relationships via call graphs
This dual-database approach powers both:
- MCP (Model Context Protocol) Server — Exposes three tools to any LLM client (Claude, Gemini, ChatGPT, Cursor, etc.)
- CLI Tool: Standalone knot command for terminal and scripting environments
🔍 Code Intelligence Tools
- search_hybrid_context: Semantic + structural search. Find code by meaning, class name, method signature, docstrings, or comments. Returns full context including dependencies.
- find_callers: Reverse dependency lookup. Identify dead code, perform impact analysis, or understand the full call chain of any function/method. When multiple entities share the same name (e.g., find_nearest_entity_by_line in different files), results are automatically grouped by target showing which specific entity each caller references. Supports cross-repository call resolution via DEPENDS_ON graph edges.
- explore_file: File anatomy inspection. Quickly see all classes, interfaces, methods, and functions in a file with signatures and documentation.
- list_repo_dependencies (MCP) / knot deps (CLI): Dependency graph visualization. Show which repositories depend on each other, forward and reverse, with transitive resolution.
- list_repositories / knot repos: Repository inventory. List every indexed repository along with its entity count, file count, build system, and primary language. Supports optional case-insensitive name filtering via --filter (CLI) or filter parameter (MCP). Useful for orientation, sanity-checking indexing runs, and discovering which languages and build systems are present in the workspace.
🏗️ Multi-Language Support
- Java: Full AST extraction with package-aware FQN resolution (e.g., com.example.app.UserService), class inheritance (EXTENDS), interface implementation (IMPLEMENTS), annotation tracking, and field-access method invocation resolution
- Kotlin: Complete support for Kotlin codebases with classes, interfaces, objects, companion objects, functions, methods, and properties. Fully compatible with tree-sitter-kotlin-ng grammar.
- TypeScript/TSX/CTS: Complete support for modern JavaScript/TypeScript codebases, including CommonJS TypeScript files
- JavaScript/Node.js: Vanilla JS, Node.js, and module systems (.js, .mjs, .cjs, .jsx)
- Hybrid Web Ecosystem: Cross-language linking between JavaScript, HTML, and CSS for full-stack SPA analysis
- HTML: Custom elements (Web Components, Angular), id and class attribute indexing for cross-language CSS search
- JSX/TSX Attributes: Extracts id and className from React components for unified HTML/CSS discovery
- CSS/SCSS: Stylesheet indexing with class/ID selector extraction and variable tracking (CSS/SCSS variables, mixins, functions)
- Rust: Struct, enum, union, trait, function, method, module extraction with trait implementation tracking (IMPLEMENTS relationships) and macro invocation references. Methods are indexed with the qualified FQN Type::method (e.g., KnotMcpHandler::new, WidgetA::new, Logger::new), and qualified calls from top-level functions resolve to the right target by receiver. Braced import/use capture: use foo::{Bar, Baz} and use foo::Bar as Baz produce explicit REFERENCES edges for all imported names, including traits imported solely to bring methods into scope. All Rust entity FQNs are now anchored at the owning crate and module path (e.g. knot::config::Config, knot::pipeline::parser::languages::rust::qualify_rust_fqns), so two crates that declare a type with the same bare name no longer collide. Files outside src/ (tests, benches, examples) receive a __fixture::<path>::<Entity> FQN prefix (e.g. __fixture::tests::testing_files::sample::Config), and files without a Cargo.toml ancestor receive __loose::<path>::<Entity>, preventing name collisions with real source entities. CONTAINS relationships use enclosing_class_fqn for exact disambiguation when multiple entities share the same class name. The on-disk index state file (.knot/index_state.json) carries a version field; opening a state file from an older version prints an error with instructions to run knot-indexer --clean.
- Python: Full Python extraction with class, function, and method support, constants, module-level imports, ValueReference tracking for keyword arguments, class inheritance (EXTENDS), decorator extraction (@property, @staticmethod, @route(...), @dataclass), generic type hints (List[str], Optional[Dict], *args/**kwargs), Py2/Py3 exception syntax compatibility, and self.method() resolution with inherited method walking. Captures class_definition, function_definition (including async via optional async modifier), lambda assignments, and distinguishes methods from functions via parent context detection. Class instantiation (ClassName(...)) is automatically redirected to ClassName.__init__ so find_callers ClassName.__init__ lists every constructor call site (with fallback to inherited __init__ via the extends chain); only class/struct kinds trigger the redirect, and functions keep the legacy behavior.
- Groovy: Full Groovy language support via hybrid tree-sitter + ad-hoc lexical parser. Extracts classes, interfaces, traits, enums, typed/def/quoted methods (incl. Spock specs), constructors, closures, script-level variables, fields/properties with visibility modifiers, nested classes, and decorators. Tracks package FQN and enclosing class relationships. Handles multi-line signatures (closure default params), assignment-vs-declaration disambiguation, and innermost assignment for nested closures, and includes a UUID collision fix for duplicate method names. find_callers accurately tracks private methods, including those in anonymous new AnAction closures.
- Build Systems: Maven pom.xml (dependencies + plugins via roxmltree), Gradle build.gradle (deps + plugins + tasks), and Jenkinsfile pipeline (stages + steps) extraction.
- Cargo.toml: Rust package manager support with package metadata, features, workspace members, and multi-format dependency parsing (simple, table, git, path).
- Configuration Files: YAML (.yml/.yaml), JSON (.json), and Java Properties (.properties) with leaf-key granularity. Special handling for package.json (npm dependencies as BuildDependency, scripts as ConfigProperty).
- Kubernetes + Helm: K8s manifest parsing (Deployment, Service, ConfigMap, Secret, Ingress, Namespace) with label/annotation tracking and cross-resource references. Helm chart indexing (Chart.yaml metadata, values.yaml key-value pairs, template variable extraction via {{ .Values.X }}).
- C/C++: Complete C/C++ support with namespace-aware FQN resolution (Engine::MyClass::start), class/struct extraction, function/method tracking, macro definition and usage detection (uppercase identifier heuristic), type reference tracking (declarations, new expressions), and full call graph analysis. Supports .c, .h, .cpp, .hpp, .cc, .cxx, .hh, .hxx extensions via tree-sitter-c and tree-sitter-cpp parsers. Includes intelligent auto-detection for .h headers to parse them correctly as C or C++ based on their contents.
- Markdown: Documentation indexing with MarkdownDocument (one per .md/.markdown file) and MarkdownSection (one per ATX heading H1–H6). Section bodies, including paragraphs, fenced code blocks, lists, and tables, are captured into embed_text for full semantic search over documentation content, not just heading titles. FQNs are hierarchical and file-scoped (e.g. README.md::Setup > Installation > Linux), so same-named headings in different files or under different parents disambiguate cleanly. Section boundaries respect heading depth: a section's body extends until the next heading of equal or higher level, ensuring ### Linux under ## Installation does not bleed into a sibling ## Configuration. Headings with inline markdown (backticks, em-dash, links, emoji) parse without losing their bodies, and real start_line/end_line positions are computed via tree-sitter for each section.
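The Markdown section-boundary and FQN rules described above can be illustrated with a short sketch. This is not knot's implementation (knot computes positions via tree-sitter); it is a simplified Python illustration that finds ATX headings with a regular expression and does not skip `#` lines inside fenced code blocks. The function name `split_sections` and the tuple layout are hypothetical.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")

def split_sections(filename, lines):
    """Return (fqn, start_line, end_line) per heading, 1-based and inclusive."""
    headings = []  # (level, title, line_no)
    for line_no, line in enumerate(lines, start=1):
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2), line_no))

    sections = []
    ancestors = []  # stack of (level, title) for the current heading path
    for i, (level, title, start) in enumerate(headings):
        # Drop ancestors that are not strictly shallower than this heading.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        ancestors.append((level, title))

        # The body runs until the next heading of equal or higher level
        # (that is, the same number of '#' or fewer), else to end of file.
        end = len(lines)
        for next_level, _, next_start in headings[i + 1:]:
            if next_level <= level:
                end = next_start - 1
                break

        path = " > ".join(t for _, t in ancestors)
        sections.append((f"{filename}::{path}", start, end))
    return sections
```

With a document containing `# Setup`, `## Installation`, `### Linux`, and a sibling `## Configuration`, the `Linux` section ends on the line before `## Configuration`, and its FQN is `README.md::Setup > Installation > Linux`.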
📚 Rich Comment Extraction
- Captures docstrings (JavaDoc, JSDoc) preceding declarations
- Extracts inline comments within method/function bodies
- Respects nesting boundaries (class comments don't capture method comments)
- Intelligently aggregates comment blocks
📊 Dual-Database Architecture
- Qdrant: Vector search for semantic code understanding
- Neo4j: Graph relationships for structural navigation
🚀 High Performance
- Parallel Streaming Pipeline: Overlaps CPU-bound embedding with I/O-bound ingestion via MPSC channels
- Incremental Indexing: Uses SHA-256 hashes to skip unchanged files
- Real-time Watch Mode: Automatically re-indexes changed files in seconds via --watch
- CPU Parallelism: AST extraction via Rayon
- Scalable: Configurable batch processing and constant memory footprint (~2GB) regardless of repository size
- Performance Benchmarking: Three-level validation framework
  - Unit benchmarks: Criterion-based benchmarks for parse, embed, and graph write throughput (benches/)
  - E2E benchmarks: Full pipeline metrics capture with per-stage timing (tests/benchmark_e2e.sh)
  - CI regression tracking: Automated baseline comparison against tolerance thresholds (scripts/compare_perf_metrics.sh)
| Component | Version | Notes |
|---|---|---|
| Docker | 20.10+ | For running Qdrant and Neo4j |
| qdrant | 1.x | Vector database (docker) |
| neo4j | 5.x | Graph database (docker) |
Go to the Releases page and download the native executable for your platform.
Install knot binaries (CLI, MCP server, and indexer):
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/raultov/knot/releases/latest/download/knot-installer.sh | sh
Install agent-skills for your AI (Optional): Paste this into your LLM agent (Claude Code, OpenCode, Cursor, etc.):
Install the knot agent skills by following the instructions at: https://raw.githubusercontent.com/raultov/knot/master/README.md
The first command installs the knot binary to your PATH. The second (optional) allows your AI assistant to automatically download the agent skill index (.knot-agent.md) and run the installer to extract comprehensive guides for using knot CLI with AI agents and code analysis tools.
System Requirements:
- Linux: glibc 2.38+ (Ubuntu 24.04+, Debian 13+, Fedora 39+, Arch)
- macOS: Modern versions supported
- Windows: Use Docker (Option B)
Docker images provide universal compatibility for any Linux distribution and Windows.
Build the image:
docker build -t knot:latest . --network=host
Run the indexer:
# Use --network host to connect to databases running on your host machine
docker run --rm \
-v /path/to/your/repo:/workspace \
-e KNOT_REPO_PATH=/workspace \
-e KNOT_NEO4J_PASSWORD=your-password \
--network host \
knot:latest \
knot-indexer
Run the CLI tool:
docker run --rm \
-v /path/to/your/repo:/workspace \
-e KNOT_REPO_PATH=/workspace \
-e KNOT_NEO4J_PASSWORD=your-password \
--network host \
knot:latest \
knot search "user login flow"
Run the MCP server:
docker run --rm \
-e KNOT_REPO_PATH=/workspace \
-e KNOT_NEO4J_PASSWORD=your-password \
--network host \
knot:latest \
knot-mcp
Note: Uses Debian Trixie (glibc 2.38+) and includes ONNX Runtime for full functionality.
cargo install --git https://github.com/raultov/knot
1. Start infrastructure with Docker:
docker compose up -d
2. Clone and build:
git clone https://github.com/raultov/knot
cd knot
cargo build --release
3. Configure:
mkdir -p ~/.config/knot
cp .env.example ~/.config/knot/.env
$EDITOR ~/.config/knot/.env # Set KNOT_REPO_PATH and Neo4j credentials
4. Index a codebase:
./target/release/knot-indexer
5. Query via CLI:
./target/release/knot search "your query"
6. Start the MCP server:
./target/release/knot-mcp
Option A: Let an LLM do it
Paste this into any LLM agent (Claude Code, OpenCode, Cursor, etc.):
Install the knot agent skills by following the instructions at: https://raw.githubusercontent.com/raultov/knot/master/README.md
Option B: Terminal (Manual)
curl -sO https://raw.githubusercontent.com/raultov/knot/master/.knot-agent.md && curl -fsSL https://raw.githubusercontent.com/raultov/knot/master/scripts/install-agent-skills.sh | bash
Download knot binaries (CLI + MCP server):
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/raultov/knot/releases/latest/download/knot-installer.sh | sh
Comprehensive documentation for using knot tools. The agent skills installer extracts:
- search.md — Semantic code discovery guide with examples
- callers.md — Reverse dependency lookup with critical usage rules
- explore.md — File anatomy inspection guide
- deps.md — Repository dependency graph guide
- repos.md — Indexed repository inventory
- workflows.md — Common patterns and best practices
For quick reference without downloading, see .knot-agent.md.
The knot CLI provides the same capabilities as the MCP server via command-line commands, making it ideal for:
- Terminal-only environments
- Bash scripting and automation
- CI/CD pipelines
- Direct integration with other tools
Three main commands:
knot search "user authentication" --max-results 10 --repo my-app
Find code entities by meaning, class names, docstrings, or comments.
knot callers "LoginService" --repo my-app
Find all code that references a specific entity (dead code detection, impact analysis, call chains). When multiple entities share the same name in different files, results are automatically grouped by target with file locations and signatures.
knot explore "src/services/auth.ts" --repo my-app
List all classes, methods, functions in a file with signatures and documentation.
knot deps my-app --depth 2 # Show forward dependencies (transitive)
knot deps my-app --reverse # Show who depends on this repo
Visualize auto-discovered dependencies between indexed repositories with transitive resolution up to 3 levels deep.
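To make the forward, reverse, and depth-limited behaviour concrete, here is a minimal sketch of a breadth-first traversal over a repository dependency map. It is an illustration only, not how knot queries Neo4j; the function name `transitive_deps`, the dictionary-based graph, and the repository names are hypothetical.

```python
from collections import deque

def transitive_deps(graph, repo, max_depth=3, reverse=False):
    """Return {repository: depth} reachable from `repo` within `max_depth` hops.

    `graph` maps a repository name to the repositories it depends on.
    With reverse=True the edges are inverted, answering "who depends on me?".
    """
    if reverse:
        inverted = {}
        for source, targets in graph.items():
            for target in targets:
                inverted.setdefault(target, []).append(source)
        graph = inverted

    depths = {repo: 0}
    queue = deque([repo])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, []):
            if neighbour not in depths:  # first visit is the shortest path
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)

    del depths[repo]  # the starting repository is not its own dependency
    return depths
```

Breadth-first order is used so that each repository is reported at its shortest distance from the starting point, and the visited map also guards against dependency cycles.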
knot repos # Table with REPO / BUILD SYSTEM / LANGUAGE / FILES / ENTITIES
knot repos --filter app # Case-insensitive name filter (substring match)
knot repos --output json # Machine-readable list
knot repos --output markdown # GFM table for chat UIs
Show the status of every repository currently indexed in the graph database. This is useful for orientation, sanity-checking that an indexing run completed, and discovering which languages and build systems are present across the workspace. Use --filter to quickly locate a specific repository when working with multiple indexed codebases.
For a detailed CLI usage guide, see .knot-agent.md, a machine-readable skill that teaches LLMs how to use the knot CLI for autonomous code analysis.
# First run: indexes all files
knot-indexer --repo-path /path/to/your/repo --neo4j-password secret
# Subsequent runs: only re-indexes changed files (fast!)
knot-indexer --repo-path /path/to/your/repo --neo4j-password secret
# NEW: Real-time Watch mode
knot-indexer --watch --repo-path /path/to/your/repo --neo4j-password secret
How it works:
- Tracks file content via SHA-256 hashes in .knot/index_state.json
- Stores the downloaded fastembed model in .knot/fastembed_cache/ to keep the workspace clean
- Automatically detects: modified, added, and deleted files
- Only re-parses and re-embeds changed files
- Preserves graph relationships to unchanged files
- Processes entities in memory-efficient 512-entity chunks
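The change-detection steps above can be sketched as follows. This is an illustrative Python sketch, not knot's Rust implementation: the real layout of .knot/index_state.json is not documented here, so the plain path-to-hash dictionary and the helper names (`hash_bytes`, `diff_state`, `chunked`) are assumptions.

```python
import hashlib

def hash_bytes(data):
    """SHA-256 digest of a file's content, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def diff_state(previous, current):
    """Classify files by comparing two {path: sha256} maps (assumed layout)."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return added, modified, deleted

def chunked(items, size=512):
    """Yield fixed-size slices so only one chunk is processed at a time."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Only the files in `added` and `modified` would need re-parsing and re-embedding; files in `deleted` would have their entities removed, and everything else is skipped.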
Performance:
- Initial index (3800 files): ~60 minutes on standard hardware
- Incremental update (3 files changed): ~5-10 seconds
- Memory usage: Constant ~2GB regardless of repository size
# Force complete re-index (deletes all existing data)
knot-indexer --clean --repo-path /path/to/your/repo --neo4j-password secret
Use --clean when:
- You want to rebuild the entire index from scratch
- You've changed Tree-sitter queries or embedding models
- Troubleshooting indexing issues
To ensure indexer stability, run the E2E integration test suite:
# Run all language E2E tests (TypeScript, Java, JavaScript, Web, Kotlin, Rust, ...)
./tests/run_all_e2e_fast.sh
# Run only Kotlin E2E tests
./tests/run_kotlin_e2e.sh
# Run only Rust E2E tests
./tests/run_rust_e2e.sh
See tests/KOTLIN_E2E_TESTS.md for detailed coverage and troubleshooting.
The MCP server exposes three tools to any compatible AI client:
Find code by meaning or keywords
Query: "How is user authentication implemented?"
Result: All auth-related code, signatures, docstrings, and dependencies
Capabilities:
- Semantic search by functionality
- Class/method/function name lookup
- Docstring and inline comment search
- Architectural pattern discovery
- Full dependency context
Find who calls a specific function
Query: "Find callers of getCurrentTimeInSeconds"
Result: All code that invokes this function + file locations
Advanced: Search by Signature
# Find by full signature (Java)
echo '{"method":"tools/call","params":{"name":"find_callers","arguments":{"entity_name":"registerUser(String"}}}' | knot-mcp
# Find by parameter type (Kotlin)
echo '{"method":"tools/call","params":{"name":"find_callers","arguments":{"entity_name":"findById(Int"}}}' | knot-mcp
# Find by type annotation (TypeScript)
echo '{"method":"tools/call","params":{"name":"find_callers","arguments":{"entity_name":"(EventData"}}}' | knot-mcp
Use Cases:
- Dead Code Detection: Zero callers = unused code
- Impact Analysis: "What breaks if I modify this?"
- Refactoring Safety: Find all references before removing
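When scripting signature searches, building the request with a JSON library avoids shell-quoting mistakes. The sketch below only reproduces the request shape shown in the examples above; the helper name `find_callers_request` is hypothetical, and whether knot-mcp expects additional protocol fields is not covered here.

```python
import json

def find_callers_request(entity_name):
    """Build the same compact JSON body used in the echo examples above."""
    request = {
        "method": "tools/call",
        "params": {
            "name": "find_callers",
            "arguments": {"entity_name": entity_name},
        },
    }
    # Compact separators reproduce the one-line form shown in the examples.
    return json.dumps(request, separators=(",", ":"))

if __name__ == "__main__":
    # Prints one request per line, suitable for piping into another process.
    for name in ("registerUser(String", "findById(Int", "(EventData"):
        print(find_callers_request(name))
```

Because json.dumps escapes quotes and backslashes, entity names containing special characters stay valid JSON without manual escaping.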
Understand file structure
Query: "What's in BrowserService.ts?"
Result: All classes, methods, and functions with signatures and docs
Use Cases:
- Quick file navigation
- Module structure overview
- Finding all methods in a class without reading line-by-line
knot works with any MCP-compatible AI client:
- ✅ Claude Desktop (Anthropic)
- ✅ Gemini CLI (Google)
- ✅ ChatGPT CLI / GPT (OpenAI)
- ✅ Cursor (AI IDE)
- ✅ Any standard MCP client
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "knot": {
      "command": "/absolute/path/to/knot/target/release/knot-mcp",
      "env": {
        "KNOT_REPO_PATH": "/path/to/indexed/repo",
        "KNOT_QDRANT_URL": "http://localhost:6334",
        "KNOT_NEO4J_URI": "bolt://localhost:7687",
        "KNOT_NEO4J_USER": "neo4j",
        "KNOT_NEO4J_PASSWORD": "your-password"
      }
    }
  }
}
Other MCP clients use a similar JSON configuration in their own MCP configuration file.
All options can be set via CLI flags, environment variables, or a ~/.config/knot/.env file.
Priority (highest to lowest): CLI flags > environment variables > .env file.
| Env Variable | CLI Flag | Default | Description |
|---|---|---|---|
| KNOT_REPO_PATH | --repo-path | (required) | Root directory of the repository to index |
| KNOT_REPO_NAME | --repo-name | (auto-detected) | Repository name for multi-repo isolation (auto-detected from last path component) |
| KNOT_QDRANT_URL | --qdrant-url | http://localhost:6334 | Qdrant server URL |
| KNOT_QDRANT_COLLECTION | --qdrant-collection | knot_entities | Qdrant collection name |
| KNOT_NEO4J_URI | --neo4j-uri | bolt://localhost:7687 | Neo4j Bolt URI |
| KNOT_NEO4J_USER | --neo4j-user | neo4j | Neo4j username |
| KNOT_NEO4J_PASSWORD | --neo4j-password | (required) | Neo4j password |
| KNOT_EMBED_DIM | --embed-dim | 384 | Embedding vector dimension |
| KNOT_BATCH_SIZE | --batch-size | 64 | Entities per batch |
| KNOT_CLEAN | --clean | false | Force full re-index (delete all existing data) |
| KNOT_CUSTOM_CA_CERTS | --custom-ca-certs | (none) | Path to CA certificate bundle for corporate SSL proxies |
| KNOT_INCLUDE_CONFIG_FILES | --include-config-files | false | Include YAML/JSON/properties/K8s/Helm files in the index |
| RUST_LOG | (env only) | info | Log level: trace, debug, info, warn, error |
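The precedence rule (CLI flags over environment variables over the .env file) can be sketched as a layered lookup. This is an illustration of the rule only, not knot's configuration code; the helpers `parse_env_file` and `resolve` are hypothetical, and the .env parser handles only simple KEY=VALUE lines.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def resolve(name, cli_flags, environ, env_file, default=None):
    """Return the first value found, checking the highest priority first."""
    for source in (cli_flags, environ, env_file):
        if source.get(name) is not None:
            return source[name]
    return default
```

For example, if KNOT_BATCH_SIZE is set to 32 in the .env file and to 16 in the environment, the value 16 is used unless a CLI flag supplies another one; with none of the three set, the table default applies.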
The built-in extraction queries (queries/java.scm, queries/typescript.scm) can be overridden without recompiling:
KNOT_CUSTOM_QUERIES_PATH=/path/to/my/queries ./target/release/knot-indexer
Place java.scm and/or typescript.scm in your custom directory. Missing files fall back to built-in defaults.
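The fallback behaviour for custom queries amounts to a per-file lookup: use the override if it exists, otherwise the built-in file. The sketch below illustrates that rule under assumptions; `resolve_query` is a hypothetical helper, and knot may well embed its built-in queries in the binary rather than read them from a directory.

```python
from pathlib import Path

def resolve_query(name, custom_dir, builtin_dir):
    """Pick the custom query file if present, else the built-in one."""
    if custom_dir is not None:
        candidate = Path(custom_dir) / name
        if candidate.is_file():
            return candidate
    # No custom directory configured, or the file is missing there.
    return Path(builtin_dir) / name
```

The lookup is done per file, so overriding only java.scm leaves typescript.scm on its built-in default.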
In restricted corporate environments with SSL-inspecting proxies, you may need to provide a custom CA certificate bundle so that knot can download the embedding model from HuggingFace.
Via environment variable:
export KNOT_CUSTOM_CA_CERTS=/etc/ssl/certs/corporate-bundle.pem
./target/release/knot-indexer --repo-path /path/to/repo --neo4j-password secret
Via CLI flag:
./target/release/knot-indexer \
--custom-ca-certs /etc/ssl/certs/corporate-bundle.pem \
--repo-path /path/to/repo \
--neo4j-password secret
Via .env file:
echo "KNOT_CUSTOM_CA_CERTS=/etc/ssl/certs/corporate-bundle.pem" >> ~/.config/knot/.env
./target/release/knot-indexer
This works for all three binaries: knot-indexer, knot-mcp, and knot.
Step 1: Index a Java project
./target/release/knot-indexer --repo-path /home/user/my-java-app --neo4j-password secret
Step 2: Query via CLI (Instant search)
./target/release/knot search "authentication logic"
./target/release/knot callers "UserService.login"
Step 3: Start MCP server (For AI Agents)
./target/release/knot-mcp
Step 4: Use with Claude Desktop
- Claude will list the three tools in its Tools menu
- Ask: "Search for all authentication logic"
- Ask: "Find who calls the login method"
- Ask: "Explore the structure of UserService.java"
knot includes a universal .prompt file in its root directory that automatically configures modern AI coding agents (Cursor, Cline, opencode, Claude, etc.) to use the knot-mcp tools correctly.
The directive explicitly instructs AI agents to prioritize:
- search_hybrid_context: for semantic code discovery (instead of grep)
- find_callers: for reverse dependency analysis (instead of finding references manually)
- explore_file: for file structure inspection (instead of reading line-by-line)
This ensures that when you ask an AI agent to analyze, refactor, or understand your code, it leverages the full power of the vector and graph databases rather than falling back to context-blind regex searches. The .prompt file is universal and tool-agnostic, working with any LLM client that reads codebase directives.
Contributions are welcome! Please ensure:
- All code passes cargo clippy
- Code is formatted with cargo fmt
- Changes are compatible with Rust 2024 edition
- All new functionality includes unit tests
- Performance regressions are validated with the benchmark framework before submitting PRs
The project includes a three-level benchmarking framework to validate optimizations and detect regressions:
Level 1 - Unit Benchmarks (Criterion):
cargo bench --bench pipeline_bench # Parse + prepare throughput per language
cargo bench --bench graph_upsert_bench # Neo4j UNWIND batching speedup (needs Neo4j)
cargo bench --bench channel_backpressure_bench # Bounded channel overhead
Level 2 - E2E Integration Benchmarks:
# Full pipeline metrics with memory and per-stage timing
./tests/benchmark_e2e.sh --focus rust_e2e --output-dir /tmp/perf_results
# Compare against baseline (fails CI if tolerance exceeded)
scripts/compare_perf_metrics.sh /tmp/perf_results .perf_metrics/baseline.json
Baseline files: .perf_metrics/baseline.json stores the last known good metrics (committed, updated on main/master merges). Tolerance thresholds in .perf_metrics/threshold_tolerances.json control regression gates (±5% time, ±10% memory by default).
CI Integration: The test-performance job in .github/workflows/ci.yml runs after all E2E correctness tests pass, compares the results against the baseline, and fails the build on regression.
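The regression gate can be pictured as a per-metric comparison against the baseline plus its tolerance. The sketch below is an assumption-laden illustration, not the logic of scripts/compare_perf_metrics.sh: the metric names, the flat dictionary layout, and the choice to fail only on increases are all hypothetical.

```python
def find_regressions(baseline, current, tolerances):
    """Return the metrics whose current value exceeds baseline + tolerance.

    `tolerances` holds fractions, e.g. 0.05 for 5%. Only increases are
    treated as regressions here (an assumption); improvements pass.
    """
    failures = []
    for metric, base_value in baseline.items():
        allowed = base_value * (1 + tolerances[metric])
        if current[metric] > allowed:
            failures.append(metric)
    return failures

# Hypothetical metric names and values, using the default 5% / 10% tolerances.
baseline = {"total_time_s": 100.0, "peak_memory_mb": 2000.0}
tolerances = {"total_time_s": 0.05, "peak_memory_mb": 0.10}
current = {"total_time_s": 104.0, "peak_memory_mb": 2250.0}
regressions = find_regressions(baseline, current, tolerances)
```

Here the run is 4% slower, which is within the time tolerance, but memory grew by 12.5%, so `regressions` contains only the memory metric and a CI job built on this check would fail.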
This project is licensed under the MIT License. See LICENSE for details.
For the full release history see CHANGELOG.md.
- Homogenize all E2E test suites to use the per-suite fixture directory architecture (E2E_DATA_DIR/docker-compose.yml) already adopted by run_cpp_e2e.sh, for better isolation in standalone mode and parallel-safe execution
- Run the test-unit gate also on push to master (currently only runs on tag push via release.yml) so unit-test regressions are caught at merge time, not at release time
- Varnish VCL support
- Go support
- C# support
- IDE plugins (VS Code, IntelliJ, Vim)
- Language Server Protocol (LSP) integration
- Automated Code Review tool (MCP-based)
- CLI commands (opencode, claude, agy) to index repos
- Ruby support
For issues, feature requests, or discussions, please open a GitHub issue.

