CustomKB transforms documents into AI-powered, searchable knowledgebases using vector embeddings, FAISS indexing, and LLM integration. It supports multiple AI providers (OpenAI, Anthropic, Google, xAI, Ollama) and delivers contextually relevant answers from your data.
- How It Works
- Key Features
- Prerequisites
- Installation
- Quick Start
- Core Commands
- Configuration
- Architecture
- Advanced Features
- Security
- Performance Optimization
- MCP Server Integration
- Troubleshooting
- FAQ
- Contributing
- License
## How It Works

CustomKB follows a three-stage pipeline:

```
1. Document Processing (customkb database)
   ├─ Text extraction (Markdown, HTML, PDF, code, plain text)
   ├─ Language detection (27+ languages)
   ├─ Intelligent chunking (configurable token ranges, context-aware)
   └─ Metadata extraction (filenames, categories, timestamps)

2. Embedding Generation (customkb embed)
   ├─ Vector embeddings via OpenAI, Google, or local models
   ├─ Batch processing with checkpoints and retry logic
   ├─ FAISS index creation (auto-selects Flat/IVF/HNSW by dataset size)
   └─ Two-tier caching (memory LRU + disk)

3. Semantic Query (customkb query)
   ├─ Query embedding + optional enhancement (spelling, synonyms)
   ├─ Vector similarity search (k-NN via FAISS)
   ├─ Optional hybrid search (vector + BM25 keyword matching)
   ├─ Optional cross-encoder reranking for precision
   ├─ Context assembly from top results
   └─ LLM response generation with retrieved context
```
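The intelligent chunking step in stage 1 can be illustrated with a minimal sketch. This is not CustomKB's actual implementation (which is context-aware, respecting sentence boundaries and overlap); it only shows how a `db_min_tokens`/`db_max_tokens` range constrains chunk sizes:

```python
def chunk_tokens(tokens, min_tokens=200, max_tokens=400):
    """Greedy token-range chunking: slice into max_tokens pieces,
    then fold a too-short tail into the previous chunk so no chunk
    falls below min_tokens."""
    chunks = [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]
    if len(chunks) > 1 and len(chunks[-1]) < min_tokens:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail
    return chunks

# A 950-token document becomes a 400-token and a 550-token chunk:
# the 150-token tail is folded back rather than emitted under-size.
sizes = [len(c) for c in chunk_tokens(list(range(950)))]
```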
## Key Features

| Category | Features |
|---|---|
| Search | Semantic search by meaning; hybrid vector + BM25 keyword search; cross-encoder reranking (+20-40% accuracy) |
| AI Providers | OpenAI, Anthropic, Google, xAI, and local models via Ollama |
| Documents | Markdown, HTML, PDF, code files, plain text; 27+ languages with auto-detection |
| Performance | Auto-tuning memory tiers (4GB–128GB+); GPU acceleration (CUDA 11/12); concurrent batch processing; two-tier LRU cache |
| Security | Zero pickle (JSON-only serialization); SQL injection protection; path traversal prevention; parameterized queries; no API keys in logs |
| Reliability | Checkpoint saving; exponential backoff with retry; graceful error handling; atomic DB operations |
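The exponential backoff with retry listed under Reliability can be sketched as follows; the function and its parameters are illustrative, not CustomKB's actual API:

```python
import random
import time

def with_retries(call, max_retries=5, base_delay=0.1, max_delay=30.0):
    """Retry a callable with exponential backoff and jitter: the delay
    doubles on each failure, capped at max_delay, with +/-50% jitter
    to avoid synchronized retry storms."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))
```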

## Prerequisites

- Python 3.12+
- SQLite 3.45+ (included with Python)
- RAM 4GB+ (8GB+ recommended)
- GPU (optional): NVIDIA with CUDA 11 or 12
- API Keys: For chosen AI providers (OpenAI, Anthropic, Google, xAI) — or use local Ollama models for zero API dependency

## Installation

```bash
git clone https://github.com/Open-Technology-Foundation/customkb.git
cd customkb

# Install uv if not already available
# See: https://docs.astral.sh/uv/getting-started/installation/

# Install dependencies (auto-creates .venv)
uv sync --extra faiss-gpu-cu12 --extra mcp --extra test

# CPU-only (no NVIDIA GPU):
# uv sync --extra faiss-cpu --extra mcp --extra test

# Or auto-detect FAISS variant:
# ./setup/install_faiss.sh
```

Download NLTK data:

```bash
sudo ./setup/nltk_setup.py download cleanup
```

Create the knowledgebase directory:

```bash
# System-wide (requires sudo)
sudo mkdir -p /var/lib/vectordbs && sudo chown $USER:$USER /var/lib/vectordbs
export VECTORDBS="/var/lib/vectordbs"

# Or user-local (no sudo)
mkdir -p "$HOME/knowledgebases"
export VECTORDBS="$HOME/knowledgebases"
```

Add to your shell profile (`~/.bashrc`):

```bash
export VECTORDBS="/var/lib/vectordbs"  # or $HOME/knowledgebases
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"  # Optional
export XAI_API_KEY="your-xai-key"        # Optional
```

## Quick Start

```bash
# 1. Create knowledgebase directory
mkdir -p "$VECTORDBS/myproject"

# 2. Create configuration
cat > "$VECTORDBS/myproject/myproject.cfg" << 'EOF'
[DEFAULT]
vector_model = text-embedding-3-small
query_model = gpt-4o-mini
db_min_tokens = 200
db_max_tokens = 400
EOF

# 3. Process documents
customkb database myproject docs/*.md *.txt

# 4. Generate embeddings
customkb embed myproject

# 5. Query
customkb query myproject "What are the main features?"
```

## Core Commands

### `customkb database`

```
customkb database <kb_name> [files...] [options]
```

| Option | Description |
|---|---|
| `-l, --language` | Stopwords language (en, fr, de, etc.) |
| `--detect-language` | Auto-detect language per file |
| `-f, --force` | Reprocess existing files |
| `-v` / `-q` | Verbose / quiet output |

### `customkb embed`

```
customkb embed <kb_name> [options]
```

| Option | Description |
|---|---|
| `-r, --reset-database` | Reset embedding status and regenerate all |
| `-v` / `-q` | Verbose / quiet output |

### `customkb query`

```
customkb query <kb_name> "<question>" [options]
```

| Option | Description |
|---|---|
| `-c, --context-only` | Return context without AI response |
| `-m, --model` | AI model to use |
| `-k, --top-k` | Number of results (default: 50) |
| `-s, --context-scope` | Context segments per result |
| `-t, --temperature` | Response creativity (0.0–2.0) |
| `-f, --format` | Output format: xml, json, markdown, plain |
| `-p, --prompt-template` | Style: default, instructive, scholarly, concise, analytical, conversational, technical |
| `--categories` | Filter by categories (comma-separated) |
| `--context-files` | Additional context files to include |

### `customkb categorize`

```
customkb categorize <kb_name> [options]
```

| Option | Description |
|---|---|
| `-S, --sample N` | Process only N sample articles |
| `-f, --full` | Process all articles |
| `--fresh` | Ignore checkpoint, reprocess all |
| `--import` | Import categories to database |
| `--list` | List existing categories and counts |
| `-m, --model` | AI model (default: claude-haiku-4-5) |
| `-s, --sampling T-M-B` | Chunk sampling (e.g., 5-10-5) |
| `-c, --confidence-threshold` | Minimum confidence (default: 0.5) |
| `-D, --no-dedup` | Disable category deduplication |
| `--dedup-threshold` | Similarity threshold for dedup (default: 85.0) |

### Utility Commands

| Command | Description |
|---|---|
| `customkb edit <kb_name>` | Open KB config in `$EDITOR` |
| `customkb optimize <kb_name>` | Auto-optimize performance settings |
| `customkb optimize --analyze` | Show system recommendations |
| `customkb verify-indexes <kb_name>` | Check database index health |
| `customkb bm25 <kb_name>` | Build BM25 hybrid search index |
| `customkb convert-encoding <files>` | Convert files to UTF-8 |
| `customkb version` | Show version information |

## Configuration

CustomKB uses INI-style configuration with environment variable overrides. Priority order: environment variables > `.cfg` file > defaults.
```ini
[DEFAULT]
# Models
vector_model = text-embedding-3-small
query_model = gpt-4o-mini

# Text Processing
db_min_tokens = 200        # Minimum chunk size
db_max_tokens = 400        # Maximum chunk size

# Query Settings
query_max_tokens = 4096    # Max tokens in LLM response
query_top_k = 30           # Chunks to retrieve
query_temperature = 0.1    # 0=precise, 2=creative
query_role = You are a helpful expert assistant.

# Output
reference_format = json    # xml, json, markdown, plain
query_prompt_template = technical

[ALGORITHMS]
similarity_threshold = 0.6     # Minimum similarity (0–1)
enable_hybrid_search = true    # Vector + keyword search
bm25_weight = 0.5              # BM25 weight in hybrid mode
bm25_max_results = 1000
enable_reranking = true
reranking_model = cross-encoder/ms-marco-MiniLM-L-6-v2
reranking_top_k = 30

[PERFORMANCE]
embedding_batch_size = 100
cache_thread_pool_size = 4
memory_cache_size = 10000
checkpoint_interval = 10

[API]
api_call_delay_seconds = 0.05
api_max_concurrency = 8
api_max_retries = 20
```

Key tuning parameters:

- `db_min_tokens` / `db_max_tokens`: Smaller chunks = more precise retrieval; larger = more context per result
- `similarity_threshold`: Lower (0.5) for broader recall, higher (0.7) for strict relevance
- `enable_hybrid_search`: Recommended for technical documentation
- `query_temperature`: 0.0–0.3 for factual answers, 0.7–1.0 for creative responses
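The priority order (environment variables > `.cfg` file > defaults) can be sketched like this; the `CUSTOMKB_<KEY>` environment-variable naming is an assumption for illustration, not CustomKB's documented scheme:

```python
import configparser
import os

def resolve_setting(key, cfg_path, default, section="DEFAULT"):
    """Return a setting using env var > .cfg value > default priority."""
    env_val = os.environ.get(f"CUSTOMKB_{key.upper()}")
    if env_val is not None:
        return env_val
    parser = configparser.ConfigParser()
    parser.read(cfg_path)  # silently skips missing files
    if parser.has_option(section, key):
        return parser.get(section, key)
    return default
```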

## Architecture

```
customkb                         # Bash wrapper (activates .venv, calls customkb.py)
customkb.py                      # CLI entry point, match/case command dispatch
├── config/
│   ├── config_manager.py        # KnowledgeBase class, config loading, name resolution
│   └── models.py                # Pydantic config models with env var overrides
├── categorize/
│   ├── categorize_manager.py    # LLM-based document categorization
│   ├── category_deduplicator.py # Fuzzy deduplication via rapidfuzz
│   └── import_to_db.py          # Category import pipeline to SQLite
├── database/
│   ├── db_manager.py            # Document processing pipeline
│   ├── connection.py            # SQLite connection with WAL + PRAGMA tuning
│   ├── chunking.py              # Text chunking with overlap (langchain splitters)
│   ├── index_manager.py         # Database index creation and verification
│   └── migrations.py            # Schema migrations with version tracking
├── embedding/
│   ├── embed_manager.py         # Embedding orchestration, FAISS index management
│   ├── providers.py             # Embedding provider abstraction (OpenAI, Google, local)
│   ├── litellm_provider.py      # LiteLLM unified embedding interface
│   ├── bm25_manager.py          # BM25 index for hybrid search
│   ├── rerank_manager.py        # Cross-encoder reranking with score caching
│   ├── cache.py                 # Thread-safe two-tier cache (memory LRU + disk)
│   ├── batch.py                 # Batch progress tracking with ETA
│   └── index.py                 # FAISS index type auto-selection
├── query/
│   ├── query_manager.py         # Re-export hub for all query submodules
│   ├── processing.py            # Query orchestration pipeline
│   ├── search.py                # FAISS vector search, hybrid search, result assembly
│   ├── response.py              # LLM response generation (multi-provider)
│   ├── llm.py                   # Unified LLM interface via LiteLLM
│   ├── embedding.py             # Query embedding generation and caching
│   ├── enhancement.py           # Spelling correction, synonym expansion
│   ├── formatters.py            # Output formatting (XML, JSON, Markdown, plain)
│   └── prompt_templates.py      # Response style templates
├── models/
│   └── model_manager.py         # Model registry from Models.json (aliases, providers)
├── mcp_server/
│   └── server.py                # MCP server for Claude Code integration
└── utils/
    ├── security_utils.py        # Input validation, path sanitization, safe SQL
    ├── text_utils.py            # Text cleaning, entity preservation
    ├── logging_config.py        # Centralized KB-specific logging
    ├── optimization_manager.py  # Memory tier auto-optimization
    ├── performance_analyzer.py  # System profiling and recommendations
    ├── exceptions.py            # Custom exception hierarchy
    ├── faiss_loader.py          # FAISS loading with GPU/CPU fallback
    ├── gpu_utils.py             # GPU detection and memory management
    ├── resource_manager.py      # Thread pool and resource lifecycle
    ├── language_detector.py     # Language detection with caching
    ├── encoding_converter.py    # File encoding conversion to UTF-8
    ├── enums.py                 # Shared enumerations
    └── context_managers.py      # Database and resource context managers
```

### Data Flow

- Database: Files → text extraction → language-aware chunking → SQLite storage with metadata
- Embed: Chunks → vector embeddings (via LiteLLM) → FAISS index (auto-selected type) → two-tier cache
- Query: Question → embedding → FAISS search → optional BM25 hybrid → optional reranking → context assembly → LLM response
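The optional BM25 hybrid step combines vector and keyword scores. A minimal weighted-sum sketch (assuming both score sets are already normalized to [0, 1]; CustomKB's actual fusion may differ):

```python
def fuse_scores(vector_scores, bm25_scores, bm25_weight=0.5):
    """Blend two {chunk_id: score} dicts into one ranking.
    bm25_weight mirrors the bm25_weight config option: 0.0 is pure
    vector search, 1.0 is pure keyword search."""
    ids = set(vector_scores) | set(bm25_scores)
    fused = {
        i: (1 - bm25_weight) * vector_scores.get(i, 0.0)
           + bm25_weight * bm25_scores.get(i, 0.0)
        for i in ids
    }
    return sorted(ids, key=fused.get, reverse=True)

# "b" ranks first: strong in both searches beats strong in only one.
ranking = fuse_scores({"a": 0.9, "b": 0.4}, {"b": 0.8, "c": 0.6})
```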

### Knowledgebase Layout

```
$VECTORDBS/<kb_name>/
├── <kb_name>.cfg     # Configuration (required)
├── <kb_name>.db      # SQLite database (chunks, metadata, categories)
├── <kb_name>.faiss   # FAISS vector index
└── <kb_name>.bm25    # BM25 index (optional, for hybrid search)
```
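The `.faiss` index type is auto-selected by dataset size (Flat/IVF/HNSW, per the embed stage). A size-threshold sketch; the cutoffs and ordering here are illustrative, not CustomKB's actual values:

```python
def pick_index_type(n_vectors):
    """Choose a FAISS index family by corpus size: exact search while
    it is cheap, approximate structures once it is not."""
    if n_vectors < 10_000:
        return "Flat"   # brute-force exact k-NN
    if n_vectors < 1_000_000:
        return "IVF"    # inverted-file clusters, probe a subset
    return "HNSW"       # graph-based approximate search
```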

### Design Principles

- Lazy imports: Heavy modules loaded only when their command is invoked (fast CLI startup)
- Config-driven: All thresholds, batch sizes, and behaviors configurable via `.cfg` with env var overrides
- Provider-agnostic: LiteLLM abstraction layer for embeddings and LLM calls
- Safe by default: JSON serialization only, parameterized SQL, validated file paths

## Advanced Features

### Embedding Models

| Model | Provider | Dimensions | Notes |
|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1536 | Cost-effective default |
| `text-embedding-3-large` | OpenAI | 3072 | Best quality |
| `text-embedding-ada-002` | OpenAI | 1536 | Legacy |
| `gemini-embedding-001` | Google | 768/1536/3072 | 30k token context, Matryoshka support |

### Query Models

| Provider | Models |
|---|---|
| OpenAI | GPT-5.x, GPT-4.1, GPT-4o, o3, o4-mini |
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, Opus 4.1 |
| Google | Gemini 2.5 Pro/Flash, Gemini 1.5 Pro |
| xAI | Grok 4, Grok 4-fast |
| Ollama | Llama 3.3, Gemma 3, DeepSeek R1, Qwen 2.5, Mistral, Phi-4 |
Model aliases are resolved via `Models.json`. Run `customkb query <kb> --model <alias>` to use any supported model.

### Prompt Templates

```bash
customkb query myproject "question" --prompt-template <template>
```

Available: `default`, `instructive`, `scholarly`, `concise`, `analytical`, `conversational`, `technical`

### Output Formats

```bash
customkb query myproject "search" --format json      # Structured API output
customkb query myproject "search" --format xml       # XML with references
customkb query myproject "search" --format markdown  # Documentation-friendly
customkb query myproject "search" --format plain     # Plain text
```

### Flexible KB Name Resolution

All of these resolve to the same knowledgebase:
```bash
customkb query myproject "test"
customkb query myproject.cfg "test"
customkb query $VECTORDBS/myproject "test"
customkb query $VECTORDBS/myproject/myproject.cfg "test"
```

## Security

- Safe serialization: JSON-only format for all caches, checkpoints, and indexes (zero pickle)
- SQL injection prevention: Parameterized queries and table name validation throughout
- Path traversal protection: All file paths validated before access
- API key security: Environment variable storage, never logged or exposed in errors
- Atomic operations: Database writes with rollback support
- Input validation: All user-provided parameters sanitized at system boundaries
Best practices:

- Store API keys in environment variables, never in config files
- Restrict `$VECTORDBS` directory permissions to the application user
- Run behind an authentication proxy for network-exposed deployments
- Use local Ollama models for maximum data privacy

## Performance Optimization

```bash
customkb optimize myproject            # Apply optimizations
customkb optimize --analyze            # Show recommendations
customkb optimize myproject --dry-run  # Preview changes
```

CustomKB auto-configures based on available system memory:
| Memory | Tier | Batch Size | Cache Size | Hybrid Search |
|---|---|---|---|---|
| <16GB | Low | 50 | 5,000 | Disabled |
| 16–64GB | Medium | 100 | 10,000 | Available |
| 64–128GB | High | 200 | 20,000 | Enabled |
| >128GB | Very High | 300 | 50,000 | Enabled |
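The tier table above amounts to a simple threshold lookup. A sketch; boundary handling is an assumption here, since the table's ranges share endpoints:

```python
def memory_tier(total_gb):
    """Map available system memory (GB) to (tier, batch_size, cache_size)
    following the tier table; hybrid search availability tracks the tier."""
    if total_gb < 16:
        return ("low", 50, 5_000)
    if total_gb < 64:
        return ("medium", 100, 10_000)
    if total_gb <= 128:
        return ("high", 200, 20_000)
    return ("very_high", 300, 50_000)
```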

### GPU Acceleration

CUDA-enabled NVIDIA GPUs accelerate cross-encoder reranking (20–40% faster) and FAISS index operations:

```bash
uv sync --extra faiss-gpu-cu12  # Install GPU FAISS
./scripts/benchmark_gpu.py      # Benchmark GPU vs CPU
```

### Index Maintenance

```bash
customkb verify-indexes myproject  # Check index health
customkb bm25 myproject            # Build hybrid search index
```

## MCP Server Integration

CustomKB includes a Model Context Protocol server for integration with Claude Code and other MCP-compatible tools:

```bash
uv sync --extra mcp  # Install MCP dependencies
```

The MCP server exposes knowledgebase search as tools, allowing AI assistants to query your knowledgebases directly. See `mcp_server/server.py` for configuration.
## Troubleshooting

**"Knowledgebase not found"**

```bash
ls -la $VECTORDBS/                         # Verify KB exists
ls -la $VECTORDBS/myproject/myproject.cfg  # Check config file
```

**"API rate limit exceeded"** — Increase delay in config:

```ini
api_call_delay_seconds = 0.1
api_max_concurrency = 4
```

**"Out of memory during embedding"**

```bash
customkb optimize myproject  # Auto-adjust for your system
# Or manually: embedding_batch_size = 50
```

**Low similarity scores** — Adjust search parameters:

```ini
similarity_threshold = 0.5
enable_hybrid_search = true
# Or use stronger model: vector_model = text-embedding-3-large
```

**Debugging**

```bash
customkb query myproject "test" -v -d    # Verbose + debug
./scripts/diagnose_crashes.py myproject  # Run diagnostics
```

### Diagnostic Scripts

| Script | Purpose |
|---|---|
| `scripts/benchmark_gpu.py` | GPU vs CPU performance comparison |
| `scripts/benchmark_vectordb.py` | Vector database benchmarking |
| `scripts/show_optimization_tiers.py` | Display memory tier settings |
| `scripts/emergency_optimize.py` | Conservative recovery settings |
| `scripts/clean_corrupted_cache.py` | Clean corrupted cache files |
| `scripts/rebuild_bm25_filtered.py` | Rebuild BM25 indexes with filters |
| `scripts/upgrade_bm25_tokens.py` | Upgrade database for BM25 tokens |
| `scripts/diagnose_crashes.py` | Analyze crash logs and system state |
| `scripts/security-check.sh` | Run security validation checks |

## Testing

```bash
uv sync --extra test                            # Install test dependencies
python run_tests.py                             # All tests
python run_tests.py --unit                      # Unit tests only
python run_tests.py --integration               # Integration tests only
python run_tests.py --safe --memory-limit 2048  # With safety limits
python run_tests.py --coverage                  # Generate coverage report
```

Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.slow`, `@pytest.mark.requires_api`, `@pytest.mark.performance`, `@pytest.mark.requires_data`, `@pytest.mark.resource_intensive`

## FAQ

**Can I use CustomKB without API keys?** Yes. Use local Ollama models for both embeddings and queries. No external API calls required.

**How much does embedding cost?** OpenAI `text-embedding-3-small`: ~$0.02/1M tokens. A 500-page manual (~250k tokens) costs ~$0.005. Ollama is free.

**Is my data private?** Documents stay local. Only text chunks are sent to API providers during embedding/query. For maximum privacy, use Ollama.

**Can I use multiple embedding models in one KB?** No. Each KB uses one embedding model. To switch, create a new KB or regenerate with `--reset-database`.

**How do I update a KB when documents change?**

```bash
customkb database myproject docs/*.md --force
customkb embed myproject
```

**Maximum KB size?** Tested to 10M+ chunks (~4GB database). FAISS scales to billions of vectors.

## Environment Variables

| Variable | Purpose |
|---|---|
| `VECTORDBS` | Knowledgebase base directory (default: `/var/lib/vectordbs`) |
| `OPENAI_API_KEY` | OpenAI API key |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `GOOGLE_API_KEY` | Google/Gemini API key |
| `XAI_API_KEY` | xAI API key |
| `NLTK_DATA` | NLTK data location (optional) |

## Contributing

1. Fork and clone the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Set up the dev environment: `uv sync --extra test --extra dev`
4. Make changes, add tests, and run `python run_tests.py`
5. Submit a pull request
See `docs/DEVELOPMENT.md` for detailed development guidelines.

## License

GPL-3.0 License — see the `LICENSE` file.
Copyright © 2025-2026 Indonesian Open Technology Foundation
Maintained by the Indonesian Open Technology Foundation