Thank you for considering contributing to CIDX! This guide will help you set up your development environment and understand our development workflow.
Prerequisites:
- Python 3.9 or higher
- Git
- VoyageAI API key (for testing semantic search features)
-
Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/code-indexer.git
cd code-indexer
-
Initialize submodules
git submodule update --init --recursive
This pulls required dependencies:
- third_party/hnswlib - HNSW vector index library
- test-fixtures/multimodal-mock-repo - E2E test fixtures (if present)
-
Install development dependencies
python3 -m pip install -e ".[dev]" --break-system-packages
This installs CIDX in editable mode with all development dependencies, including:
- pytest (testing framework)
- mypy (type checking)
- ruff (linting and formatting)
- pre-commit (git hooks)
-
Install pre-commit hooks (CRITICAL)
pre-commit install
This installs git hooks that automatically check your code before each commit. All contributors must install these hooks to ensure code quality.
All commits are automatically validated for:
- Linting: Ruff checks for code quality issues and auto-fixes many of them
- Formatting: Ruff-format ensures consistent code style
- Type Checking: Mypy validates type annotations on src/ code
- Standard Checks: Trailing whitespace, EOF newlines, YAML syntax, etc.
What happens when you commit:
git add my_changes.py
git commit -m "Add feature"
# Pre-commit hooks run automatically
# If checks fail, files are auto-fixed when possible
# Re-stage and commit again:
git add my_changes.py
git commit -m "Add feature"
Manual pre-commit execution:
# Run on all files (useful after pulling changes)
pre-commit run --all-files
# Run on staged files only
pre-commit run
CIDX v8.0+ uses a container-free, filesystem-based architecture:
-
CLI Mode (Direct, Local)
- Direct command-line tool for local semantic code search
- Vectors stored in .code-indexer/index/ as JSON files
- No daemon, no server, no network required
-
Daemon Mode (Local, Cached)
- Local RPyC-based background service for faster queries
- In-memory HNSW/FTS index caching
- Unix socket communication (.code-indexer/daemon.sock)
- VoyageAI - Only supported embedding provider (voyage-3, voyage-3-large, voyage-code-3)
- FilesystemVectorStore - Container-free vector storage
- HNSW - Graph-based approximate nearest neighbor search
- Tantivy - Full-text search (FTS) with regex support
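To make the role of the HNSW component concrete: an approximate nearest neighbor index aims to return most of the results an exhaustive similarity scan would, without comparing the query against every stored vector. The sketch below shows only that exhaustive baseline in plain Python. It is an illustration, not CIDX's implementation; `cosine_similarity` and `top_k` are hypothetical names, and the choice of cosine similarity as the metric is an assumption.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], vectors: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Exhaustive search: score every vector and return the k best matches."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The scan costs one comparison per stored vector for every query. A graph-based index such as HNSW trades a small amount of recall for far fewer comparisons.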
CIDX maintains zero linting errors:
- Ruff: 0 errors
- Mypy: 0 errors (on src/ code)
- Ruff-format: All files formatted consistently
Run linting manually with ./lint.sh:
# Check and auto-fix linting issues
./lint.sh
# Or manually:
ruff check src/ tests/
ruff format src/ tests/
mypy src/
- All functions in src/ should have type annotations
- Import type hints from the typing module (from typing import ...)
- Use cast() when mypy needs help inferring types
- Tests (tests/) don't require full type annotations
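The annotation guidelines above might look like this in practice. This is a minimal sketch; `load_settings` and `get_port` are hypothetical names used for illustration, not functions from the CIDX codebase.

```python
import json
from typing import Any, Dict, cast


def load_settings(raw: str) -> Dict[str, Any]:
    """Parse a JSON document into a dictionary of settings."""
    # json.loads is typed as returning Any, so mypy cannot infer the
    # result type. cast() records our expectation for the type checker;
    # it performs no runtime validation or conversion.
    return cast(Dict[str, Any], json.loads(raw))


def get_port(settings: Dict[str, Any], default: int = 8000) -> int:
    """Return the configured port, falling back to a default."""
    value = settings.get("port", default)
    return int(value)
```

Because `cast()` does nothing at runtime, it is best reserved for places where the type is known for reasons mypy cannot see. Where the input is untrusted, an explicit check is the safer choice.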
- Follow PEP 8 (enforced by ruff)
- Use descriptive variable names
- Keep functions focused and small
- Document complex logic with comments
Follow this workflow during development:
1. Targeted unit tests (FAST - seconds)
|
v
2. Manual testing (verify feature works)
|
v
3. fast-automation.sh (FINAL GATE - must pass before done)
NEVER run fast-automation.sh after every small change. That wastes time.
Run specific tests related to your changes:
# Change base_client.py -> run related tests
pytest tests/unit/api_clients/test_base_*.py -v --tb=short
# Change handlers.py -> run handler tests
pytest tests/unit/server/mcp/test_handlers*.py -v --tb=short
# Run specific test function
pytest tests/unit/test_something.py::test_function_name -v
# Run tests matching pattern
pytest tests/ -k "test_scip" -v
Targeted tests give fast feedback (seconds, not minutes).
Run only after ALL changes are complete:
# Full test suite (~6-7 minutes, 865+ tests)
./fast-automation.sh
Performance Requirements:
- Must complete in under 10 minutes
- If exceeded, investigate with pytest --durations=20
- Move inherently slow tests (>30s) to full-automation.sh
- Mark slow tests with @pytest.mark.slow
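Marking a slow test could look like the following. This is a sketch: the test name and body are placeholders, and it assumes the `slow` marker is registered in the project's pytest configuration (an unregistered marker triggers an unknown-mark warning).

```python
import pytest


@pytest.mark.slow
def test_full_reindex_placeholder() -> None:
    """Stand-in for an inherently slow test (hypothetical example)."""
    # Real slow work would go here; a trivial assertion keeps the
    # sketch runnable.
    assert sum(range(1000)) == 499500
```

Tests marked this way can be deselected with pytest's standard marker expression, `pytest -m "not slow"`. Whether fast-automation.sh does so is not stated in this guide.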
| Suite | Tests | Time | When to Use |
|---|---|---|---|
| Targeted pytest | varies | seconds | During development |
| fast-automation.sh | 865+ | ~6-7 min | Final validation before commit |
| server-fast-automation.sh | varies | varies | Server-specific changes |
| full-automation.sh | all | 10+ min | Complete validation (ask user to run) |
- Use pytest for all tests
- Follow existing test patterns in the codebase
- Test files go in tests/unit/, tests/integration/, or tests/e2e/
- Aim for >85% code coverage for new features
- Use real implementations where possible, minimize mocking
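As an illustration of preferring real implementations over mocks, the sketch below round-trips a vector through an actual temporary directory instead of patching file I/O. `save_vector` and `load_vector` are hypothetical helpers written for this example; they are not CIDX's storage API, and the JSON layout is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_vector(directory: Path, name: str, vector: List[float]) -> Path:
    """Write a vector to <directory>/<name>.json (hypothetical helper)."""
    path = directory / f"{name}.json"
    path.write_text(json.dumps({"vector": vector}), encoding="utf-8")
    return path


def load_vector(path: Path) -> List[float]:
    """Read a vector previously written by save_vector."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return [float(x) for x in data["vector"]]


def test_vector_round_trip() -> None:
    # A real temporary directory stands in for the index directory, so
    # the test exercises actual file I/O rather than a mocked filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_vector(Path(tmp), "chunk-0", [0.1, 0.2, 0.3])
        assert path.exists()
        assert load_vector(path) == [0.1, 0.2, 0.3]
```

Inside a pytest suite, the built-in `tmp_path` fixture serves the same purpose as `tempfile.TemporaryDirectory` and cleans up automatically.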
tests/
├── unit/ # Fast unit tests, no external dependencies
├── integration/ # Tests requiring multiple components
├── e2e/ # End-to-end workflow tests
│ ├── server/ # Server E2E tests
│ └── multimodal/ # Multimodal image vectorization tests
└── conftest.py # Shared fixtures
-
Create a feature branch
git checkout -b feature/your-feature-name
-
Make your changes
- Write code following quality standards
- Add/update tests as needed
- Update documentation if needed
-
Run targeted tests during development
pytest tests/unit/path/to/relevant_tests.py -v --tb=short
-
Run final validation
./fast-automation.sh
-
Commit your changes
git add .
git commit -m "feat: description of change"
# Pre-commit hooks run automatically
-
Push to your fork
git push origin feature/your-feature-name
-
Open a Pull Request
- Describe what you changed and why
- Reference any related issues
- Ensure CI checks pass
Use clear, descriptive commit messages:
feat: add semantic search caching
fix: resolve SCIP index corruption on Windows
docs: update installation guide for Python 3.12
refactor: simplify query parameter parsing
test: add coverage for temporal search edge cases
Prefixes:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- refactor: Code refactoring
- test: Test additions/changes
- chore: Build/tooling changes
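This guide does not say that the prefix convention is enforced automatically. If you want to check a message locally before committing, a small helper along these lines would do; `is_valid_commit_subject` is a hypothetical name, and the exact pattern (prefix, colon, one space, description) is an assumption based on the examples above.

```python
import re

# Prefixes listed in this guide.
ALLOWED_PREFIXES = ("feat", "fix", "docs", "refactor", "test", "chore")

# "<prefix>: <non-empty description>" on the first line of the message.
_PATTERN = re.compile(r"^(?:%s): \S.*$" % "|".join(ALLOWED_PREFIXES))


def is_valid_commit_subject(message: str) -> bool:
    """Return True if the first line starts with an allowed prefix."""
    lines = message.splitlines()
    return bool(lines) and _PATTERN.match(lines[0]) is not None
```

Only the first line is checked, so a longer message body can follow a blank line without affecting the result.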
-
Ensure all checks pass
- Pre-commit hooks: pass
- Tests: pass (fast-automation.sh)
- Type checking: pass (mypy)
- Linting: pass (ruff)
-
Update documentation
- Update README.md if adding user-facing features
- Add docstrings to new functions/classes
- Update relevant guides in docs/
-
Keep PRs focused
- One feature/fix per PR
- Split large changes into smaller PRs
- Avoid mixing refactoring with feature work
-
Respond to feedback
- Address reviewer comments
- Push additional commits to the same branch
- Request re-review when ready
When reviewing PRs:
- Check code quality and adherence to standards
- Verify tests cover new functionality
- Ensure documentation is updated
- Test locally if needed
- Be constructive and respectful
code-indexer/
├── src/code_indexer/ # Main source code
│ ├── __init__.py # Version definition
│ ├── cli.py # CLI entry point
│ ├── daemon/ # Daemon mode implementation
│ ├── indexing/ # Indexing pipeline
│ ├── scip/ # SCIP code intelligence
│ ├── server/ # Multi-user server
│ │ ├── mcp/ # MCP protocol handlers
│ │ ├── multi/ # Multi-repo search
│ │ └── routers/ # REST API routers
│ ├── services/ # Core services (VoyageAI, etc.)
│ └── storage/ # Vector storage (FilesystemVectorStore)
├── tests/ # Test suite
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ └── e2e/ # End-to-end tests
├── docs/ # Documentation
├── third_party/ # Git submodules
│ └── hnswlib/ # HNSW library
├── test-fixtures/ # Test fixture submodules
│ └── multimodal-mock-repo/ # Multimodal E2E test fixtures
├── fast-automation.sh # Fast test suite
├── full-automation.sh # Complete test suite
├── lint.sh # Linting script
└── CLAUDE.md # Development guidelines
| File | Purpose |
|---|---|
| CLAUDE.md | Comprehensive development guidelines and rules |
| README.md | User-facing documentation |
| CHANGELOG.md | Version history and release notes |
| pyproject.toml | Project configuration and dependencies |
When bumping version, update ALL of these files:
- src/code_indexer/__init__.py - Primary source of truth
- README.md - Version badge
- CHANGELOG.md - New version entry
- docs/architecture.md - Version references
- docs/query-guide.md - Version references
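Since a version bump touches five files, a quick consistency check can catch a missed one. The sketch below is hypothetical and not part of the repository: it assumes the version is declared as `__version__ = "X.Y.Z"` in `src/code_indexer/__init__.py` (this guide does not confirm the exact form) and only checks that each other file mentions that string somewhere.

```python
import re
from pathlib import Path
from typing import List

# Files named in the list above, relative to the repository root.
VERSION_FILES = [
    "README.md",
    "CHANGELOG.md",
    "docs/architecture.md",
    "docs/query-guide.md",
]

# Assumed form of the declaration: __version__ = "X.Y.Z"
_VERSION_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def read_version(repo_root: Path) -> str:
    """Read the version string from src/code_indexer/__init__.py."""
    init_file = repo_root / "src" / "code_indexer" / "__init__.py"
    match = _VERSION_RE.search(init_file.read_text(encoding="utf-8"))
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def files_missing_version(repo_root: Path) -> List[str]:
    """Return the listed files that are absent or never mention the version."""
    version = read_version(repo_root)
    missing = []
    for name in VERSION_FILES:
        path = repo_root / name
        # Coarse substring check: it confirms the version appears, not
        # that it appears in the right place.
        if not path.is_file() or version not in path.read_text(encoding="utf-8"):
            missing.append(name)
    return missing
```

Calling `files_missing_version(Path("."))` from the repository root returns an empty list when every file mentions the current version.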
- Questions: Open a GitHub Discussion
- Bugs: Report via GitHub Issues
- Features: Suggest via GitHub Issues
- Development Guidelines: See CLAUDE.md for comprehensive rules
By contributing, you agree that your contributions will be licensed under the MIT License.
Thank you for contributing to CIDX!