
AEndrix03/context-packet-manager


🎯 Context Packet Manager

Transform your documentation and codebases into intelligent, queryable knowledge bases for RAG applications

Python 3.11+ Code style: black Type checked: mypy

Features • Quick Start • Architecture • Plugins • Docs


🚀 What is CPM?

CPM (Context Packet Manager) is a modular Python framework that transforms documentation, codebases, and knowledge repositories into chunked, embedded, FAISS-indexed context packets optimized for Retrieval-Augmented Generation (RAG).

Why CPM?

  • 🔌 Plugin Architecture - Extend without modifying core code
  • 🧩 Language-Aware Chunking - 40+ languages with AST/Tree-sitter parsing
  • ⚡ Incremental Builds - Hash-based caching re-embeds only changed chunks
  • 🤖 Claude Desktop Integration - Native MCP support for AI assistants
  • 📦 Package Management - Versioned packets with semantic versioning
  • 🎯 Zero Config - Sensible defaults; works out of the box

✨ Key Features

🔌 Extensible Plugin System

Create custom commands, builders, and retrievers without touching core code. Plugins are auto-discovered from .cpm/plugins/ and integrate with the CLI.

cpm plugin:list              # List loaded plugins
cpm my-plugin:custom-command # Your command, integrated

🧩 Intelligent Chunking for 40+ Languages

CPM automatically detects and applies a suitable chunking strategy for your content. If no built-in chunker fits, use --builder custom-builder to plug in your own.

| Language              | Strategy             | Approach                  |
|-----------------------|----------------------|---------------------------|
| Python                | AST-based            | Function/class boundaries |
| Java                  | Structure-aware      | Method scope preservation |
| JavaScript/TypeScript | Tree-sitter          | Syntax-aware parsing      |
| Markdown              | Header-based         | Hierarchy preservation    |
| 40+ more              | Tree-sitter/Fallback | Universal coverage        |

Fully extensible: Implement your own builder for custom logic.
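To illustrate what "function/class boundaries" means for Python sources, the sketch below splits a module using the standard library `ast` module. It is not CPM's python_ast chunker: the helper name `chunk_python_source` and the grouping rule (one chunk per top-level `def`/`class`, with module-level code between definitions kept together) are assumptions made for this example.

```python
import ast


def chunk_python_source(source: str) -> list[str]:
    """Split Python source at top-level function/class boundaries.

    Hypothetical helper for illustration; not CPM's python_ast chunker.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    chunks: list[str] = []
    pending: list[str] = []  # module-level lines waiting between definitions
    cursor = 0  # 0-based index of the first line not yet assigned to a chunk

    def flush_pending() -> None:
        if "".join(pending).strip():  # skip whitespace-only gaps
            chunks.append("".join(pending))
        pending.clear()

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above the def/class line, so start at the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
            end = node.end_lineno  # 1-based inclusive == 0-based exclusive
            pending.extend(lines[cursor:start])
            flush_pending()
            chunks.append("".join(lines[start:end]))
            cursor = end
    pending.extend(lines[cursor:])
    flush_pending()
    return chunks


if __name__ == "__main__":
    src = "import os\n\n@cache\ndef a():\n    return 1\n\nclass B:\n    pass\n"
    for chunk in chunk_python_source(src):
        print("---\n" + chunk, end="")
```

For the sample source this yields three chunks: the import block, the decorated function `a` (decorator included), and class `B`. Using AST line ranges rather than a fixed size is what keeps a function from being cut in half.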

⚡ Incremental Building

Rebuild only what changed. SHA-256 hash-based caching reuses existing embeddings:

# First build: 250 chunks
[embed] missing_vectors shape=(250, 768)

# Edit one file, rebuild
[cache] new_chunks=251 reused=250 to_embed=1 removed=0
[embed] missing_vectors shape=(1, 768)
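The counters in the log above can be reproduced with a content-hash cache. The sketch below keys each chunk by the SHA-256 of its text and splits a rebuild into reused and to-embed chunks. The names (`chunk_key`, `plan_incremental_embed`, `EmbedPlan`) and the choice to hash only the chunk text are assumptions; CPM's real cache key may include more than the text.

```python
import hashlib
from dataclasses import dataclass


def chunk_key(text: str) -> str:
    """Cache key for a chunk: hex SHA-256 of its UTF-8 text (assumed scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class EmbedPlan:
    new_chunks: int      # chunks produced by this build
    reused: int          # chunks whose vector is already cached
    to_embed: list[str]  # chunk texts that still need an embedding call
    removed: int         # cached keys that no current chunk refers to


def plan_incremental_embed(chunks: list[str], cache: dict[str, list[float]]) -> EmbedPlan:
    current_keys: set[str] = set()
    reused = 0
    to_embed: list[str] = []
    for text in chunks:
        key = chunk_key(text)
        current_keys.add(key)
        if key in cache:
            reused += 1  # vector can be copied from the cache
        else:
            to_embed.append(text)  # only these go to the embedding server
    removed = sum(1 for key in cache if key not in current_keys)
    return EmbedPlan(len(chunks), reused, to_embed, removed)


if __name__ == "__main__":
    cache = {chunk_key(t): [0.0] for t in ("intro", "install", "old section")}
    plan = plan_incremental_embed(["intro", "install", "new section"], cache)
    print(f"[cache] new_chunks={plan.new_chunks} reused={plan.reused} "
          f"to_embed={len(plan.to_embed)} removed={plan.removed}")
```

Running it prints `[cache] new_chunks=3 reused=2 to_embed=1 removed=1`: two cached vectors are reused, one chunk is sent to the embedder, and one stale cache entry is no longer referenced.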

🤖 Claude Desktop Integration

Native Model Context Protocol (MCP) support. Expose your context packets as tools directly in Claude Desktop:

{
  "mcpServers": {
    "cpm": {
      "command": "cpm",
      "args": [
        "mcp:serve"
      ]
    }
  }
}

Claude can now search your docs, code, and knowledge bases conversationally!
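Claude Desktop reads every server from a single `mcpServers` object, so pasting the snippet over an existing config would drop any other servers. A small sketch that inserts or updates only the `cpm` entry; the helper name `add_cpm_server` is hypothetical and you supply the config path yourself.

```python
import json
from pathlib import Path
from typing import Optional


def add_cpm_server(config_path: Path, command: str = "cpm",
                   env: Optional[dict[str, str]] = None) -> dict:
    """Insert or update mcpServers["cpm"], leaving every other server untouched."""
    config = (json.loads(config_path.read_text(encoding="utf-8"))
              if config_path.exists() else {})
    servers = config.setdefault("mcpServers", {})
    entry: dict = {"command": command, "args": ["mcp:serve"]}
    if env:
        entry["env"] = dict(env)  # e.g. {"RAG_CPM_DIR": "/path/to/workspace/.cpm"}
    servers["cpm"] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart Claude Desktop after the file changes so it picks up the new server entry.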

📦 Package Management

Versioned packets with semantic versioning, pinning, and pruning:

cpm pkg list                      # List installed packets
cpm pkg use my-packet@1.2.0       # Pin specific version
cpm pkg prune my-packet --keep 2  # Keep 2 latest versions
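Keeping "the N latest versions" depends on comparing versions numerically: as plain strings, "1.10.0" sorts before "1.9.1". The sketch shows the idea for plain MAJOR.MINOR.PATCH strings; the helper names are hypothetical, and how CPM orders pre-release or build tags is not covered here.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release/build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def versions_to_prune(versions: list[str], keep: int) -> list[str]:
    """Versions a 'keep the N latest' policy would remove, oldest first."""
    if keep < 0:
        raise ValueError("keep must be >= 0")
    ordered = sorted(versions, key=parse_semver)  # numeric, not lexicographic
    return ordered[: max(len(ordered) - keep, 0)]


if __name__ == "__main__":
    installed = ["1.0.0", "1.2.0", "1.10.0", "1.9.1"]
    print(versions_to_prune(installed, keep=2))
```

This prints `['1.0.0', '1.2.0']`, keeping 1.9.1 and 1.10.0; a string sort would have wrongly treated 1.10.0 as older than 1.2.0.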

πŸƒ Quick Start

Installation

# Clone repository
git clone https://github.com/AEndrix03/component-rag.git
cd component-rag

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install CPM
pip install -e .

# Install dev dependencies (optional)
pip install -e ".[dev]"  # black, ruff, mypy, pytest

Initialize Workspace

# Create .cpm/ workspace structure
cpm init

# Verify installation
cpm doctor

Configure OpenAI-Compatible Embeddings Adapter

Point CPM to an adapter exposing POST /v1/embeddings:

cpm embed add \
  --name adapter-local \
  --url http://127.0.0.1:8080 \
  --model text-embedding-3-small \
  --dims 768 \
  --set-default

Minimal .cpm/config/embeddings.yml example:

default: adapter-local
providers:
  - name: adapter-local
    type: http
    url: http://127.0.0.1:8080
    model: text-embedding-3-small
    dims: 768
    http:
      path: /v1/embeddings
    hints:
      normalize: true

Hint headers sent by the CPM connector:

  • X-Embedding-Dim
  • X-Embedding-Normalize
  • X-Embedding-Task
  • X-Model-Hint

See cpm_builtin/embeddings/README.md for full adapter spec, Docker Compose examples, and troubleshooting.
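A minimal standard-library client for such an adapter. The JSON body (`model`, `input`) and response shape (`data[i].embedding`, `data[i].index`) follow the OpenAI embeddings API convention that "OpenAI-compatible" implies. The header names come from the list above, but their value formats (for example `"true"` for normalization) and the helper names are assumptions; check the adapter spec before relying on them.

```python
import json
import urllib.request
from typing import Optional


def build_embeddings_request(
    base_url: str,
    model: str,
    texts: list[str],
    dims: Optional[int] = None,
    normalize: Optional[bool] = None,
    task: Optional[str] = None,
    path: str = "/v1/embeddings",
) -> urllib.request.Request:
    """Build an OpenAI-style embeddings request carrying CPM-style hint headers."""
    headers = {"Content-Type": "application/json", "X-Model-Hint": model}
    if dims is not None:
        headers["X-Embedding-Dim"] = str(dims)
    if normalize is not None:
        headers["X-Embedding-Normalize"] = "true" if normalize else "false"  # assumed format
    if task is not None:
        headers["X-Embedding-Task"] = task
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=body, headers=headers, method="POST"
    )


def embed(request: urllib.request.Request, timeout: float = 30.0) -> list[list[float]]:
    """Send the request; return vectors in input order using each item's "index"."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With the adapter from the example running, `embed(build_embeddings_request("http://127.0.0.1:8080", "text-embedding-3-small", ["hello"], dims=768, normalize=True))` should return a single 768-dimensional vector, provided the adapter honours the dimension hint.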

Build Your First Packet

# Start an embedding server first (or use a remote service);
# see the embedding server docs for setup

# Build a context packet from your docs
cpm build \
  --source ./docs \
  --destination ./packets/my-docs \
  --model jinaai/jina-embeddings-v2-base-code \
  --version 1.0.0

Build Scenarios

# 1) Standard build (default builder)
cpm build \
  --source ./docs \
  --name my-docs \
  --version 1.0.0 \
  --model jinaai/jina-embeddings-v2-base-code \
  --embed-url http://127.0.0.1:8876

# 2) LLM builder (explicit embedding model)
cpm build \
  --source C:/path/to/repo \
  --builder llm:cpm-llm-builder \
  --name repo-packet \
  --version 0.0.1 \
  --model BAAI/bge-base-en-v1.5 \
  --embed-url http://127.0.0.1:8876

# 3) Rebuild same packet/version to regenerate vectors + FAISS
# (useful if a previous run produced chunks/cache but no vectors/index)
cpm build \
  --source C:/path/to/repo \
  --builder llm:cpm-llm-builder \
  --name repo-packet \
  --version 0.0.1 \
  --model BAAI/bge-base-en-v1.5 \
  --embed-url http://127.0.0.1:8876

# 4) Migrate to a different embedder/model using workspace default provider
# (embed URL is resolved from .cpm/config/embeddings.yml default provider)
cpm build \
  --source C:/path/to/repo \
  --builder llm:cpm-llm-builder \
  --name repo-packet \
  --version 0.0.1 \
  --model intfloat/multilingual-e5-base

# 5) Re-embed an existing packet directly from docs.jsonl chunks
# (builder is not required in this mode)
cpm build embed \
  --source ./dist/repo-packet/0.0.1 \
  --model intfloat/multilingual-e5-base

Notes:

  • --packet-version remains supported as a compatibility alias, but --version is preferred.
  • --source and --builder are still required for deterministic rebuilds (cpm build run) because chunk generation depends on builder behavior and source content.
  • cpm build embed starts from an already built packet (docs.jsonl required) and regenerates vectors.f16.bin, faiss/index.faiss, and manifest.json.
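Scenario 3 and `cpm build embed` both address a packet that has chunks but lacks vectors or an index. A small check for that state, using the file names listed in the notes above; the helper and constant names are hypothetical, and the sketch ignores any other files a packet may contain.

```python
from pathlib import Path

CHUNKS_FILE = "docs.jsonl"  # input that `cpm build embed` starts from
# Files that, per the notes above, `cpm build embed` regenerates.
VECTOR_ARTIFACTS = ("vectors.f16.bin", "faiss/index.faiss", "manifest.json")


def missing_vector_artifacts(packet_dir) -> list[str]:
    """Return the regenerated artifacts that are absent from a built packet directory."""
    root = Path(packet_dir)
    if not (root / CHUNKS_FILE).is_file():
        raise FileNotFoundError(f"{root / CHUNKS_FILE} is missing; run a full build first")
    return [name for name in VECTOR_ARTIFACTS if not (root / name).is_file()]
```

If `missing_vector_artifacts("./dist/repo-packet/0.0.1")` returns a non-empty list, scenario 5 (`cpm build embed --source ./dist/repo-packet/0.0.1 --model ...`) is the mode meant to regenerate those files.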

Output:

[scan] files_indexed=145 chunks_total=1250
[cache] enabled: cached_vectors=0 dim=768
[embed] missing_vectors shape=(1250, 768)
[faiss] ntotal=1250
[done] build ok

Query Your Packet

# Query for relevant context (auto-detects retriever from project config)
cpm query \
  --packet my-docs \
  --query "authentication setup" \
  -k 5

# Or specify a custom retriever
cpm query --packet my-docs --query "auth" --retriever custom-retriever
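`-k 5` asks for the five best-scoring chunks. CPM delegates the search to FAISS; the sketch below is a pure-Python stand-in showing what an exact (brute-force) inner-product search over L2-normalized vectors computes, namely cosine similarity. Whether CPM normalizes vectors or uses this index type is not stated here (the `normalize: true` hint in embeddings.yml suggests normalized embeddings), so treat it as illustrative only; the helper names are hypothetical.

```python
import heapq
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale to unit L2 norm so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query: list[float], vectors: list[list[float]], k: int = 5) -> list[tuple[int, float]]:
    """Exhaustive inner-product search: (row_id, score) for the k best rows, best first."""
    q = normalize(query)
    scored = (
        (row_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for row_id, vec in enumerate(vectors)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

FAISS performs the same scoring with vectorized kernels (and, for approximate index types, without visiting every row), which is why it scales far beyond what this loop can handle.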

Use with Claude Desktop

  1. Configure Claude Desktop

    Edit claude_desktop_config.json (Linux: ~/.config/Claude/, macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\):

    {
      "mcpServers": {
        "cpm": {
          "command": "/path/to/.venv/bin/cpm",
          "args": ["mcp:serve"],
          "env": {
            "RAG_CPM_DIR": "/path/to/workspace/.cpm"
          }
        }
      }
    }
  2. Restart Claude Desktop

  3. Use in conversation:

    You: "What packets are available?"
    Claude: [calls lookup tool] I can see 3 context packets...
    
    You: "Search my-docs for authentication examples"
    Claude: [calls query tool] Here are the relevant sections...
    

πŸ—οΈ Architecture

CPM follows a modular, plugin-based architecture:

┌────────────────────────────────────────────────────────┐
│                         CPM                            │
└────────────────────────────────────────────────────────┘

         cpm_cli                     CLI Entry Point
            │
            ├─ Command Resolution
            └─ Token Parsing
                    │
                    ▼
         ┌──────────────────────┐
         │     cpm_core         │   Foundation Layer
         │                      │
         │  • CPMApp            │   Application Bootstrap
         │  • FeatureRegistry   │   Command/Plugin Registry
         │  • PluginManager     │   Plugin Discovery/Loading
         │  • Workspace         │   .cpm/ Management
         │  • EventBus          │   Lifecycle Hooks
         │  • ServiceContainer  │   Dependency Injection
         └──────────────────────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
  ┌─────────┐ ┌─────────┐ ┌──────────┐
  │ Plugins │ │Builtins │ │  Build   │
  │         │ │         │ │  System  │
  │ • MCP   │ │• Init   │ │• Chunker │
  │ • ...   │ │• Doctor │ │• Embedder│
  └─────────┘ └─────────┘ │• FAISS   │
                          └──────────┘
                               │
                               ▼
                    ┌───────────────────┐
                    │  Context Packet   │
                    │                   │
                    │  • docs.jsonl     │
                    │  • vectors.f16    │
                    │  • faiss/index    │
                    │  • manifest.json  │
                    └───────────────────┘
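The packet files can be inspected without CPM. The sketch assumes docs.jsonl holds one JSON object per line and that vectors.f16.bin is a headerless, row-major, little-endian float16 matrix whose row i belongs to line i of docs.jsonl, with the dimension known in advance (768 in the build logs). None of these layout details is confirmed by this README, and the helper names are hypothetical.

```python
import json
import struct
from pathlib import Path


def read_docs(path) -> list[dict]:
    """Read docs.jsonl: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def read_f16_vectors(path, dim: int) -> list[tuple[float, ...]]:
    """Read a headerless, row-major, little-endian float16 matrix with `dim` columns."""
    raw = Path(path).read_bytes()
    row_size = 2 * dim  # float16 is 2 bytes per component
    if len(raw) % row_size:
        raise ValueError(f"{len(raw)} bytes is not a whole number of {dim}-dim rows")
    # 'e' is the struct format code for IEEE 754 half precision.
    return list(struct.iter_unpack(f"<{dim}e", raw))
```

If the assumptions hold, `read_f16_vectors(".cpm/packages/demo/1.0.0/vectors.f16.bin", dim=768)` returns one 768-tuple per chunk, and its length should equal `len(read_docs(".cpm/packages/demo/1.0.0/docs.jsonl"))`.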

Package Structure

component-rag/
├── cpm_core/           🏗️  Foundation layer (app, plugins, registry)
├── cpm_cli/            🖥️  CLI routing and command resolution
├── cpm_builtin/        🧰  Built-in features (chunking, embeddings, packages)
└── cpm_plugins/        🔌  Official plugins (MCP, etc.)

📚 See Architecture Docs for detailed component documentation


🔌 Plugin System

CPM is built for extensibility. Create custom commands without touching core code.

Create a Plugin in 3 Steps

1. Create plugin directory:

mkdir -p .cpm/plugins/my-plugin
cd .cpm/plugins/my-plugin

2. Create plugin.toml:

[plugin]
id = "my-plugin"
name = "My Custom Plugin"
version = "1.0.0"
entrypoint = "entrypoint:register_plugin"

3. Create entrypoint.py:

from cpm_core.api import CPMAbstractCommand, cpmcommand
from cpm_core.plugin import PluginContext


@cpmcommand(name="hello", group="my-plugin")
class HelloCommand(CPMAbstractCommand):
    """Say hello to the user."""

    def configure(self, parser):
        parser.add_argument("--name", default="World")

    def run(self, args):
        print(f"Hello, {args.name}!")
        return 0


def register_plugin(ctx: PluginContext):
    ctx.logger.info("My plugin loaded!")

4. Use your plugin:

cpm my-plugin:hello --name CPM
# Output: Hello, CPM!

📖 Plugin Development Guide


🧩 Intelligent Chunking

CPM automatically detects and selects a chunking strategy for your content. If the default doesn't fit, implement your own builder and pass --builder your-builder during the build.

Supported Strategies

| Chunker            | Languages                             | Key Feature                         |
|--------------------|---------------------------------------|-------------------------------------|
| python_ast         | Python                                | Preserves function/class boundaries |
| java               | Java                                  | Maintains method scope              |
| treesitter_generic | JS, TS, Go, Rust, C/C++, and 35+ more | Syntax tree parsing                 |
| markdown           | Markdown, reStructuredText            | Header hierarchy                    |
| text               | Plain text                            | Token-budget with overlap           |
| brace_fallback     | C-style languages                     | Brace-based sectioning              |

Extensibility at Every Level

Builders: CPM selects a builder based on project structure. Need custom logic?

cpm build --source ./docs --builder my-custom-builder

Retrievers: Auto-detected from project configuration, or explicitly specified:

cpm query --packet my-docs --query "search" --retriever my-custom-retriever

Hierarchical Chunking: Built-in support for multi-level chunking:

config = ChunkingConfig(
    hierarchical=True,
    chunk_tokens=800,  # Parent chunk size
    micro_chunk_tokens=220,  # Child chunk size
    emit_parent_chunks=False,  # Only index children
)
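To show what the three `ChunkingConfig` fields plausibly control, here is a two-level split. Only the parameter names come from the snippet above; `Chunk`, `hierarchical_chunks`, the id scheme, and the whitespace "tokens" are assumptions for illustration, not CPM's chunker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # set only on child (micro) chunks


def hierarchical_chunks(text: str, chunk_tokens: int = 800, micro_chunk_tokens: int = 220,
                        emit_parent_chunks: bool = False) -> list[Chunk]:
    """Two-level split: parents of chunk_tokens, children of micro_chunk_tokens."""
    tokens = text.split()  # whitespace words stand in for real tokenizer tokens
    out: list[Chunk] = []
    for p, p_start in enumerate(range(0, len(tokens), chunk_tokens)):
        parent = tokens[p_start:p_start + chunk_tokens]
        parent_id = f"p{p}"
        if emit_parent_chunks:
            out.append(Chunk(parent_id, " ".join(parent)))
        for c, c_start in enumerate(range(0, len(parent), micro_chunk_tokens)):
            child = parent[c_start:c_start + micro_chunk_tokens]
            out.append(Chunk(f"{parent_id}.c{c}", " ".join(child), parent_id))
    return out
```

With ten words, `chunk_tokens=4`, and `micro_chunk_tokens=2`, the children are p0.c0, p0.c1, p1.c0, p1.c1, and p2.c0. One common motivation for this layout is that small children match queries precisely while `parent_id` lets a retriever expand a hit to its surrounding parent text.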

📖 Chunking Documentation


🤖 MCP Integration

CPM includes a built-in Model Context Protocol plugin for seamless Claude Desktop integration.

MCP Tools

lookup - List Packets

{
  "name": "lookup",
  "description": "List available context packets",
  "inputSchema": {
    "type": "object",
    "properties": {
      "cpm_dir": {
        "type": "string"
      }
    }
  }
}

query - Semantic Search

{
  "name": "query",
  "description": "Search context packets for relevant information",
  "inputSchema": {
    "type": "object",
    "properties": {
      "packet": {
        "type": "string"
      },
      "query": {
        "type": "string"
      },
      "k": {
        "type": "number",
        "default": 5
      }
    },
    "required": ["packet", "query"]
  }
}
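MCP clients such as Claude Desktop invoke these tools through the JSON-RPC 2.0 `tools/call` method, passing the tool `name` and an `arguments` object that must satisfy its `inputSchema`. The sketch builds such a request for the `query` tool; the helper is hypothetical, an MCP client library would normally do this for you, and a real session performs the `initialize` handshake first.

```python
import json


def query_tool_call(request_id: int, packet: str, query: str, k: int = 5) -> str:
    """Serialize an MCP 'tools/call' JSON-RPC 2.0 request for the 'query' tool."""
    if not packet or not query:
        raise ValueError("'packet' and 'query' are required")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "query",
            "arguments": {"packet": packet, "query": query, "k": k},
        },
    }
    # Over the stdio transport each message is a single line of JSON.
    return json.dumps(message) + "\n"
```

`query_tool_call(1, "python-stdlib", "file I/O examples")` produces the kind of request behind the python-stdlib search in this section.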

Integration Example

Claude Desktop config (claude_desktop_config.json):
{
  "mcpServers": {
    "cpm": {
      "command": "cpm",
      "args": ["mcp:serve"],
      "env": {
        "RAG_CPM_DIR": "/path/to/.cpm",
        "RAG_EMBED_URL": "http://127.0.0.1:8876"
      }
    }
  }
}

Conversation with Claude:

User: Search my python-stdlib packet for file I/O examples

Claude: [Calls query tool]
Here are the most relevant sections from python-stdlib:

1. **File Operations (score: 0.92)**
   "The `open()` function is the primary way to work with files..."

2. **Context Managers (score: 0.89)**
   "Using `with open()` ensures proper file closure..."

📖 MCP Plugin Documentation


📦 Built-in Commands

| Command                   | Description                            |
|---------------------------|----------------------------------------|
| cpm init                  | Initialize CPM workspace               |
| cpm doctor                | Validate workspace and diagnose issues |
| cpm build                 | Build a context packet from source     |
| cpm pkg list              | List installed packets                 |
| cpm pkg use <pkg@version> | Pin a packet version                   |
| cpm pkg prune <pkg>       | Remove old packet versions             |
| cpm plugin:list           | List loaded plugins                    |
| cpm plugin:doctor         | Diagnose plugin issues                 |
| cpm mcp:serve             | Start MCP server for Claude            |

📖 Command Reference


βš™οΈ Configuration

Environment Variables

| Variable       | Purpose                  | Default                    |
|----------------|--------------------------|----------------------------|
| RAG_CPM_DIR    | Workspace root directory | .cpm                       |
| RAG_EMBED_URL  | Embedding server URL     | http://127.0.0.1:8876      |
| CPM_CONFIG     | Main config file path    | .cpm/config/cpm.toml       |
| CPM_EMBEDDINGS | Embeddings config path   | .cpm/config/embeddings.yml |
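The environment variable table amounts to a lookup with fallbacks. This sketch resolves each variable from the environment and falls back to its documented default. Treating an empty value as unset is an assumption, and the sketch does not model any precedence CPM may apply between CLI flags, config files, and environment variables.

```python
import os
from typing import Mapping, Optional

# Defaults as documented in the environment variable table.
DEFAULTS = {
    "RAG_CPM_DIR": ".cpm",
    "RAG_EMBED_URL": "http://127.0.0.1:8876",
    "CPM_CONFIG": ".cpm/config/cpm.toml",
    "CPM_EMBEDDINGS": ".cpm/config/embeddings.yml",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Value of each documented variable, or its default when unset or empty."""
    source = os.environ if env is None else env
    return {name: source.get(name) or default for name, default in DEFAULTS.items()}
```

Passing an explicit mapping (as in `resolve_settings({"RAG_EMBED_URL": "http://gpu-box:9000"})`) makes the behaviour easy to check without touching the real environment.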

Workspace Structure

.cpm/
├── packages/           # Installed context packets
│   └── <name>/
│       └── <version>/
├── config/             # Configuration files
│   ├── cpm.toml        # Main configuration
│   └── embeddings.yml  # Embedding providers
├── plugins/            # Workspace plugins
├── cache/              # Query result caches
├── state/              # Runtime state (pins, active versions)
├── logs/               # Application logs
└── pins/               # Version pin files

📚 Documentation

CPM includes comprehensive documentation for every component:

📖 Core Documentation

🧰 Built-in Features

🔌 Plugins

🗺️ Navigation


πŸ› οΈ Development

Prerequisites

  • Python 3.11+
  • Virtual environment recommended

Setup Development Environment

# Clone and install
git clone https://github.com/AEndrix03/component-rag.git
cd component-rag
python -m venv .venv
source .venv/bin/activate

# Install with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=cpm_core --cov=cpm_builtin --cov=cpm_cli

# Run specific test file
pytest tests/test_core.py

# Run with verbose output
pytest -v

Code Quality

# Format code
black .

# Lint
ruff check .

# Type check
mypy .

🤝 Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for new functionality
  4. Ensure code quality (black, ruff, mypy pass)
  5. Commit with clear messages (git commit -m 'Add amazing feature')
  6. Push to your fork (git push origin feature/amazing-feature)
  7. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide
  • Use type hints for all functions
  • Write docstrings for public APIs
  • Add tests for bug fixes and new features
  • Update documentation for user-facing changes

📊 Performance

Build Performance

  • Scanning: ~5,000 files/second
  • Chunking: ~2,000 files/second (language-dependent)
  • Incremental builds: 90%+ cache hit rate for small edits

Query Performance

  • FAISS search: Sub-millisecond on 100k vectors
  • Scalability: Tested with 10M+ vector indices
  • Memory: ~3 KB per vector (768 dims × 4 bytes float32 = 3,072 bytes)
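The memory figure is plain arithmetic: raw storage per vector is the dimension times the bytes per component. The helpers below are illustrative only. Index structures such as IVF or PQ add or remove overhead, and the float16 case applies to vectors.f16.bin only if that file stores raw half-precision values, as its name suggests.

```python
BYTES_PER_COMPONENT = {"float32": 4, "float16": 2}


def vector_bytes(dim: int, dtype: str = "float32") -> int:
    """Raw bytes for one dense vector, ignoring any index overhead."""
    return dim * BYTES_PER_COMPONENT[dtype]


def corpus_bytes(count: int, dim: int, dtype: str = "float32") -> int:
    """Raw bytes for `count` vectors of the same dimension and dtype."""
    return count * vector_bytes(dim, dtype)


if __name__ == "__main__":
    print(vector_bytes(768))                          # one float32 vector
    print(vector_bytes(768, "float16"))               # one half-precision row
    print(corpus_bytes(100_000, 768) / 2**20, "MiB")  # 100k float32 vectors
```

This prints 3072, 1536, and 292.96875 MiB: roughly 3 KB per 768-dim float32 vector, half that in float16, and about 293 MiB of raw float32 vectors at 100k chunks.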

📄 License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.


πŸ™ Acknowledgments

  • Built with FAISS for efficient vector search
  • Uses Sentence Transformers for embeddings
  • Tree-sitter integration for multi-language parsing
  • FastMCP for Model Context Protocol support


Made with ❤️ for Everyone

OCI Packaging

CPM supports packaging packets for standard OCI registries (e.g., Harbor, GHCR, GitLab, and OCI-compatible Nexus).

  • Packet tag mapping: name@version -> <registry>/<project>/<name>:<version>
  • Immutable identity: always consume by digest (@sha256:...) after resolve
  • OCI staging layout includes:
    • packet.manifest.json
    • packet.lock.json (when present)
    • payload/ (cpm.yml, manifest.json, docs.jsonl, vectors.f16.bin, faiss/index.faiss)

Digest form example:

registry.local/project/demo@sha256:<digest>
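The tag mapping and digest form above fit in a few lines. The helper names are hypothetical. `oci_digest` shows how an OCI content digest is formed (algorithm prefix plus the hex SHA-256 of the exact bytes); in practice, the digest to pin is the manifest digest reported by the registry on resolve, not something recomputed from local files.

```python
import hashlib


def packet_to_oci_ref(packet: str, registry: str) -> str:
    """Map 'name@version' and 'registry/project' to 'registry/project/name:version'."""
    name, sep, version = packet.partition("@")
    if not sep or not name or not version:
        raise ValueError(f"expected name@version, got {packet!r}")
    return f"{registry.rstrip('/')}/{name}:{version}"


def oci_digest(blob: bytes) -> str:
    """OCI-style content digest: 'sha256:' plus lowercase hex SHA-256 of the bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def pin_by_digest(oci_ref: str, digest: str) -> str:
    """Replace a trailing ':tag' with '@<digest>' for immutable consumption."""
    repository, sep, tag = oci_ref.rpartition(":")
    if not sep or "/" in tag:
        repository = oci_ref  # no tag: any ':' belonged to a host:port
    return f"{repository}@{digest}"
```

`packet_to_oci_ref("demo@1.0.0", "registry.local/project")` gives `registry.local/project/demo:1.0.0`, and pinning that with a resolved digest yields the `registry.local/project/demo@sha256:<digest>` form shown above.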

OCI Install and Publish

Example publish/install/query flow with OCI registries:

# Publish a built packet directory
cpm publish --from-dir ./dist/demo/1.0.0 --registry registry.local/project

# Install from OCI by name@version
cpm install demo@1.0.0 --registry registry.local/project

# Query uses the model recorded in the install lock, when available
cpm query --packet demo --query "authentication setup" -k 5

# Publish/install without vectors (chunks + metadata only)
cpm publish --from-dir ./dist/demo/1.0.0 --registry registry.local/project --no-embed
cpm install demo@1.0.0 --registry registry.local/project --no-embed

# Then generate vectors locally with your preferred model/provider
cpm build embed --source ./.cpm/packages/demo/1.0.0 --model intfloat/multilingual-e5-base

For Harbor, use the project/repository form in --registry, for example:

harbor.local/my-project
