Developer Guide

This comprehensive guide explains how to extend CodeHem, use its APIs, and develop language plugins.

See also for end users of the Python library: docs/codehem-sdk-user.md.

Core API Reference
Plugin Development
Architecture Overview
Testing Guidelines
Performance Considerations

Core API Reference

CodeHem Class

The main entry point for all CodeHem operations.

Initialization

from codehem import CodeHem

# Initialize with specific language
hem = CodeHem("python")          # Python support
hem = CodeHem("typescript")      # TypeScript/JavaScript support

# Auto-detect language from code
code = "def example(): pass"
hem = CodeHem.from_raw_code(code)

# From file path (auto-detects language)
hem = CodeHem.from_file("/path/to/file.py")

Core Methods

`extract(code: str) -> CodeElementsResult`

Extracts all code elements from the provided code.

result = hem.extract(code)

# Access extracted elements
for cls in result.classes:
    print(f"Class: {cls.name} at line {cls.range.start_line}")
    for method in cls.methods:
        print(f"  Method: {method.name}")

for func in result.functions:
    print(f"Function: {func.name}")

for import_stmt in result.imports:
    print(f"Import: {import_stmt.name}")

`get_text_by_xpath(code: str, xpath: str, return_hash: bool = False)`

Retrieves text content for a specific XPath query.

# Simple text retrieval
text = hem.get_text_by_xpath(code, "MyClass.method_name[method]")

# With hash for conflict detection
text, hash_value = hem.get_text_by_xpath(code, xpath, return_hash=True)

`apply_patch(xpath: str, new_code: str, mode: str, original_hash: str = None, dry_run: bool = False, return_format: str = "text")`

Applies modifications to code at the specified XPath location.

Parameters:

xpath: XPath query targeting the element to modify
new_code: The new code content
mode: Modification mode ("replace", "append", "prepend")
original_hash: Hash of the original content for conflict detection
dry_run: If True, returns a preview without applying changes
return_format: Response format ("text" or "json")

Return Values:

Text format:

modified_code = hem.apply_patch(xpath, new_code, "replace")

JSON format:

result = hem.apply_patch(xpath, new_code, "replace", return_format="json")
# Returns: {
#   "status": "ok",
#   "lines_added": 3,
#   "lines_removed": 1,
#   "modified_code": "...",
#   "new_hash": "abc123..."
# }

Builder Methods

`new_function(code: str, name: str, args: List[str], body: List[str], **kwargs)`

Creates a new function and inserts it into the code.

result = hem.new_function(
    code=existing_code,
    name="calculate_sum",
    args=["numbers"],
    body=[
        "total = 0",
        "for num in numbers:",
        "    total += num", 
        "return total"
    ],
    decorators=["@staticmethod"],
    return_format="json"
)

`new_class(code: str, name: str, methods: List[Dict], **kwargs)`

Creates a new class with specified methods.

result = hem.new_class(
    code=existing_code,
    name="Calculator",
    methods=[
        {
            "name": "__init__",
            "args": ["self"],
            "body": ["pass"]
        },
        {
            "name": "add",
            "args": ["self", "a", "b"],
            "body": ["return a + b"]
        }
    ],
    return_format="json"
)

`new_method(code: str, parent: str, name: str, args: List[str], body: List[str], **kwargs)`

Adds a new method to an existing class.

result = hem.new_method(
    code=existing_code,
    parent="Calculator",
    name="multiply",
    args=["self", "a", "b"],
    body=["return a * b"],
    decorators=["@property"],
    return_format="json"
)

Workspace Class

For repository-level operations on multiple files.

Initialization

from codehem import CodeHem

# Open workspace (indexes all supported files)
workspace = CodeHem.open_workspace("/path/to/repository")

Methods

`find(name: str = None, kind: str = None, file_pattern: str = None) -> List[Tuple[str, str]]`

Searches for elements across the entire workspace.

# Find all methods named 'calculate'
matches = workspace.find(name="calculate", kind="method")
for file_path, xpath in matches:
    print(f"Found in {file_path}: {xpath}")

# Find all classes
classes = workspace.find(kind="class")

# Find in specific files
python_functions = workspace.find(kind="function", file_pattern="*.py")

`apply_patch(file_path: str, xpath: str, new_code: str, **kwargs)`

Applies a patch to a specific file in the workspace.

result = workspace.apply_patch(
    file_path="src/calculator.py",
    xpath="Calculator.add[method]",
    new_code="def add(self, a, b):\n    return a + b + 1",
    mode="replace",
    return_format="json"
)

XPath Query Syntax

CodeHem uses XPath-like queries to target code elements:

Basic Patterns

# Classes
"ClassName[class]"                    # Target class
"MyClass[class]"                      # Specific class named MyClass

# Methods
"ClassName.method_name[method]"       # Method in class
"Calculator.add[method]"              # Specific method

# Functions
"function_name[function]"             # Standalone function
"helper_function[function]"           # Specific function

# Properties
"ClassName.property_name[property]"   # Property in class
"User.name[property]"                # Specific property

# Static properties
"ClassName.CONSTANT[static_property]" # Static/class variable

# Imports (special case)
"imports[imports]"                    # All imports section

Advanced Patterns

# TypeScript interfaces
"InterfaceName[interface]"           # Interface declaration
"User[interface]"                    # Specific interface

# Type aliases
"TypeName[type_alias]"               # Type alias
"Result[type_alias]"                 # Specific type alias

# Enums
"EnumName[enum]"                     # Enum declaration
"Status[enum]"                       # Specific enum

# Namespaces (TypeScript)
"NamespaceName[namespace]"           # Namespace
"Utils[namespace]"                   # Specific namespace

Plugin Development

Overview

CodeHem uses a plugin architecture where each language is implemented as a separate plugin. Plugins register via Python entry-points and provide language-specific implementations.

Plugin Structure

Each language plugin follows this structure:

codehem_lang_<language>/
├── __init__.py
├── service.py              # Main language service
├── detector.py             # File type detection
├── components/             # Component implementations
│   ├── __init__.py
│   ├── parser.py          # Code parsing
│   ├── navigator.py       # AST navigation
│   ├── extractor.py       # Element extraction
│   └── post_processor.py  # Post-processing
├── formatting/            # Code formatting
│   ├── __init__.py
│   └── formatter.py
├── manipulator/          # Code manipulation
│   ├── __init__.py
│   └── handlers.py
├── extractors/           # Element extractors
│   ├── __init__.py
│   └── specific_extractors.py
└── node_patterns.json    # AST pattern definitions

Creating a Plugin

1. Use the Cookiecutter Template

pipx cookiecutter gh:codehem/codehem-lang-template

This creates a skeleton plugin with all necessary components.

2. Define Entry Points

In your plugin's setup.py or pyproject.toml:

[project.entry-points."codehem.languages"]
java = "codehem_lang_java:JavaLanguageService"

3. Implement Core Components

Language Service

from codehem.core.registry import language_service
from codehem.core.components.interfaces import ILanguageService

@language_service("java")
class JavaLanguageService(ILanguageService):
    def __init__(self):
        self.file_extensions = ['.java']
        self.language_name = "java"
    
    def detect_language(self, file_path: str, content: str = None) -> bool:
        """Detect if file/content is Java."""
        return file_path.endswith('.java')
    
    def get_parser(self):
        """Return tree-sitter parser for Java."""
        # Implementation specific to Java
        pass

Component Implementations

Implement the core interfaces:

ICodeParser - Parse code into AST
ISyntaxTreeNavigator - Navigate and query AST
IElementExtractor - Extract code elements
IPostProcessor - Transform raw data to CodeElement objects

Formatter

from codehem.core.formatting.formatter import BraceFormatter

class JavaFormatter(BraceFormatter):
    """Java-specific code formatting."""
    
    def __init__(self):
        super().__init__(
            indent_size=4,
            use_tabs=False,
            brace_style="allman"  # or "k&r"
        )
    
    def format_class(self, class_name: str, content: str) -> str:
        """Format Java class declaration."""
        return f"public class {class_name} {{\n{content}\n}}"

Base Classes

CodeHem provides base classes to minimize implementation work:

BaseElementExtractor

from codehem.core.components.base_implementations import BaseElementExtractor

class JavaElementExtractor(BaseElementExtractor):
    def __init__(self, navigator):
        super().__init__('java', navigator)
    
    def extract_classes(self, tree, code_bytes):
        """Extract Java class declarations."""
        query = """
        (class_declaration
          name: (identifier) @class_name
          body: (class_body) @body) @class_decl
        """
        return self._extract_with_query(tree, code_bytes, query)

BraceFormatter vs IndentFormatter

Choose the appropriate base formatter:

# For languages with braces (Java, C++, JavaScript)
from codehem.core.formatting.formatter import BraceFormatter

class JavaFormatter(BraceFormatter):
    pass

# For languages with indentation (Python, YAML)
from codehem.core.formatting.formatter import IndentFormatter

class PythonFormatter(IndentFormatter):
    pass

Testing Plugins

Test Structure

tests/
├── test_<language>_basic.py
├── test_<language>_extraction.py
├── test_<language>_manipulation.py
└── fixtures/
    └── <language>/
        ├── simple_class.txt
        ├── complex_example.txt
        └── edge_cases.txt

Test Templates

import pytest
from codehem import CodeHem

class TestJavaExtraction:
    def setup_method(self):
        self.hem = CodeHem("java")
    
    def test_extract_java_class(self):
        code = """
        public class Example {
            private int value;
            
            public int getValue() {
                return value;
            }
        }
        """
        
        result = self.hem.extract(code)
        assert len(result.classes) == 1
        assert result.classes[0].name == "Example"
        assert len(result.classes[0].methods) == 1

Architecture Overview

Component Relationships

CodeHem (Main API)
├── LanguageService (Plugin Entry Point)
├── Components/
│   ├── Parser (AST Creation)
│   ├── Navigator (AST Traversal)
│   ├── Extractor (Element Identification)
│   └── PostProcessor (Data Transformation)
├── Formatter (Code Formatting)
└── Manipulator (Code Modification)

Data Flow

Detection: Language detector identifies file type
Parsing: Parser creates AST from source code
Navigation: Navigator executes tree-sitter queries
Extraction: Extractor identifies code elements
Post-processing: PostProcessor transforms to CodeElement objects
Manipulation: Manipulator applies modifications
Formatting: Formatter ensures consistent style

Error Handling

CodeHem provides comprehensive error handling:

from codehem.core.error_handling import CodeHemError, retry_with_backoff

try:
    result = hem.apply_patch(xpath, new_code, mode="replace")
except CodeHemError as e:
    # Handle CodeHem-specific errors
    logger.error(f"CodeHem error: {e}")
except Exception as e:
    # Handle unexpected errors
    logger.error(f"Unexpected error: {e}")

Testing Guidelines

Test Categories

Unit Tests - Test individual components
Integration Tests - Test component interactions
Language Tests - Test language-specific functionality
End-to-End Tests - Test complete workflows

Running Tests

# All tests
pytest -xv

# Specific language
pytest tests/python/ -v
pytest tests/typescript/ -v

# With coverage
pytest --cov=codehem --cov-report=html

# Specific test pattern
pytest -k "test_extract" -v

Test Fixtures

Store test code in fixture files:

tests/fixtures/python/general/
├── simple_class.txt
├── complex_class.txt
├── multiple_functions.txt
└── edge_cases.txt

Load fixtures in tests:

from tests.helpers.fixture_loader import load_fixture

def test_complex_extraction():
    code = load_fixture('python/general/complex_class.txt')
    result = hem.extract(code)
    # Assertions...

Performance Considerations

Caching Strategy

CodeHem implements multiple caching layers:

AST Cache - Parsed trees cached by code hash
Query Cache - Tree-sitter query results cached
Element Cache - Extracted elements cached

Memory Management

# For large codebases, use workspace indexing
workspace = CodeHem.open_workspace("/large/repo")

# Avoid repeatedly parsing the same files
hem = CodeHem("python")
for file_path in files:
    if not hem.is_cached(file_path):
        # Only parse if not cached
        hem.extract_from_file(file_path)

Profiling

Profile extraction performance:

import cProfile
from codehem import CodeHem

def profile_extraction():
    hem = CodeHem("python")
    with open("large_file.py") as f:
        code = f.read()
    
    result = hem.extract(code)
    return result

cProfile.run('profile_extraction()', 'extraction_profile.prof')

Diagrams

Architecture and plugin relations are illustrated in docs/architecture.puml. PlantUML renders the diagrams for the GitHub Pages site.

Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass: pytest -xv
Submit a pull request

For plugin development, start with the cookiecutter template and follow the plugin structure guidelines above.

FilesExpand file tree

Developer-Guide.md

Latest commit

History

Developer-Guide.md

File metadata and controls

Developer Guide

Table of Contents

Core API Reference

CodeHem Class

Initialization

Core Methods

extract(code: str) -> CodeElementsResult

get_text_by_xpath(code: str, xpath: str, return_hash: bool = False)

apply_patch(xpath: str, new_code: str, mode: str, original_hash: str = None, dry_run: bool = False, return_format: str = "text")

Builder Methods

new_function(code: str, name: str, args: List[str], body: List[str], **kwargs)

new_class(code: str, name: str, methods: List[Dict], **kwargs)

new_method(code: str, parent: str, name: str, args: List[str], body: List[str], **kwargs)

Workspace Class

Initialization

Methods

find(name: str = None, kind: str = None, file_pattern: str = None) -> List[Tuple[str, str]]

apply_patch(file_path: str, xpath: str, new_code: str, **kwargs)

XPath Query Syntax

Basic Patterns

Advanced Patterns

Plugin Development

Overview

Plugin Structure

Creating a Plugin

1. Use the Cookiecutter Template

2. Define Entry Points

3. Implement Core Components

Language Service

Component Implementations

Formatter

Base Classes

BaseElementExtractor

BraceFormatter vs IndentFormatter

Testing Plugins

Test Structure

Test Templates

Architecture Overview

Component Relationships

Data Flow

Error Handling

Testing Guidelines

Test Categories

Running Tests

Test Fixtures

Performance Considerations

Caching Strategy

Memory Management

Profiling

Diagrams

Contributing

`extract(code: str) -> CodeElementsResult`

`get_text_by_xpath(code: str, xpath: str, return_hash: bool = False)`

`apply_patch(xpath: str, new_code: str, mode: str, original_hash: str = None, dry_run: bool = False, return_format: str = "text")`

`new_function(code: str, name: str, args: List[str], body: List[str], **kwargs)`

`new_class(code: str, name: str, methods: List[Dict], **kwargs)`

`new_method(code: str, parent: str, name: str, args: List[str], body: List[str], **kwargs)`

`find(name: str = None, kind: str = None, file_pattern: str = None) -> List[Tuple[str, str]]`

`apply_patch(file_path: str, xpath: str, new_code: str, **kwargs)`