Implementation Guide: .cgrignore File Support
🎯 Goal
Add support for a .cgrignore file (similar to .gitignore) so that users can declaratively specify which files and directories are excluded from indexing.
📋 Current State
What exists:
- ✅ exclude_patterns list in config (hardcoded or in JSON)
- ✅ pathspec library in requirements.txt (NOT USED)
- ✅ walk_source_files() function with hardcoded exclusions
What's missing:
- ❌ .cgrignore file support
- ❌ Glob pattern matching (only exact directory names)
- ❌ Per-project ignore files
- ❌ Negation patterns (!important.py)
🔧 Implementation Plan
Step 1: Create .cgrignore Parser
File: ast_rag/utils/ignore_parser.py (new)
"""
.cgrignore file parser for AST-RAG.

Format: Same as .gitignore
- One pattern per line
- # for comments
- ! for negation
- ** for matching across directories
- * for wildcard matching
"""
import os
from pathlib import Path
from typing import Optional

import pathspec


class CgrIgnoreParser:
    """Parser for .cgrignore files."""

    def __init__(self, root_path: str):
        self.root_path = Path(root_path)
        self.spec: Optional[pathspec.PathSpec] = None
        self.patterns: list[str] = []

    def load(self, ignore_file: Optional[str] = None) -> None:
        """
        Load .cgrignore file.

        Args:
            ignore_file: Path to ignore file. If None, looks for .cgrignore in root.
        """
        if ignore_file is None:
            ignore_file = os.path.join(self.root_path, ".cgrignore")
        if not os.path.exists(ignore_file):
            # No ignore file, use defaults
            self._load_defaults()
            return
        # Read as UTF-8 explicitly so behavior does not depend on the platform default
        with open(ignore_file, "r", encoding="utf-8") as f:
            lines = f.readlines()
        # Filter out comments and empty lines
        patterns = []
        for line in lines:
            line = line.strip()
            if line and not line.startswith("#"):
                patterns.append(line)
        self.patterns = patterns
        self.spec = pathspec.PathSpec.from_lines("gitwildmatch", patterns)

    def _load_defaults(self) -> None:
        """Load default ignore patterns."""
        self.patterns = [
            ".git/",
            "__pycache__/",
            "node_modules/",
            "target/",
            "build/",
            "dist/",
            ".gradle/",
            ".idea/",
            ".vscode/",
            "venv/",
            ".venv/",
            "*.pyc",
            "*.pyo",
            "*.class",
            "*.o",
            "*.so",
            "*.dll",
        ]
        self.spec = pathspec.PathSpec.from_lines("gitwildmatch", self.patterns)

    def should_ignore(self, file_path: str) -> bool:
        """
        Check if a file should be ignored.

        Args:
            file_path: Absolute or relative path to check

        Returns:
            True if file should be ignored
        """
        if self.spec is None:
            self._load_defaults()
        # Make path relative to root
        path = Path(file_path)
        try:
            rel_path = path.relative_to(self.root_path)
        except ValueError:
            # Path is not under root, don't ignore
            return False
        return self.spec.match_file(str(rel_path))

    def get_patterns(self) -> list[str]:
        """Return loaded patterns."""
        return self.patterns.copy()
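The comment and blank-line filtering inside load() can be checked in isolation. The following is a minimal sketch using only the standard library; read_patterns is a hypothetical helper name that does not appear in the guide, and it mirrors the loop in load() rather than replacing it.

```python
def read_patterns(text: str) -> list[str]:
    """Return the usable patterns from the text of an ignore file.

    Hypothetical helper: it applies the same rule as the loop in load(),
    dropping blank lines and lines whose first non-space character is '#'.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


if __name__ == "__main__":
    sample = "# build output\n\n*.pyc\n  build/  \n!keep.py\n"
    print(read_patterns(sample))
```

Negation lines such as !keep.py pass through unchanged; interpreting them is left to the pattern matcher.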
Step 2: Update walk_source_files()
File: ast_rag/services/parsing/parser_manager.py
Modify walk_source_files() to use the ignore parser:
from ast_rag.utils.ignore_parser import CgrIgnoreParser


def walk_source_files(
    root: str,
    exclude_dirs: Optional[list[str]] = None,
    ignore_file: Optional[str] = None,
) -> list[tuple[str, str]]:
    """
    Recursively enumerate all source files under root.

    Args:
        root: Root directory to walk
        exclude_dirs: Additional directories to exclude (legacy, kept for compatibility)
        ignore_file: Path to .cgrignore file (default: root/.cgrignore)

    Returns:
        List of (absolute_file_path, language) tuples
    """
    # Initialize ignore parser
    ignore_parser = CgrIgnoreParser(root)
    ignore_parser.load(ignore_file)
    result: list[tuple[str, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Filter directories in-place (skip ignored dirs).
        # Hidden directories are always skipped, so a negation pattern
        # cannot re-include anything inside them.
        dirnames[:] = [
            d for d in dirnames
            if not ignore_parser.should_ignore(os.path.join(dirpath, d))
            and not d.startswith(".")
        ]
        # Add exclude_dirs patterns (for backward compatibility)
        if exclude_dirs:
            dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for fname in filenames:
            file_path = os.path.join(dirpath, fname)
            # Check if file should be ignored
            if ignore_parser.should_ignore(file_path):
                continue
            ext = Path(fname).suffix.lower()
            lang = EXT_TO_LANG.get(ext)
            if lang:
                result.append((file_path, lang))
    return result
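The in-place dirnames[:] assignment above is what keeps os.walk from descending into excluded directories. Here is a small standard-library sketch of that mechanism on a temporary tree; walk_pruned and demo are hypothetical names used only for illustration, and the skip set stands in for the ignore parser.

```python
import os
import tempfile


def walk_pruned(root: str, skip_dirs: set[str]) -> list[str]:
    """Return '/'-separated relative file paths, never entering skip_dirs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the list that os.walk itself holds, so
        # removed directories are never visited. Rebinding the name with
        # `dirnames = [...]` would leave the walk unchanged.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for fname in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, fname), root)
            found.append(rel.replace(os.sep, "/"))
    return found


def demo() -> list[str]:
    """Build a tiny tree and walk it with 'build' pruned."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("src/main.py", "build/gen.py", "src/pkg/util.py"):
            path = os.path.join(tmp, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w", encoding="utf-8") as f:
                f.write("")
        return walk_pruned(tmp, {"build"})


if __name__ == "__main__":
    print(demo())
```

Pruning at the directory level saves work on large trees such as node_modules, but it also means files inside a pruned directory are never tested against later patterns.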
Step 3: Update CLI to Accept Ignore File
File: ast_rag/cli.py
Add --ignore-file option:
@app.command("index")
def index_project(
    root: str = typer.Argument(".", help="Root directory to index"),
    # ... existing options
    ignore_file: Optional[str] = typer.Option(
        None,
        "--ignore-file",
        "-i",
        help="Path to .cgrignore file (default: .cgrignore in root)",
    ),
) -> None:
    """Index a codebase."""
    cfg = _load_config()
    # Merge exclude_patterns from config with .cgrignore
    files = walk_source_files(
        root,
        exclude_dirs=cfg.exclude_patterns,
        ignore_file=ignore_file,
    )
    # ... rest of indexing
Step 4: Update Config Schema
File: ast_rag/dto/config.py
Add ignore_file to config:
class ProjectConfig(BaseModel):
    # ... existing fields
    ignore_file: Optional[str] = None  # Path to .cgrignore
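Steps 3 and 4 each introduce a way to name the ignore file, but the guide does not say which one wins when both are set. The sketch below assumes one plausible order: the --ignore-file flag first, then the config field, then .cgrignore in the root. Both the precedence and the name resolve_ignore_file are assumptions made for illustration, not part of the plan above.

```python
import os
from typing import Optional


def resolve_ignore_file(
    root: str,
    cli_value: Optional[str] = None,
    config_value: Optional[str] = None,
) -> str:
    """Pick the ignore file path to load.

    Assumed precedence (not stated in the guide):
    CLI flag, then config field, then <root>/.cgrignore.
    """
    for candidate in (cli_value, config_value):
        if candidate:
            return candidate
    return os.path.join(root, ".cgrignore")


if __name__ == "__main__":
    print(resolve_ignore_file("project"))
    print(resolve_ignore_file("project", config_value="ci.cgrignore"))
    print(resolve_ignore_file("project", "local.cgrignore", "ci.cgrignore"))
```

Under this assumption the CLI would pass the resolved path as ignore_file to walk_source_files(), which already falls back to defaults when the file does not exist.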
Step 5: Create Sample .cgrignore
File: .cgrignore.example (new in repo root)
# .cgrignore - Files and directories to exclude from AST-RAG indexing
# Format: Same as .gitignore
# Version control
.git/
.svn/
.hg/
# Build artifacts
build/
dist/
target/
*.o
*.so
*.dll
*.pyc
*.pyo
*.class
# Dependencies
node_modules/
vendor/
__pycache__/
.venv/
venv/
env/
# IDE
.idea/
.vscode/
*.swp
*.swo
*~
# Test fixtures (optional - uncomment if needed)
# test/fixtures/
# tests/data/
# Documentation (optional)
# docs/
# *.md
# Large data files
*.csv
*.json
*.parquet
Step 6: Documentation
File: docs/IGNORE_FILES.md (new)
# .cgrignore File Format
AST-RAG supports `.cgrignore` files to exclude files and directories from indexing.
## Location
Place a `.cgrignore` file in your project root. AST-RAG will automatically load it.
## Format
The format is identical to `.gitignore`:
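# Comments go on their own line; text after a pattern is treated as part of the pattern
# Ignore all .pyc files
*.pyc
# Ignore build directory
build/
# But don't ignore important.py
!important.py
# Ignore the test directory at the project root only (a leading / anchors the pattern)
/test/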
## Patterns
| Pattern | Meaning |
|---------|---------|
| `*.ext` | Ignore all files with extension .ext |
| `dir/` | Ignore directory dir |
| `!file` | Negation: don't ignore file |
| `**/dir` | Match dir in any directory |
| `dir/**` | Match everything under dir |
## Example
# Ignore all test files
test/
tests/
*_test.py
# But keep critical test config
!test/config.py
# Ignore documentation
docs/
*.md
# But keep README
!README.md
## Fallback
If no `.cgrignore` file exists, AST-RAG uses sensible defaults:
- `.git/`, `__pycache__/`, `node_modules/`, etc.
- Build artifacts: `build/`, `dist/`, `target/`
- IDE files: `.idea/`, `.vscode/`
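The negation examples in the documentation draft (*.md followed by !README.md) depend on the rule that the last matching pattern decides the outcome. The sketch below illustrates only that rule, using the standard-library fnmatch module on bare file names. It is a simplification and not the gitwildmatch implementation that pathspec provides: it ignores directories, anchoring, and **. The function name is_ignored is hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(name: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last one that matches wins.

    Simplified illustration: `name` is a bare file name and each pattern
    is a basename glob, optionally prefixed with '!' for negation.
    """
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(name, glob):
            # A plain match ignores the file; a negated match re-includes it.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    rules = ["*.md", "!README.md"]
    for candidate in ("notes.md", "README.md", "main.py"):
        print(candidate, is_ignored(candidate, rules))
```

Order matters under this rule: placing !README.md before *.md would leave README.md ignored, because the later *.md pattern matches last.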
🧪 Testing
import os
import tempfile

from ast_rag.services.parsing.parser_manager import walk_source_files


def test_cgrignore():
    # Create temp directory with .cgrignore
    with tempfile.TemporaryDirectory() as tmpdir:
        # Write .cgrignore
        with open(os.path.join(tmpdir, ".cgrignore"), "w") as f:
            f.write("*.pyc\nbuild/\n")
        # Create test files
        os.makedirs(os.path.join(tmpdir, "src"))
        os.makedirs(os.path.join(tmpdir, "build"))
        with open(os.path.join(tmpdir, "src", "main.py"), "w") as f:
            f.write("print('hello')")
        with open(os.path.join(tmpdir, "build", "out.pyc"), "w") as f:
            f.write("binary")
        # Test walk_source_files
        files = walk_source_files(tmpdir)
        # Compare paths relative to tmpdir so the temp directory's own
        # name cannot influence the substring checks
        rel_paths = [os.path.relpath(f[0], tmpdir) for f in files]
        assert any(p.endswith("main.py") for p in rel_paths)
        assert not any(p.endswith("out.pyc") for p in rel_paths)
        assert not any(p.split(os.sep)[0] == "build" for p in rel_paths)
📁 Files to Create/Modify
Create:
ast_rag/utils/ignore_parser.py - New ignore parser class
.cgrignore.example - Example ignore file
docs/IGNORE_FILES.md - Documentation
Modify:
ast_rag/services/parsing/parser_manager.py - Update walk_source_files()
ast_rag/cli.py - Add --ignore-file option
ast_rag/dto/config.py - Add ignore_file field
requirements.txt - Ensure pathspec>=0.12 is present
⏱️ Estimated Time
- 2-3 hours for implementation
- 1 hour for testing
- 30 minutes for documentation
Labels: enhancement, cli, configuration
Priority: Low
Implementation Time: 3-4 hours