Merged
368 changes: 7 additions & 361 deletions CLAUDE.md
@@ -1,6 +1,6 @@
# CLAUDE.md

Copilot AI Jan 12, 2026

This change removes the line explaining that the file provides guidance to Claude Code. Although that line may have seemed redundant, dropping it makes the purpose of CLAUDE.md less clear to developers who are not familiar with the Claude Code integration.

Suggested change
This document provides guidance and context for Claude Code / Claude Desktop (and other AI coding assistants) when interacting with this repository.

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This document provides guidance and context for Claude Code / Claude Desktop (and other AI coding assistants) when interacting with this repository.

## Project Overview

@@ -10,366 +10,12 @@ Git log analysis tool that extracts monthly Git logs, parses commit history with
**Python:** 3.12+ (tested on 3.12, 3.13, 3.14)
**Package Manager:** uv (recommended) or pip

## Development Commands
## Code Guidelines

### Setup
### Comment Guidelines

```bash
# Install dependencies using uv (recommended)
uv sync --dev
Write comments only when they are needed, and add them only in the following cases:

# Or using pip
pip install mcp pytest ruff
```

### Testing

```bash
# Configure tests first
cp tests/config.py.example tests/config.py
# Edit tests/config.py to set RESULTS_DIR

# Run all tests
uv run pytest

# Run specific test file
pytest tests/test_query_engine.py

# Run with verbose output
pytest -v
```

### Linting

```bash
# Check code style
uv run ruff check .

# Auto-fix issues
uv run ruff check --fix .
```

### CLI Usage

The CLI supports three subcommands: `parse`, `search`, and `projects`.

**1. Generate monthly Git logs:**

```bash
python git_log_analysis/git_monthly_log_generator.py <repo_path> <start_year> <start_month>
```

**2. Parse logs to JSON:**

```bash
python cli.py parse -l <log_dir> -o <output_dir>

# Example
python cli.py parse -l data/project1_logs -o results/project1
```

**3. Search commits:**

```bash
# Basic keyword search
python cli.py search -r results -k "fix bug"

# Date range search
python cli.py search -r results -s 2024-01-01 -e 2024-12-31

# Filter by author
python cli.py search -r results -a "홍길동"

# Filter by project
python cli.py search -r results -p project1 -k "feature"

# Export to JSON
python cli.py search -r results -k "update" --format json -o results.json

# Export to CSV
python cli.py search -r results -a "김철수" --format csv -o report.csv

# Limit results
python cli.py search -r results -k "refactor" --limit 10

# Combined filters
python cli.py search -r results -k "authentication" -a "developer" -s 2024-01-01 --format json
```

**4. List projects:**

```bash
python cli.py projects -r results

# Example output:
# Available projects in results:
# - project1: 150 commits
# - project2: 89 commits
# - project3: 203 commits
```

**5. Run MCP server:**

```bash
# Set environment variable (Windows)
set GIT_LOG_RESULTS_DIR=D:\path\to\results

# Set environment variable (Unix/Linux/macOS)
export GIT_LOG_RESULTS_DIR=/path/to/results

# Run server
uv run python mcp_server.py
```

## Architecture

### Data Flow Pipeline

1. **Log Generation** (`git_monthly_log_generator.py`)
   - Executes `git log` commands with date ranges
   - Outputs raw log files to `<repo>/git_logs_by_month/`
   - Format: `git_log_YYYY-MM.txt`
   - Uses custom Git format with `===` delimiters

2. **Parsing Pipeline** (`cli.py` → `processors/` → `git_diff_parser.py`)
   - `batch_processor.process_directory_logs()`: Orchestrates batch processing with `ProcessingStats` tracking
   - `file_processor.parse_log_file()`: Handles individual files with encoding fallback
   - `encoding_handler`: 6-level encoding fallback strategy (UTF-8 → CP949 → EUC-KR with strict/replace modes)
   - `GitLogParser`: Parses commits and diffs into structured data
   - Outputs: `*_summary.json` (metadata) and `*_changes.json` (diffs)

3. **MCP Search Service** (`mcp_server.py` + `mcp/`)
   - `GitLogDataLoader`: Lazy-loads JSON files by project/date with file metadata caching
   - `QueryEngine`: Two-stage filtering (month-level → day-level) with `QueryParams` validation
   - `GitLogService`: High-level orchestration layer
   - Tools: `search_commits` (searches message & file paths), `list_projects` (with commit counts)

### Key Design Patterns

**Two-Stage Filtering (QueryEngine + DataLoader)**

- Stage 1 (DataLoader): Coarse filtering by year-month to minimize file I/O
- Stage 2 (QueryEngine): Precise filtering by exact date/author/keyword
- This handles edge cases like searches spanning month boundaries
- `QueryParams` validates and normalizes inputs (lowercases keyword/author/project)
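
The two stages above can be sketched roughly as follows. This is a minimal illustration, not the repository's `QueryEngine` or `GitLogDataLoader`: the `Commit` fields, `months_in_range`, `search`, and the in-memory `files` mapping (month key to commits) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    # Hypothetical, simplified commit record
    day: date
    author: str
    message: str


def months_in_range(start: date, end: date) -> list[str]:
    """Stage 1: list the YYYY-MM keys whose files could contain matches."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


def search(
    files: dict[str, list[Commit]], start: date, end: date, keyword: str = ""
) -> list[Commit]:
    """Stage 2: apply exact date and keyword filters to the preselected months."""
    needle = keyword.lower()
    hits = []
    for key in months_in_range(start, end):
        # Months outside the range are never touched, which is the I/O saving
        for commit in files.get(key, []):
            if start <= commit.day <= end and needle in commit.message.lower():
                hits.append(commit)
    return hits
```

A search from January 25 to February 5 would open only the `2024-01` and `2024-02` entries, and the day-level check then drops commits from earlier in January or later in February.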

**Dual JSON Output**

- `*_summary.json`: Commit metadata + aggregated stats (for search)
- `*_changes.json`: Full diff details indexed by commit hash (for details)
- Separation optimizes for common search operations vs. detailed inspection

**Dataclass Hierarchy**

- `DiffLine` → `DiffChunk` → `FileChange` → `CommitData`
- Mirrors Git's conceptual model: lines in chunks, chunks in files, files in commits
- All dataclasses support JSON serialization
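
As a rough sketch of how such a hierarchy can serialize to JSON: the class names come from the list above, but every field name and the `to_json` helper are assumptions, and the real classes may serialize differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DiffLine:
    kind: str  # "+", "-" or " " (hypothetical field)
    text: str


@dataclass
class DiffChunk:
    header: str
    lines: list[DiffLine] = field(default_factory=list)


@dataclass
class FileChange:
    path: str
    chunks: list[DiffChunk] = field(default_factory=list)


@dataclass
class CommitData:
    commit_hash: str
    author: str
    message: str
    files: list[FileChange] = field(default_factory=list)


def to_json(commit: CommitData) -> str:
    # asdict() recurses through nested dataclasses and lists
    return json.dumps(asdict(commit), ensure_ascii=False)
```

`ensure_ascii=False` keeps non-ASCII author names and messages readable in the output instead of escaping them.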

**Protocol-Based Type Safety**

- `SubParser` Protocol in `cli/types.py` for argparse type hints
- `FormatterProtocol` for output formatters (Text, JSON, CSV)
- Ensures type safety without concrete inheritance
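
A minimal sketch of the structural-typing idea, assuming a single `format` method. The method name, its signature, the dict-based commit shape, and `render` are illustrative guesses, not the actual interface in `cli/formatters.py`.

```python
import json
from typing import Protocol


class FormatterProtocol(Protocol):
    # Any class with a matching format() satisfies the protocol
    def format(self, commits: list[dict[str, str]]) -> str: ...


class TextFormatter:
    def format(self, commits: list[dict[str, str]]) -> str:
        return "\n".join(f"{c['hash'][:7]} {c['message']}" for c in commits)


class JsonFormatter:
    def format(self, commits: list[dict[str, str]]) -> str:
        return json.dumps(commits, ensure_ascii=False)


def render(commits: list[dict[str, str]], formatter: FormatterProtocol) -> str:
    # Neither formatter inherits from FormatterProtocol; a type checker
    # accepts them because their method signatures match.
    return formatter.format(commits)
```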

### Module Responsibilities

- `git_diff_parser.py`: Core parsing logic for Git log format with diffs
- `git_monthly_log_generator.py`: Git command wrapper for monthly extraction
- `processors/batch_processor.py`: Orchestrates multi-file processing with `ProcessingStats`
- `processors/file_processor.py`: Single-file parsing and JSON serialization
- `processors/encoding_handler.py`: 6-level encoding fallback (UTF-8/CP949/EUC-KR with strict/replace/ignore)
- `cli/`: CLI subcommand implementations
  - `parse_command.py`: Parse subcommand (wraps batch_processor)
  - `search_command.py`: Search subcommand (uses GitLogService)
  - `projects_command.py`: Projects subcommand (lists projects with commit counts)
  - `formatters.py`: Output formatters (TextFormatter, JsonFormatter, CsvFormatter) using Protocol pattern
  - `types.py`: Type definitions (`SubParser` Protocol for argparse)
- `mcp/data_loader.py`: Lazy file loading with project/date filtering and `FileInfo`/`Commit` dataclasses
- `mcp/query_engine.py`: Search logic with two-stage filtering, `QueryParams` validation, and date utilities
- `mcp/service.py`: High-level `GitLogService` interface (used by both MCP and CLI)
- `mcp/utils.py`: Date validation and parsing helpers (`is_valid_date_string`, `parse_date_to_naive`)
- `mcp_server.py`: MCP protocol implementation with two tools
- `cli.py`: CLI entry point with subcommand routing (parse/search/projects)

### Important Implementation Details

**Git Log Format**

- Uses custom format with `===` delimiters between commits
- Format: `commit:`, `Author:`, `Email:`, `Date:`, `Message:`, then diff blocks
- Generated by: `--pretty=format:===%ncommit:%H%nAuthor:%an%nEmail:%ae%nDate:%ad%nMessage:%s%n`
- Date format: `--date=iso` (yields `YYYY-MM-DD HH:MM:SS +ZZZZ`)
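
To illustrate the format, here is a small sketch that reads only the header fields of each `===`-delimited record and parses the ISO date. `parse_headers` and `parse_iso_date` are hypothetical helpers, not the `GitLogParser` implementation, and the sketch ignores diff bodies entirely.

```python
from datetime import datetime

HEADER_KEYS = ("commit", "Author", "Email", "Date", "Message")


def parse_headers(log_text: str) -> list[dict[str, str]]:
    """Collect the five header fields of every ===-delimited commit."""
    commits: list[dict[str, str]] = []
    current = None
    for line in log_text.splitlines():
        if line == "===":
            current = {}
            commits.append(current)
            continue
        if current is None or len(current) == len(HEADER_KEYS):
            continue  # text before the first delimiter, or diff body
        key, sep, value = line.partition(":")
        if sep and key in HEADER_KEYS:
            current[key] = value
    return commits


def parse_iso_date(value: str) -> datetime:
    # Matches the `--date=iso` shape, e.g. "2024-01-15 10:30:00 +0900"
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S %z")
```

A diff line that consists of exactly `===` would be mistaken for a delimiter here; a real parser would need to guard against that.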

**Encoding Handling**

The `encoding_handler` implements a sophisticated 6-level fallback strategy for robust international character support:

1. `cp949` (strict) - Korean Windows primary encoding
2. `euc-kr` (strict) - Alternative Korean encoding
3. `cp949` (replace) - CP949 with character replacement
4. `euc-kr` (replace) - EUC-KR with character replacement
5. `latin1` (strict) - Fallback for Western characters
6. `utf-8` (ignore) - Final fallback ignoring errors

**Implementation:** `encoding_handler.parse_commits_with_encoding_fallback()`
- Creates temporary UTF-8 file when fallback succeeds
- Handles Korean Windows environments and mixed-encoding files
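
The fallback order listed above could be sketched like this. It is a decode-only illustration under stated assumptions: `decode_with_fallback` and `FALLBACKS` are hypothetical names, and the real `parse_commits_with_encoding_fallback()` may judge success by whether parsing works rather than by whether decoding raises.

```python
# (encoding, error mode) pairs in the order given in the list above
FALLBACKS = [
    ("cp949", "strict"),
    ("euc-kr", "strict"),
    ("cp949", "replace"),
    ("euc-kr", "replace"),
    ("latin1", "strict"),
    ("utf-8", "ignore"),
]


def decode_with_fallback(raw: bytes) -> tuple[str, str, str]:
    """Return (text, encoding, error_mode) for the first attempt that decodes."""
    for encoding, errors in FALLBACKS:
        try:
            return raw.decode(encoding, errors=errors), encoding, errors
        except UnicodeDecodeError:
            continue
    # Defensive only: non-strict error modes do not raise UnicodeDecodeError
    raise RuntimeError("no fallback encoding could decode the input")
```

In this decode-only form the `replace` step never raises, so the entries after it only matter if a level can also be rejected for another reason, such as the parser failing on the decoded text.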

**Date Format**

- Git log: ISO format with timezone (`%Y-%m-%d %H:%M:%S +0900`)
- Search params: `YYYY-MM-DD`
- File naming: `YYYY-MM`
- QueryParams validates dates and parses to naive datetime

**Query Normalization**

`QueryParams` performs post-init validation:
- Converts `keyword`, `author`, `project` to lowercase for case-insensitive search
- Validates date strings with `is_valid_date_string()`
- Parses dates to naive datetime with `parse_date_to_naive()`

**MCP Tools**

- `search_commits`: Searches in commit messages AND file paths
- `list_projects`: Returns project names with commit counts and date ranges

## Configuration Files

### pyproject.toml

```toml
[project]
name = "git-log-analysis"
version = "0.2.0"
requires-python = ">=3.12"
dependencies = ["mcp>=1.17.0", "ruff>=0.14.1"]

[dependency-groups]
dev = ["pytest>=8.4.2"]
```

### .claude/settings.local.json

Claude Code-specific settings:
```json
{
  "permissions": {
    "Bash(tree:*)": "allow"
  }
}
```

Enables `tree` command for directory visualization.

### tests/config.py

Runtime configuration for tests:
```python
RESULTS_DIR = "./results" # Multi-project root directory
```

Must be created from `tests/config.py.example` before running tests.

### .vscode/settings.json

VSCode workspace settings for Python development.

## Testing Notes

### Test Infrastructure

**Pytest Fixtures (`conftest.py`):**
- `temp_test_data_dir`: Session-scoped temporary directory with auto-generated test JSON files
- `test_loader`: GitLogDataLoader fixture for 3 test projects (project_a, project_b, project_c)
- `test_query_engine`: QueryEngine fixture with pre-loaded test data

**Test Coverage:**
- `test_cli_commands.py`: CLI command integration tests
- `test_data_loader.py`: Data loader functionality and multi-project support
- `test_formatter.py`: Output formatters (Text, JSON, CSV)
- `test_query_engine.py`: Search functionality (keyword, date, author, project filters)

### Running Tests

- Tests require `tests/config.py` with `RESULTS_DIR` pointing to actual parsed data
- Test files use dataclasses for clear assertions
- CI runs on Python 3.12, 3.13, 3.14 with uv-based workflow (`.github/workflows/pr_test.yml`)

### CI/CD

GitHub Actions workflow:
```yaml
- Python versions: 3.12, 3.13, 3.14
- Package manager: uv
- Checks: ruff check . && pytest
```

## Project Structure

```
git-log-analysis/
├── .claude/ # Claude Code settings
│ └── settings.local.json # Permissions config
├── .github/workflows/ # CI/CD
│ └── pr_test.yml # Test workflow
├── .vscode/ # VSCode settings
├── git_log_analysis/ # Main package
│ ├── cli/ # CLI commands
│ │ ├── formatters.py # Output formatters (Protocol-based)
│ │ ├── parse_command.py # Parse subcommand
│ │ ├── projects_command.py # Projects subcommand
│ │ ├── search_command.py # Search subcommand
│ │ └── types.py # SubParser Protocol
│ ├── mcp/ # MCP server components
│ │ ├── data_loader.py # Lazy loading with caching
│ │ ├── query_engine.py # Two-stage filtering
│ │ ├── service.py # GitLogService orchestration
│ │ └── utils.py # Date utilities
│ ├── processors/ # Log processing pipeline
│ │ ├── batch_processor.py # Multi-file orchestration
│ │ ├── encoding_handler.py # 6-level encoding fallback
│ │ └── file_processor.py # Single-file parsing
│ ├── git_diff_parser.py # Core parsing logic
│ └── git_monthly_log_generator.py # Log extraction
├── tests/ # Test suite
│ ├── conftest.py # Pytest fixtures
│ ├── config.py # Test configuration
│ ├── config.py.example # Config template
│ ├── test_cli_commands.py
│ ├── test_data_loader.py
│ ├── test_formatter.py
│ └── test_query_engine.py
├── cli.py # CLI entry point
├── mcp_server.py # MCP server entry point
├── pyproject.toml # Project metadata
├── CLAUDE.md # This file
└── README.md # User documentation (Korean)
```

## Development Workflow

1. **Adding Features:**
   - Update relevant module in `git_log_analysis/`
   - Add tests in `tests/`
   - Run `ruff check --fix .`
   - Run `pytest`
   - Update this CLAUDE.md if architecture changes

2. **Fixing Bugs:**
   - Write failing test first
   - Fix the bug
   - Ensure all tests pass
   - Update documentation if behavior changes

3. **Refactoring:**
   - Ensure tests pass before refactoring
   - Make incremental changes
   - Run tests after each change
   - Update type hints and docstrings

4. **Releasing:**
   - Update version in `pyproject.toml`
   - Update CHANGELOG (if exists)
   - Tag release: `git tag v0.X.Y`
   - Push: `git push --tags`
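- **Public API**: docstrings for functions, classes, and methods that are used externally
- **Context beyond the code**: business logic, algorithmic intent, constraints, and other aspects that are hard to understand from the code alone
- **Unnecessary comments** (omit these): implementation details that are clear from the code itself, and self-evident logic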
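
As a small illustration of these guidelines, the hypothetical helper below carries a docstring because it is meant for external callers, plus one comment that explains intent. `month_key` is an invented example, not a function from this repository.

```python
def month_key(year: int, month: int) -> str:
    """Return the `YYYY-MM` key used to name monthly log files.

    Raises:
        ValueError: if `month` is outside 1..12.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Zero-padding keeps keys in chronological order when sorted as strings.
    return f"{year:04d}-{month:02d}"


# Unnecessary by the rule above, because it only restates the code:
#     count = count + 1  # add one to count
```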
6 changes: 6 additions & 0 deletions README.md
@@ -117,10 +117,16 @@ python cli.py -l data/project_git_logs_by_month -o results/project

2. `list_projects`: List projects and their commit counts

3. `get_commit_changes`: Retrieve the detailed changes of a specific commit
- `commit_hash`: Commit hash (full, or at least the first 7 characters)
- `project`: Project name (optional; speeds up the lookup)

**Example prompts in Claude Desktop**:

- "Search commits for the keyword Add"
- "Show me the work done in January 2024"
- "Show the changes in commit abc1234"
- "Pick one of the recent commits and show its detailed changes"

## Tests
