Merged
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -7,8 +7,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [0.9.1] - 2026-01-28

### Fixed

- **Severity serialization**: Use enum `.name` (string like "HIGH") instead of `.value` (numeric) across all writers (json, markdown, xml) and rendering adapters
- **Security issue masking**: Respect `mask_output_content` config flag in standalone writers (json, markdown, xml) to suppress security issue details when enabled
- **PEP 604 isinstance compatibility**: Fix `isinstance(v, str | int | float | bool)` to use tuple form `isinstance(v, (str, int, float, bool))` in rendering_adapters.py for Python 3.9 compatibility
- **Signal handler thread safety**: Guard `signal.signal()` call in `SignalHandler.install()` to only run from the main thread, preventing `ValueError` in worker threads
- **Config validation**: Allow `source_url` and `diff` configs without requiring `target_path`, fixing early validation error for remote/diff workflows
- **Docstring accuracy**: Fix `process_codebase()` docstring that incorrectly claimed `CancelledException` is raised (it returns `None` on cancellation)
- **Parse result reconstruction**: Properly reconstruct `ParseResult` dataclass from dict when deserializing multiprocess worker results in unified_pipeline.py
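The signal handler entry above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the project's `SignalHandler`; it only shows the main-thread guard, relying on the fact that `signal.signal()` raises `ValueError` when called outside the main thread.

```python
import signal
import threading


class SignalHandler:
    """Hypothetical stand-in; only the main-thread guard is illustrated."""

    def __init__(self) -> None:
        self.installed = False
        self._previous = None

    def install(self) -> bool:
        # signal.signal() raises ValueError outside the main thread,
        # so skip installation there instead of crashing the worker.
        if threading.current_thread() is not threading.main_thread():
            return False
        self._previous = signal.signal(signal.SIGINT, self._handle)
        self.installed = True
        return True

    def uninstall(self) -> None:
        if self.installed:
            # A previous handler of None means it was not set from Python.
            previous = signal.SIG_DFL if self._previous is None else self._previous
            signal.signal(signal.SIGINT, previous)
            self.installed = False

    def _handle(self, signum, frame) -> None:
        raise KeyboardInterrupt


def install_from_worker() -> bool:
    """Call install() from a worker thread and report whether it installed."""
    results = []
    worker = threading.Thread(target=lambda: results.append(SignalHandler().install()))
    worker.start()
    worker.join()
    return results[0]
```

Returning `False` rather than raising lets callers in worker threads continue without Ctrl+C handling, which appears to be the intent of the fix.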

### Changed

- **Default output filename convention**: Output files now use `ccc_{folder_name}_{mmddyy}.{ext}` pattern (e.g., `ccc_myproject_012826.md`) instead of the old `{folder_name}_ccc.{format}` pattern. Format names are mapped to proper file extensions (`markdown` → `.md`, `text` → `.txt`). Date stamp is included for easy versioning.
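As a rough illustration of the naming pattern above, the following sketch builds a default filename. The helper name `default_output_name` and the `json`/`xml` entries of the mapping are assumptions; only the `markdown` and `text` mappings are stated in this changelog.

```python
from datetime import date
from typing import Optional

# markdown -> md and text -> txt come from the changelog entry;
# json and xml are assumed to map to themselves.
FORMAT_EXTENSIONS = {"markdown": "md", "text": "txt", "json": "json", "xml": "xml"}


def default_output_name(folder_name: str, fmt: str, today: Optional[date] = None) -> str:
    """Build ccc_{folder_name}_{mmddyy}.{ext} (hypothetical helper name)."""
    today = today or date.today()
    # Unknown formats fall back to using the format name as the extension.
    ext = FORMAT_EXTENSIONS.get(fmt, fmt)
    return f"ccc_{folder_name}_{today.strftime('%m%d%y')}.{ext}"
```

For example, `default_output_name("myproject", "markdown", date(2026, 1, 28))` yields `ccc_myproject_012826.md`, matching the example in the entry.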

### Added

- **Graceful Interrupt Handling (Ctrl+C)**: Full cooperative cancellation support
- First Ctrl+C triggers graceful cancellation with progress preservation
- Second Ctrl+C within 2 seconds forces immediate exit
- Thread-safe `CancellationToken` for cooperative task cancellation
- `SignalHandler` class with context manager support
- Cancellation checks throughout the processing pipeline
- New module: `codeconcat/utils/cancellation.py`
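The double-press behaviour described above could be sketched as follows. This is a simplified model, not the contents of `codeconcat/utils/cancellation.py`: the `InterruptPolicy` class, the `cancel()` method, and the `force_window` parameter are assumptions made for illustration; only `is_cancelled()` appears in this PR's diff.

```python
import threading
import time
from typing import Optional


class CancellationToken:
    """Thread-safe flag that workers poll between units of work."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    def is_cancelled(self) -> bool:
        return self._event.is_set()


class InterruptPolicy:
    """Decides whether an interrupt is a graceful cancel or a forced exit."""

    def __init__(self, token: CancellationToken, force_window: float = 2.0) -> None:
        self.token = token
        self.force_window = force_window
        self._last: Optional[float] = None

    def on_interrupt(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        # A second interrupt inside the window escalates to a forced exit.
        if self._last is not None and now - self._last <= self.force_window:
            return "force"
        self._last = now
        self.token.cancel()
        return "graceful"
```

A real handler would call something like `on_interrupt()` from the SIGINT callback and exit immediately on `"force"`, while pipeline stages check `token.is_cancelled()` at safe points.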

- **Unified Progress Dashboard**: Flicker-free Rich Live panel display
- Single persistent dashboard showing all 4 processing stages (Collecting → Parsing → Annotating → Writing)
- Visual progress bars with percentage and item counts
- Stage status icons: ○ pending, ● in progress, ✓ completed, ✗ failed
- Elapsed time tracking per stage and total
- TTY detection with automatic fallback to `SimpleProgress` for non-interactive environments
- Refresh rate limiting (10 Hz) to reduce CPU usage and flicker
- New module: `codeconcat/cli/progress.py`

- **5 New AI Providers for Code Summarization**:
- **Google Gemini**: Native SDK integration via `google-genai`
- Supports Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 1.5 Flash
@@ -44,6 +75,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

- **Reconstruction Parsing Hardening**: Improved markdown section parsing (supports paths with spaces), robust fenced code extraction, and diff-only block handling.
- Added strict parsing mode by default (with optional lenient repairs) for JSON/XML inputs
- XML reconstruction now prefers `defusedxml` when available for safer parsing

- **Swift Parser Partial Results Merging**: Tree-sitter partial parse results now merge with regex parser
- When tree-sitter encounters unsupported syntax (e.g., Swift 5.10+ `nonisolated(unsafe)`), it now includes partial results for merging instead of discarding them
- Fallback regex parsers always run when tree-sitter has errors, ensuring modern language features are captured
@@ -62,6 +97,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Early termination was preventing result merging from occurring
- Tests now properly verify merging behavior

- **Verbose Debug Logging During Annotation Failures**: Removed debug code that dumped entire `ParsedFileData` objects (including full file contents) to stderr when annotation exceptions occurred
- Previously, `repr(file)` was logged which could output megabytes of content for large files
- Now logs only the file path and exception message

- **Declaration Attribute Access in Writers**: Fixed `'dict' object has no attribute 'kind'` errors
- Declarations may be stored as either `Declaration` objects or dict representations
- Added `_get_decl_attr()` helper function across all writer modules for defensive attribute access
- Affected files: `annotator.py`, `markdown_writer.py`, `json_writer.py`, `xml_writer.py`, `rendering_adapters.py`

- **Security Issue Attribute Access in Writers**: Fixed `'dict' object has no attribute 'severity'` errors
- Security issues may be stored as either `SecurityIssue` objects or dict representations
- Added `_get_issue_attr()` helper function across all writer modules for defensive attribute access
- Handles both enum values (with `.value`/`.name`) and string severity values
- Affected files: `markdown_writer.py`, `json_writer.py`, `xml_writer.py`, `rendering_adapters.py`
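The defensive-access pattern described in the two entries above might look roughly like this. The names `get_attr`, `severity_label`, and the `Severity` enum are illustrative stand-ins, not the project's `_get_decl_attr()` or `_get_issue_attr()` implementations.

```python
from enum import Enum
from typing import Any


class Severity(Enum):  # hypothetical stand-in for the project's severity enum
    LOW = 1
    HIGH = 3


def get_attr(item: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a dict or an object."""
    if isinstance(item, dict):
        return item.get(name, default)
    return getattr(item, name, default)


def severity_label(issue: Any) -> str:
    """Return the severity as a string such as "HIGH"."""
    severity = get_attr(issue, "severity", "UNKNOWN")
    # Enum members serialize by name; plain strings pass through unchanged.
    return severity.name if isinstance(severity, Enum) else str(severity)
```

Routing every read through one helper means writers no longer need to know whether an issue arrived as an object or as a dict from a worker process.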

- **Parallel Processing Dataclass Reconstruction**: Fixed `'dict' object has no attribute 'kind'` error in summarization processor when processing large codebases (50+ files)
- Root cause: `dataclasses.asdict()` in parallel processing worker converted nested `Declaration`, `TokenStats`, `SecurityIssue`, and `DiffMetadata` objects to plain dictionaries
- When `ParsedFileData(**result_dict)` reconstructed the object, nested dataclasses remained as dicts instead of being converted back to their proper types
- Added `_reconstruct_parsed_file_data()` and `_reconstruct_declaration()` helper functions in `unified_pipeline.py` to properly reconstruct all nested dataclass objects
- Handles recursive `Declaration.children` reconstruction and `modifiers` set/list conversion
- Affected file: `codeconcat/parser/unified_pipeline.py`
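To make the root cause above concrete, here is a small sketch of the round trip. The `Declaration` field set is simplified and hypothetical, and `reconstruct_declaration` is an illustrative analogue of the helper named in the entry, not its actual code.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Declaration:  # simplified, hypothetical field set
    kind: str
    name: str
    modifiers: set = field(default_factory=set)
    children: list = field(default_factory=list)


def reconstruct_declaration(data: Any) -> Declaration:
    """Rebuild a Declaration (and its children) from an asdict() payload."""
    if isinstance(data, Declaration):
        return data
    return Declaration(
        kind=data["kind"],
        name=data["name"],
        # Sets may arrive as lists after serialization; normalize to a set.
        modifiers=set(data.get("modifiers") or ()),
        # asdict() flattens nested dataclasses to dicts, so recurse.
        children=[reconstruct_declaration(c) for c in data.get("children") or ()],
    )
```

Calling `asdict()` on a `Declaration` with children produces nested plain dicts, which is why `Declaration(**payload)` alone leaves `children` as dicts and later attribute access such as `.kind` fails.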

### Performance

- **Parser Early Termination Threshold**: Increased from 1 to 5 declarations
8 changes: 6 additions & 2 deletions README.md
@@ -123,6 +125,8 @@ codeconcat run --security --semgrep --compress --output secure-report.json
- **REST API** - FastAPI-based server for programmatic access
- **Modern CLI** - Typer-powered interface with shell completion and rich help
- **Smart Caching** - TTL-based cache management for repeated operations
- **Graceful Interrupts** - Ctrl+C handling with double-press force quit support
- **Unified Progress Display** - Flicker-free Rich Live dashboard with stage tracking

## Language Support

@@ -510,7 +512,7 @@ Process files and generate AI-optimized output.

| Option | Short | Description |
|--------|-------|-------------|
| `--output` | `-o` | Output file path (auto-detected from format if omitted) |
| `--output` | `-o` | Output file path (default: `ccc_{folder}_{mmddyy}.{ext}`) |
| `--format` | `-f` | Output format: `markdown`, `json`, `xml`, `text` |
| `--preset` | `-p` | Configuration preset: `lean`, `medium`, `full` |

@@ -685,6 +687,7 @@ Reconstruct source files from CodeConCat output with security validation.
|--------|-------|-------------|
| `--output-dir` | `-o` | Directory for files (default: ./reconstructed) |
| `--format` | `-f` | Input format (auto-detected if not specified) |
| `--strict` / `--lenient` | | Strict parsing (default: `--strict`) or lenient repair mode |
| `--force` | | Overwrite existing files |
| `--dry-run` | | Preview without creating files |
| `--verbose` | `-v` | Show detailed progress |
@@ -1069,6 +1072,7 @@ codeconcat reconstruct output.md --force
**Security Features:**
- Path traversal protection prevents `../../../etc/passwd` attacks
- All file writes validated against target directory boundary
- XML parsing uses `defusedxml` for XXE-safe reconstruction
- Supports Markdown, XML, and JSON formats

### Differential Outputs
@@ -1416,7 +1420,7 @@ For detailed technical documentation of all fixes, see **[PARSER_FIXES_SUMMARY.m

See [CHANGELOG.md](./CHANGELOG.md) for complete version history and release notes.

**Current Version:** 0.9.0
**Current Version:** 0.9.1

### Troubleshooting

2 changes: 1 addition & 1 deletion codeconcat/base_types.py
@@ -714,7 +714,7 @@ def get(self, key: str, default=None):
merge_docs: bool = False
doc_extensions: list[str] = Field(default_factory=lambda: [".md", ".rst", ".txt", ".rmd"])
custom_extension_map: dict[str, str] = Field(default_factory=dict)
output: str = "code_concat_output.md"
output: str = ""
format: str = "markdown"
max_workers: int = 4
disable_tree: bool = False
9 changes: 9 additions & 0 deletions codeconcat/cli/commands/reconstruct.py
@@ -47,6 +47,14 @@ def reconstruct_command(
rich_help_panel="Input Options",
),
] = None,
strict: Annotated[
bool,
typer.Option(
"--strict/--lenient",
help="Use strict parsing (disable JSON/XML repair heuristics)",
rich_help_panel="Input Options",
),
] = True,
force: Annotated[
bool,
typer.Option(
@@ -131,6 +139,7 @@ def reconstruct_command(
str(output_dir),
format_type=input_format,
verbose=verbose,
strict=strict,
)

progress.update(task, completed=100)
79 changes: 56 additions & 23 deletions codeconcat/cli/commands/run.py
@@ -9,23 +9,21 @@

import typer
from rich.panel import Panel
from rich.progress import (
BarColumn,
Progress,
SpinnerColumn,
TaskProgressColumn,
TextColumn,
TimeRemainingColumn,
)
from rich.table import Table

from codeconcat.config.config_builder import ConfigBuilder
from codeconcat.errors import CodeConcatError
from codeconcat.main import _write_output_files, run_codeconcat
from codeconcat.utils.cancellation import (
CancelledException,
get_cancellation_token,
setup_signal_handler,
)
from codeconcat.validation.security_reporter import init_reporter
from codeconcat.validation.unsupported_reporter import init_reporter as init_unsupported_reporter

from ..config import get_state
from ..progress import create_progress
from ..utils import (
console,
is_github_url_or_shorthand,
@@ -809,24 +807,57 @@ def run_command(
process_source = config.source_url if config.source_url else config.target_path
console.print(f"\n[bold cyan]Processing files from:[/bold cyan] {process_source}\n")

with Progress(
SpinnerColumn(spinner_name="dots", style="cyan"),
TextColumn("[bold blue]{task.description}"),
BarColumn(bar_width=40, style="cyan", complete_style="green"),
TaskProgressColumn(),
TimeRemainingColumn(),
# Setup cancellation token and signal handler for graceful Ctrl+C
cancel_token = get_cancellation_token()
progress_display = create_progress(
console=console,
disable=disable_progress or state.quiet,
refresh_per_second=4,
) as progress:
task = progress.add_task("[cyan]Processing files...", total=None)
quiet=state.quiet,
Suggestion: The --no-progress (disable_progress) CLI flag is not actually disabling the new progress dashboard: create_progress is called with quiet=state.quiet only, so when the user requests no progress but is not in quiet mode, a SimpleProgress instance still prints progress updates. This is a behavior regression and violates the option's documented intent to disable progress output. Make sure the quiet argument also incorporates disable_progress so that no progress messages are shown when that flag is set. [logic error]

Severity Level: Critical 🚨
- ❌ CLI prints progress despite --no-progress flag.
- ⚠️ Confuses users scripting CLI expecting quiet output.
- ⚠️ Breaks redirection/automation that relies on no progress lines.
Suggested change
quiet=state.quiet,
quiet=state.quiet or disable_progress,
Steps of Reproduction ✅
1. Invoke the CLI with the --no-progress flag (runs run_command). The create_progress call
happens at codeconcat/cli/commands/run.py:812 where run.py passes quiet=state.quiet and
force_simple=disable_progress to create_progress().

2. create_progress implementation is at codeconcat/cli/progress.py:def
create_progress(...) (lines 419-443). It checks `if quiet:` first (line 436) and returns
SimpleProgress(console, quiet=True) if quiet is True.

3. If quiet is False but force_simple is True (line 440), create_progress returns
SimpleProgress(console, quiet=False). SimpleProgress.__init__ at
codeconcat/cli/progress.py:349 sets self.quiet = quiet (lines 349-352) and its
start/update/complete methods print unless self.quiet is True (examples at lines 367-374,
381-389, 391-396).

4. Because run.py passes quiet=state.quiet and force_simple=disable_progress, running
`codeconcat run --no-progress` (disable_progress=True) while not in quiet mode
(state.quiet is False, the common default) causes create_progress to take the force_simple
branch and construct SimpleProgress(console, quiet=False). That SimpleProgress will print
progress lines (see SimpleProgress.start_stage at lines 367-374 and update_progress at
381-389), violating the user's expected --no-progress behavior.

5. Reproduce locally: run the CLI with --no-progress but without --quiet and observe progress
lines printed during processing. The relevant call sites are run.py:812 (create_progress call)
and progress.py:440 (force_simple branch).

6. Conclusion: the existing pattern is actionable (not a hypothetical edge) and will
reproduce consistently whenever users set --no-progress while not setting quiet.
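The branch order described in the steps above can be reduced to a small sketch. `choose_progress` and `resolve_quiet` are hypothetical names that model only the decisions, not the real `create_progress()` signature; the `is_tty` fallback is taken from the changelog's TTY-detection note and its placement here is an assumption.

```python
def choose_progress(quiet: bool, force_simple: bool, is_tty: bool = True) -> str:
    """Hypothetical reduction of create_progress() to its branch decisions."""
    if quiet:
        # Checked first: a silent SimpleProgress that prints nothing.
        return "simple-silent"
    if force_simple or not is_tty:
        # SimpleProgress with quiet=False still prints progress lines.
        return "simple-printing"
    return "dashboard"


def resolve_quiet(state_quiet: bool, disable_progress: bool) -> bool:
    """The suggested change: --no-progress also silences progress output."""
    return state_quiet or disable_progress
```

With `quiet=False` and `force_simple=True` the sketch lands in the printing branch, which is the reported regression; passing `resolve_quiet(False, True)` as `quiet` reaches the silent branch instead.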

force_simple=disable_progress,
)

# Track if we cancelled gracefully for proper exit messaging
was_cancelled = False

with progress_display as dashboard:
# Setup signal handler - keep callback minimal for signal safety
def on_cancel():
nonlocal was_cancelled
was_cancelled = True
# Don't do Rich UI work in signal handler - defer to main flow

signal_handler = setup_signal_handler(
token=cancel_token,
on_cancel=on_cancel,
quiet=state.quiet,
)

try:
output_content = run_codeconcat(config)
progress.update(task, completed=100)
output_content = run_codeconcat(
config,
progress_callback=dashboard,
cancel_token=cancel_token,
)
# Check if cancelled during execution (returns None on cancel)
if output_content is None and cancel_token.is_cancelled():
was_cancelled = True
except CancelledException:
was_cancelled = True
output_content = None
except CodeConcatError as e:
if hasattr(dashboard, "fail_stage"):
dashboard.fail_stage(str(e))
print_error(f"Processing failed: {e}")
raise typer.Exit(1) from e
finally:
signal_handler.uninstall()
# Update dashboard after signal handler is uninstalled (safe context)
if was_cancelled and hasattr(dashboard, "skip_remaining"):
dashboard.skip_remaining("cancelled")

# Handle cancellation exit
if was_cancelled:
print_warning("Operation cancelled by user")
raise typer.Exit(130)

# Write output
if output_content:
@@ -880,14 +911,16 @@ def run_command(
stats_table.add_row("Total lines", f"{stats.get('total_lines', 0):,}")
stats_table.add_row("Total bytes", f"{stats.get('total_bytes', 0):,}")

if hasattr(config, "files_processed"):
stats_table.add_row("Files processed", str(len(config.target_path)))

console.print("\n", stats_table)
else:
print_warning("No output generated")

except KeyboardInterrupt:
# This can still trigger if signal handler wasn't installed yet
print_warning("Operation cancelled by user")
raise typer.Exit(130) from None
except CancelledException:
# Graceful cancellation via token
print_warning("Operation cancelled by user")
raise typer.Exit(130) from None
except Exception as e: