feat: progress reporting, cancellation, pipeline improvements, and test cleanup#42
Conversation
- test_diagnose_verify: Accept exit code 1 when optional Tree-sitter grammars (crystal, wat) are unavailable in CI environment - test_diagnose_verify_shows_parser_status: Same fix for exit code - test_long_path_attack: Use path > 4096 bytes to properly trigger the path length security check - test_command_injection_fix: Handle semgrep config errors gracefully (YAML parsing issues in CI don't indicate command injection)
- Remove verbose debug logging that dumped entire ParsedFileData objects during annotation failures (was outputting megabytes of content) - Add _get_decl_attr() helper across all writers to handle both Declaration objects and dict representations - Add _get_issue_attr() helper across all writers to handle both SecurityIssue objects and dict representations - Handle both enum values (.value/.name) and string severity values Fixes "'dict' object has no attribute 'kind'" and "'dict' object has no attribute 'severity'" errors during output generation. Affected files: main.py, annotator.py, markdown_writer.py, json_writer.py, xml_writer.py, rendering_adapters.py
…ovements Add rich progress bars for CLI operations, graceful cancellation via signal handling, improved parser pipeline merging, and enhanced reconstruction module. Includes new tests for pipeline merging.
- Remove unsupported asyncio_default_fixture_loop_scope config option - Rename TestParser to SampleParser to avoid pytest collection warning - Replace return True with implicit None in test_python_parser_fix - Add warning filters for RuntimeWarning, ResourceWarning, and PytestUnraisableExceptionWarning from third-party async mocks
|
CodeAnt AI is reviewing your PR. |
Nitpicks 🔍
|
| logger.info("Collecting input files...") | ||
| if progress_callback: | ||
| progress_callback.start_stage("Collecting", message="Scanning files...") | ||
| files_to_process: list[ParsedFileData] = [] |
There was a problem hiding this comment.
Suggestion: The configuration validation in run_codeconcat unconditionally requires target_path, which breaks valid configurations that use source_url (e.g., GitHub repositories or API requests), causing a ConfigurationError("Target path is required") even though a supported source was provided. [logic error]
Severity Level: Critical 🚨
- ❌ Remote repository source mode fails entirely.
- ❌ CLI `--source-url` users cannot run processing.
- ⚠️ Automated integrations using source_url break.
- ⚠️ Tests exercising remote-collection will fail.| logger.info("Collecting input files...") | |
| if progress_callback: | |
| progress_callback.start_stage("Collecting", message="Scanning files...") | |
| files_to_process: list[ParsedFileData] = [] | |
| if not (getattr(config, "target_path", None) or getattr(config, "source_url", None)): | |
| raise ConfigurationError("Either target_path or source_url is required") |
Steps of Reproduction ✅
1. From the CLI entry point, run_codeconcat is called in cli_entry_point() (file:
codeconcat/main.py — see line where `output_content = run_codeconcat(config)` is invoked).
This is the real entry that passes the built `config` into run_codeconcat.
2. Configure the CLI to use a remote repository source by passing a source URL in the
configuration (for example via `--source-url` / config.source_url). This populates
config.source_url but may leave config.target_path unset.
3. When run_codeconcat(config) starts, it enters the validation block (code in
codeconcat/main.py starting at the try: on line 839) which currently checks only `if not
config.target_path:` and unconditionally raises ConfigurationError("Target path is
required") when target_path is unset.
4. Because config.source_url is a valid alternative, the code should accept configurations
that set source_url instead of target_path. Instead, the current check raises
ConfigurationError preventing any run that uses a source_url. The later code paths (lines
later in run_codeconcat) explicitly handle `elif config.source_url:` to collect from
remote repositories, proving source_url is supported — but this code is unreachable
because of the earlier unconditional target_path requirement.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/main.py
**Line:** 841:844
**Comment:**
*Logic Error: The configuration validation in `run_codeconcat` unconditionally requires `target_path`, which breaks valid configurations that use `source_url` (e.g., GitHub repositories or API requests), causing a `ConfigurationError("Target path is required")` even though a supported source was provided.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| refresh_per_second=4, | ||
| ) as progress: | ||
| task = progress.add_task("[cyan]Processing files...", total=None) | ||
| quiet=state.quiet, |
There was a problem hiding this comment.
Suggestion: The --no-progress (disable_progress) CLI flag is not actually disabling the new progress dashboard: create_progress is called with quiet=state.quiet only, so when the user requests no progress but is not in quiet mode, a SimpleProgress instance still prints progress updates. This is a behavior regression and violates the option's documented intent to disable progress output. Make sure the quiet argument also incorporates disable_progress so that no progress messages are shown when that flag is set. [logic error]
Severity Level: Critical 🚨
- ❌ CLI prints progress despite --no-progress flag.
- ⚠️ Confuses users scripting CLI expecting quiet output.
- ⚠️ Breaks redirection/automation that relies on no progress lines.| quiet=state.quiet, | |
| quiet=state.quiet or disable_progress, |
Steps of Reproduction ✅
1. Invoke the CLI with the --no-progress flag (runs run_command). The create_progress call
happens at codeconcat/cli/commands/run.py:812 where run.py passes quiet=state.quiet and
force_simple=disable_progress to create_progress().
2. create_progress implementation is at codeconcat/cli/progress.py:def
create_progress(...) (lines 419-443). It checks `if quiet:` first (line 436) and returns
SimpleProgress(console, quiet=True) if quiet is True.
3. If quiet is False but force_simple is True (line 440), create_progress returns
SimpleProgress(console, quiet=False). SimpleProgress.__init__ at
codeconcat/cli/progress.py:349 sets self.quiet = quiet (lines 349-352) and its
start/update/complete methods print unless self.quiet is True (examples at lines 367-374,
381-389, 391-396).
4. Because run.py passes quiet=state.quiet and force_simple=disable_progress, running
`codeconcat run --no-progress` (disable_progress=True) while not in quiet mode
(state.quiet is False, the common default) causes create_progress to take the force_simple
branch and construct SimpleProgress(console, quiet=False). That SimpleProgress will print
progress lines (see SimpleProgress.start_stage at lines 367-374 and update_progress at
381-389), violating the user's expected --no-progress behavior.
5. Reproduce locally: run the CLI without --quiet but with --no-progress, observe progress
printing during processing. Confirm call sites in run.py:812 (create_progress call) and
progress.py:440 (force_simple branch) using Grep/Read outputs above.
6. Conclusion: the existing pattern is actionable (not a hypothetical edge) and will
reproduce consistently whenever users set --no-progress while not setting quiet.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/cli/commands/run.py
**Line:** 814:814
**Comment:**
*Logic Error: The `--no-progress` (`disable_progress`) CLI flag is not actually disabling the new progress dashboard: `create_progress` is called with `quiet=state.quiet` only, so when the user requests no progress but is not in quiet mode, a `SimpleProgress` instance still prints progress updates. This is a behavior regression and violates the option's documented intent to disable progress output. Make sure the `quiet` argument also incorporates `disable_progress` so that no progress messages are shown when that flag is set.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| ET.SubElement( | ||
| security, | ||
| "issue", | ||
| severity=str(issue.severity.value), | ||
| line=str(issue.line_number), | ||
| rule=issue.rule_id, | ||
| ).text = issue.description | ||
| severity=severity_str, | ||
| line=str(_get_issue_attr(issue, "line_number", 0)), | ||
| rule=_get_issue_attr(issue, "rule_id", ""), | ||
| ).text = _get_issue_attr(issue, "description", "") |
There was a problem hiding this comment.
Suggestion: When mask_output_content is enabled, this XML writer still emits full security issue descriptions, unlike other writers that suppress detailed security content, which can leak sensitive snippets or messages despite the masking configuration. [security]
Severity Level: Major ⚠️
- ❌ Potential leak of sensitive security descriptions in exported XML.
- ⚠️ Affects XML output used by LLM ingestion and downstream tools.
- ⚠️ Masking config's guarantee is violated (trust/security issue).| ET.SubElement( | |
| security, | |
| "issue", | |
| severity=str(issue.severity.value), | |
| line=str(issue.line_number), | |
| rule=issue.rule_id, | |
| ).text = issue.description | |
| severity=severity_str, | |
| line=str(_get_issue_attr(issue, "line_number", 0)), | |
| rule=_get_issue_attr(issue, "rule_id", ""), | |
| ).text = _get_issue_attr(issue, "description", "") | |
| description = ( | |
| "" | |
| if getattr(config, "mask_output_content", False) | |
| else _get_issue_attr(issue, "description", "") | |
| ) | |
| ET.SubElement( | |
| security, | |
| "issue", | |
| severity=severity_str, | |
| line=str(_get_issue_attr(issue, "line_number", 0)), | |
| rule=_get_issue_attr(issue, "rule_id", ""), | |
| ).text = description |
Steps of Reproduction ✅
1. Construct a WritableItem with security_issues present and descriptions (e.g., a dict or
object containing "description") and set config.mask_output_content = True. The writer
entrypoint is write_xml in codeconcat/writer/xml_writer.py.
2. Call write_xml(items, config, folder_tree_str='') which iterates files; the security
block executed at codeconcat/writer/xml_writer.py lines 179-194 (the loop over
item.security_issues).
3. Current code (lines 179-194) always sets the XML issue element text to the issue
description via _get_issue_attr(issue, "description", "") — this writes full descriptions
into the XML even when config.mask_output_content is True.
4. Observe produced XML contains full security descriptions in <issue> elements (file:
codeconcat/writer/xml_writer.py around lines 179-194). Expectation: with
mask_output_content True, descriptions should be suppressed but they are not.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/writer/xml_writer.py
**Line:** 188:194
**Comment:**
*Security: When `mask_output_content` is enabled, this XML writer still emits full security issue descriptions, unlike other writers that suppress detailed security content, which can leak sensitive snippets or messages despite the masking configuration.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| output_parts.append( | ||
| f"- {severity_badge} **Line {issue.line_number}**: {issue.description}" | ||
| ) | ||
| severity = _get_issue_attr(issue, "severity", "INFO") |
There was a problem hiding this comment.
Suggestion: The Markdown writer renders detailed security issue lines unconditionally, ignoring the mask_output_content flag, so sensitive descriptions and line numbers can still appear even when masking is enabled; add a check inside the loop to skip emitting these lines when masking is active. [security]
Severity Level: Critical 🚨
- ❌ Masking option does not prevent leaking sensitive issue descriptions.
- ⚠️ Line numbers for findings exposed in reports.| severity = _get_issue_attr(issue, "severity", "INFO") | |
| if getattr(config, "mask_output_content", False): | |
| continue |
Steps of Reproduction ✅
1. Call write_markdown(items, config, ...) where config.mask_output_content is True
(config object passed into write_markdown is used extensively in the function; see
top-level call context in codeconcat/writer/markdown_writer.py).
2. In write_markdown, the file-level security issues rendering block executes (file:
codeconcat/writer/markdown_writer.py, lines 247-259). The loop `for issue in
item.security_issues:` (line 253) unconditionally appends detailed lines including line
numbers and descriptions.
3. Because there's no check of config.mask_output_content inside that loop, masked runs
will still include vulnerable descriptions and precise line numbers in the generated
Markdown (observe output lines from line 253-258 containing the description and line
number).
4. Reproduce by creating an item with security_issues containing a description and
line_number, call write_markdown with config.mask_output_content=True, then inspect
returned Markdown: sensitive data appears despite masking.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/writer/markdown_writer.py
**Line:** 254:254
**Comment:**
*Security: The Markdown writer renders detailed security issue lines unconditionally, ignoring the `mask_output_content` flag, so sensitive descriptions and line numbers can still appear even when masking is enabled; add a check inside the loop to skip emitting these lines when masking is active.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # Handle enum with .value attribute | ||
| if hasattr(severity, "value"): | ||
| severity_str = str(severity.value).upper() |
There was a problem hiding this comment.
Suggestion: The severity badge helper converts enum severities using their numeric .value (e.g. 4) before lookup, but the badge map is keyed by names like "CRITICAL", so any SecuritySeverity enum will fall back to the unknown "❓" badge instead of the correct severity icon; normalize using the enum's .name when available. [logic error]
Severity Level: Major ⚠️
- ⚠️ Severity badges display as unknown for Enum severities.
- ⚠️ Markdown security summary shows ❓ instead of icons.| # Handle enum with .value attribute | |
| if hasattr(severity, "value"): | |
| severity_str = str(severity.value).upper() | |
| # Normalize severity to an uppercase string name | |
| if hasattr(severity, "name"): | |
| # Enum-like object: use its name attribute (e.g. SecuritySeverity.CRITICAL -> "CRITICAL") | |
| severity_str = str(severity.name).upper() |
Steps of Reproduction ✅
1. Render Markdown report for a file that has security issues: call write_markdown() in
codeconcat/writer/markdown_writer.py which iterates security issues at write_markdown()
around lines 251-259 (see the loop starting at line 253 `for issue in
item.security_issues:`).
2. During that loop (file: codeconcat/writer/markdown_writer.py, line ~256) the code calls
_get_severity_badge(severity) to render an icon for the issue's severity.
3. If the security issue uses an Enum object (e.g. SecuritySeverity.CRITICAL) where
severity has .name and .value, the current implementation of _get_severity_badge (defined
at codeconcat/writer/markdown_writer.py:423-431) checks for `.value` and converts that
numeric value to a string (e.g. "4") then uppercases it.
4. The badges map keys are names like "CRITICAL", so converting .value to a string yields
no match and badges.get(...) returns "❓" instead of the intended "🔴". Observe incorrect
badge in the generated Markdown output where enum severities are present.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/writer/markdown_writer.py
**Line:** 426:428
**Comment:**
*Logic Error: The severity badge helper converts enum severities using their numeric `.value` (e.g. 4) before lookup, but the badge map is keyed by names like "CRITICAL", so any `SecuritySeverity` enum will fall back to the unknown "❓" badge instead of the correct severity icon; normalize using the enum's `.name` when available.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
|
||
| # Handle severity (may be enum or string) | ||
| severity = _get_issue_attr(issue, "severity", "INFO") | ||
| severity_str = str(severity.value) if hasattr(severity, "value") else str(severity) |
There was a problem hiding this comment.
Suggestion: In the XML renderer, the severity attribute is serialized using the enum's numeric .value (e.g., "3") instead of its human-readable name (e.g., "HIGH"), which makes XML severity inconsistent with JSON/text outputs and can break consumers that expect symbolic severity names across all formats. [logic error]
Severity Level: Major ⚠️
- ⚠️ XML export uses numeric severity codes, inconsistent.
- ❌ Consumers expecting "HIGH"/"CRITICAL" may misinterpret severity.
- ⚠️ Cross-format comparisons (JSON vs XML) become unreliable.
- ⚠️ Tests asserting string severity may fail.| severity_str = str(severity.value) if hasattr(severity, "value") else str(severity) | |
| # Normalize to human-readable name for consistency across formats | |
| severity_str = severity.name if hasattr(severity, "name") else str(severity) |
Steps of Reproduction ✅
1. Locate the XML security-issue serialization function at
codeconcat/writer/rendering_adapters.py::XmlRenderAdapter.add_security_issue_to_element
(function defined around lines 522-572 in the PR hunk). The severity handling block begins
at line 558 in the PR diff.
2. Construct an AnnotatedFileData (or directly call
XmlRenderAdapter.add_security_issue_to_element) with a SecurityIssue object whose severity
is an enum value (for example SecuritySeverity.HIGH). In tests or runtime this commonly
occurs when security analysis code builds SecurityIssue instances (see types imported at
top of file).
3. Call XmlRenderAdapter.create_annotated_file_element(...) which iterates
file_data.security_issues and calls add_security_issue_to_element(...) (creation path
present in create_annotated_file_element around lines ~620-660). This will execute the
severity serialization code at lines 558-561.
4. Serialize the resulting ElementTree to a string (for example ET.tostring(file_element,
encoding="unicode") in test or runtime). Observe the severity attribute contains the
enum's numeric .value (e.g., "3") instead of the human-readable name "HIGH". This differs
from JSON/text renderers which use severity.name or uppercase names, causing inconsistency
for consumers that expect symbolic severity names.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/writer/rendering_adapters.py
**Line:** 560:560
**Comment:**
*Logic Error: In the XML renderer, the severity attribute is serialized using the enum's numeric `.value` (e.g., "3") instead of its human-readable name (e.g., "HIGH"), which makes XML severity inconsistent with JSON/text outputs and can break consumers that expect symbolic severity names across all formats.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| force_simple: If True, use SimpleProgress even on TTY | ||
|
|
||
| Returns: | ||
| ProgressDashboard for TTY, SimpleProgress for non-TTY or if forced | ||
| """ | ||
| console = console or Console() | ||
|
|
||
| if quiet: | ||
| progress = SimpleProgress(console, quiet=True) | ||
| return progress | ||
|
|
||
| if force_simple or not sys.stdout.isatty(): | ||
| return SimpleProgress(console, quiet=False) | ||
|
|
There was a problem hiding this comment.
Suggestion: The CLI --no-progress flag is wired to the force_simple parameter in create_progress, but force_simple currently just switches to SimpleProgress with quiet=False, so progress lines are still printed in non-TTY/simple mode; to honor the "disable progress bars" contract while still passing a progress object into run_codeconcat, force_simple should create a quiet (non-printing) progress instance. [logic error]
Severity Level: Major ⚠️
- ❌ CLI run command prints progress despite --no-progress.
- ⚠️ Unexpected console output in automated scripts.
- ⚠️ Tests that assert no stdout may fail.
- ⚠️ UX inconsistency with --no-progress flag.| force_simple: If True, use SimpleProgress even on TTY | |
| Returns: | |
| ProgressDashboard for TTY, SimpleProgress for non-TTY or if forced | |
| """ | |
| console = console or Console() | |
| if quiet: | |
| progress = SimpleProgress(console, quiet=True) | |
| return progress | |
| if force_simple or not sys.stdout.isatty(): | |
| return SimpleProgress(console, quiet=False) | |
| force_simple: If True, create a SimpleProgress instance instead of the live Rich dashboard | |
| (typically used for CLI flags that disable progress bars) | |
| Returns: | |
| ProgressDashboard for TTY, SimpleProgress for non-TTY or if forced | |
| """ | |
| console = console or Console() | |
| # If quiet is requested, always create a disabled progress implementation. | |
| if quiet: | |
| return SimpleProgress(console, quiet=True) | |
| # When force_simple is True (e.g. for a "--no-progress" style flag), | |
| # return a SimpleProgress in quiet mode so that no progress lines are printed | |
| # while still providing an object with the expected progress API. | |
| if force_simple: | |
| return SimpleProgress(console, quiet=True) | |
| # For non-TTY environments, fall back to simple line-based progress output. | |
| if not sys.stdout.isatty(): | |
| return SimpleProgress(console, quiet=False) | |
| # Default to the full Rich live dashboard on real TTYs. |
Steps of Reproduction ✅
1. Observe CLI invocation path in codeconcat/cli/commands/run.py where create_progress()
is called at lines 812-816:
- File: codeconcat/cli/commands/run.py
- Lines: 812-816: progress_display = create_progress(console=console,
quiet=state.quiet, force_simple=disable_progress)
- Expected behavior: when `--no-progress` (`disable_progress`) is True, no progress
output should be printed.
2. Inspect create_progress definition in codeconcat/cli/progress.py at lines 419-443:
- File: codeconcat/cli/progress.py
- Lines: 419-443 define create_progress(...).
- Current behavior: when force_simple is True the function executes `return
SimpleProgress(console, quiet=False)` (lines 440-441 in the final file), creating a
SimpleProgress that is not quiet.
3. Reproduce from the CLI:
- Run the CLI command that sets the flag, e.g. `codeconcat run --no-progress` which
sets `disable_progress=True`.
- In run_command, disable_progress is passed as force_simple to create_progress
(run.py:812-816).
- Because create_progress currently returns SimpleProgress(..., quiet=False) when
force_simple is True, SimpleProgress.start_stage/ update_progress will print lines to
the console (SimpleProgress prints unless quiet is True).
- Expected reproduction result: despite using --no-progress, progress lines are printed
to stdout.
4. Verify concrete lines causing output:
- SimpleProgress.start_stage prints at codeconcat/cli/progress.py:369-373 when not
quiet.
- SimpleProgress.update_progress prints at codeconcat/cli/progress.py:389 when not
quiet.
- The misconfiguration (force_simple -> quiet=False) therefore produces visible output
during the run invoked from run.py:812-816.
5. Conclusion / Why suggested change is necessary:
- The call site (run.py) passes the CLI `--no-progress` via force_simple;
create_progress's current branch treats force_simple as "use simple mode" but leaves it
enabled (not quiet). The improved_code makes force_simple create a quiet SimpleProgress
so callers that expect no output (like the CLI flag) get a non-printing progress object
while still receiving the progress API. This reproduces deterministically by running
the CLI with --no-progress and observing printed progress lines; fixing prevents that
output.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** codeconcat/cli/progress.py
**Line:** 429:442
**Comment:**
*Logic Error: The CLI `--no-progress` flag is wired to the `force_simple` parameter in `create_progress`, but `force_simple` currently just switches to `SimpleProgress` with `quiet=False`, so progress lines are still printed in non-TTY/simple mode; to honor the "disable progress bars" contract while still passing a progress object into `run_codeconcat`, `force_simple` should create a quiet (non-printing) progress instance.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished reviewing your PR. |
- Use enum .name instead of .value for severity serialization across all writers - Respect mask_output_content flag in standalone writers (json, markdown, xml) - Fix isinstance() PEP 604 syntax for Python 3.9 compatibility - Guard signal.signal() for main-thread-only execution - Fix config validation to allow source_url/diff without target_path - Fix docstring claiming CancelledException is raised (returns None) - Reconstruct ParseResult from dict in multiprocess worker results
Summary
Test plan