From 96ff53c3afe09c35ec0bb3087c632c03cdadb020 Mon Sep 17 00:00:00 2001 From: Christophe Dervieux Date: Wed, 1 Jul 2026 14:40:07 +0200 Subject: [PATCH 1/6] Investigate bd-dff27o04: Windows JSON writer backslash path separators Confirms root cause: ASTContext::with_filename stores raw filename strings with no normalization; both JSON writer emission sites read them verbatim. Single-point fix at with_filename ingress using the existing quarto_util::to_forward_slashes helper. --- ...-01-windows-json-writer-backslash-paths.md | 60 +++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md diff --git a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md new file mode 100644 index 000000000..2655c88a0 --- /dev/null +++ b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md @@ -0,0 +1,60 @@ +# Windows: JSON writer emits backslash path separators in output (bd-dff27o04) + +**Date:** 2026-07-01 +**Braid:** bd-dff27o04 +**Worktree:** `.worktrees/bd-dff27o04-windows-json-writer-emits` (branch `braid/bd-dff27o04-windows-json-writer-emits`, based on `main` @ `b8fb38b0`) +**Status:** Investigation — pending design alignment with user. **Do not start implementation until the user gives the go-ahead.** + +## Triage verdict + +**Ready to design** — root cause fully confirmed at the source level (single ingress point, single fix), verified against the actual insta snapshot mismatch, no open dependency-graph pressure blocking it. + +## Issue context + +`ASTContext::with_filename` (`crates/pampa/src/pandoc/ast_context.rs:42`) stores whatever filename string is handed to it verbatim into `self.filenames`. On Windows, callers that derive the filename from a real `Path` (e.g. `path.to_string_lossy()` in the snapshot test harness, or CLI `input_filename`) produce backslash-separated strings. The JSON writer later emits that string verbatim in two places: + +- `crates/pampa/src/writers/json.rs:1812` — `FileEntryJson.name` (struct-based, non-streaming `write` path) +- `crates/pampa/src/writers/json.rs:3967` — `w.str_value(filename)?` (streaming writer variant) + +Both read from `ast_context.filenames[idx]` with no normalization. The committed insta snapshots (created on macOS/Linux) always have forward slashes, so on Windows the `name` field diverges and the snapshot test fails. + +Originally filed under the line-ending epic (bd-eehxwr29) because it shared the `json/001` failure symptom, then detached 2026-06-26 as a distinct Windows/Linux parity issue — explicitly **not** a line-ending/byte-offset problem: normalizing `\` → `/` in a filename string doesn't touch source byte offsets, so it's orthogonal to the preserve-line-endings invariant that epic protects. + +## Dependency graph + +- **supersedes bd-238o** (closed 2026-06-26): bd-238o originally bundled 3 unported Windows fixes from quarto-markdown (native `\r` escape, CRLF test-read normalization, JSON path separator). It was split apart during the line-ending epic's decomposition: fix 1 → bd-ske10iyd, fix 2 → superseded by the `.gitattributes` LF pin (bd-mv2ggmr5, rejected the CRLF-normalize-on-read approach per PR #329), fix 3 (this one) → bd-dff27o04. +- **related bd-eehxwr29** (open epic, "Enforce preserve line-ending policy across q2"): shares a label but is explicitly a different mechanism — no blocking dependency, just shared history. +- No `blocks` edges. No incoming pressure beyond "this is the last of the three original quarto-markdown ports still open." + +## What the code looks like today + +Confirmed by direct inspection (not just re-reading the description): + +1. `crates/pampa/tests/integration/test.rs:345` — `test_snapshots_for_format` calls `readers::qmd::read(input.as_bytes(), false, &path.to_string_lossy(), ...)`. `path` comes from `glob("tests/snapshots/{format}/*.qmd")`; on Windows `glob` returns native-separator paths, so `path.to_string_lossy()` is backslash-separated (e.g. `tests\snapshots\json\001.qmd`). +2. `crates/pampa/src/readers/qmd.rs:98` — `ASTContext::with_filename(filename.to_string())` stores that raw string with no normalization. +3. `crates/pampa/src/pandoc/ast_context.rs:42-51` — `with_filename` just does `filenames: vec![filename_str]`. No other ingress point currently exists in production code — `add_filename` (`ast_context.rs:70`) is only exercised by unit tests today, no real caller adds a second file. +4. `crates/pampa/src/writers/json.rs:1801-1823` builds `AstContextJson.files: Vec` from `ast_context.filenames[idx]` directly (`name: filename.clone()`), and the streaming variant at `json.rs:3951-3967` does the same via `w.str_value(filename)?`. +5. Verified against the *actual* insta snapshot (not the stale-looking `tests/snapshots/json/001.json` fixture file, which is unrelated/unused by this test): `crates/pampa/snapshots/json/001.snap` line 6 contains `"astContext":{"files":[{"...,"name":"tests/snapshots/json/001.qmd",...}]}` — forward slashes, committed from a Unix machine. This is exactly the field built at json.rs:1812. +6. `quarto_util::to_forward_slashes(path: &Path) -> String` (`crates/quarto-util/src/path.rs:23`) already exists and is used elsewhere for this exact purpose (PR #340 Lua side, HTML resource paths, DocumentProfile, listings) — no new helper needed, just a new call site. + +**Reproducibility:** not yet re-run end-to-end on this machine — see "Environment note" below. Source-level analysis is unambiguous: `path.to_string_lossy()` on a glob-returned Windows path will contain `\`, and nothing between there and the JSON writer normalizes it. The only reason this isn't currently failing loudly in this session is an unrelated, transient native-build issue (see below) that blocked running the test at all; it is not evidence the bug is absent. + +**Environment note (tangential, not part of this fix):** `cargo xtask verify --skip-hub-build` failed on first attempt in this fresh worktree with `aws-lc-sys` / `crypto-common` C-compile errors under `cl.exe` (unrelated to pampa/JSON — pulled in transitively via `quarto-hub → reqwest → rustls → aws-lc-rs → aws-lc-sys`). Rebuilding `aws-lc-sys` in isolation (`cargo build -p aws-lc-sys`) succeeded immediately after, confirming this was a parallel-build resource-contention flake in the fresh worktree's `target/`, not a real regression — historical measurement logs (`claude-notes/research/measurements/baseline-debug.log`) show this exact crate building fine on this machine before. A clean re-run of `cargo xtask verify --skip-hub-build` is in flight; if it comes back green, treat this as noise. If it recurs deterministically, it's a separate strand, not part of this fix. + +## Proposed phases (draft) + +- **Phase 0 — Test** (TDD, per project policy): add a unit test that constructs an `ASTContext` (or calls `qmd::read`) with a backslash-containing filename string (no `#[cfg(windows)]` needed — the input is a literal string, not a real OS path, so the test should run identically on all platforms) and asserts the JSON writer's `files[].name` (and the streaming variant) come out forward-slash-only. Run it first, confirm it fails against current `with_filename`. +- **Phase 1 — Fix**: normalize in `ASTContext::with_filename` (`ast_context.rs:42`) via `quarto_util::to_forward_slashes(Path::new(&filename_str))`, so both writer sites and any diagnostics built from `ast_context.filenames`/`source_context` inherit the fix for free — no changes needed at the two writer call sites themselves. +- **Phase 2 — Verify**: re-run `unit_test_snapshots_json` (and the sibling `unit_test_snapshots_native`/etc., since `with_filename` is shared) to confirm no snapshot regressions; full `cargo nextest run --workspace` per project policy. +- **Phase 3 — Docs/changelog**: none expected beyond the fix itself — this is an internal writer-determinism bug, not a user-facing behavior change (unless `quarto` CLI diagnostics also show the normalized path, which is arguably a strict improvement — worth flagging to the user, not deciding unilaterally). + +## Open design questions for the user + +1. **Normalization point.** Description proposes normalizing at `ASTContext::with_filename` ingress (single point, covers both writer sites and diagnostics). Confirm that's preferred over normalizing at each of the two writer call sites individually (which would NOT fix diagnostics that also read `ast_context.filenames`). +2. **`add_filename` (ast_context.rs:70).** Currently dead in production (only unit-tested), so no real second-file path exists yet to normalize. Leave it un-normalized for now (YAGNI) and let a future caller discover the gap, or normalize defensively there too while we're in the file? +3. **Test cfg.** Since the bug is reproducible with a literal `"tests\\snapshots\\json\\001.qmd"` string (no actual Windows path APIs involved), the regression test can run on every platform, not just Windows. Confirm that's the intent — a platform-gated test would under-cover this (CI on Linux/macOS would never catch a regression). + +## Risks / tradeoffs (draft) + +- Low risk: `to_forward_slashes` is already battle-tested elsewhere in the codebase for identical purposes (per the strand description). The change is additive (one call in one function) and doesn't touch byte offsets or the line-ending preserve policy. +- Snapshot fallout: normalizing at ingress could change filenames embedded in *other* existing snapshots beyond `json/001` if any of them currently encode backslashes on non-Windows CI (unlikely, since CI presumably runs Linux/macOS) — worth a full snapshot diff check after the fix, not just the one known-failing file. From 3b00e638fefddbeabc1aa527f746039868950da5 Mon Sep 17 00:00:00 2001 From: Christophe Dervieux Date: Wed, 1 Jul 2026 14:49:37 +0200 Subject: [PATCH 2/6] Address roborev design review: sharpen invariant, add E2E check, resolve add_filename question --- .../2026-07-01-windows-json-writer-backslash-paths.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md index 2655c88a0..b4aad1aa3 100644 --- a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md +++ b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md @@ -36,6 +36,7 @@ Confirmed by direct inspection (not just re-reading the description): 4. `crates/pampa/src/writers/json.rs:1801-1823` builds `AstContextJson.files: Vec` from `ast_context.filenames[idx]` directly (`name: filename.clone()`), and the streaming variant at `json.rs:3951-3967` does the same via `w.str_value(filename)?`. 5. Verified against the *actual* insta snapshot (not the stale-looking `tests/snapshots/json/001.json` fixture file, which is unrelated/unused by this test): `crates/pampa/snapshots/json/001.snap` line 6 contains `"astContext":{"files":[{"...,"name":"tests/snapshots/json/001.qmd",...}]}` — forward slashes, committed from a Unix machine. This is exactly the field built at json.rs:1812. 6. `quarto_util::to_forward_slashes(path: &Path) -> String` (`crates/quarto-util/src/path.rs:23`) already exists and is used elsewhere for this exact purpose (PR #340 Lua side, HTML resource paths, DocumentProfile, listings) — no new helper needed, just a new call site. +7. **Invariant check (all real call sites of `with_filename`/`add_filename`, checked by grep):** every production caller passes either a genuine file path (CLI `input_filename`, test-harness `path.to_string_lossy()`, Lua `readwrite.rs` file reads) or a bracketed placeholder literal with no path separators (`""`, `""`, `""`). There is no current call site that passes an opaque, non-path identifier that could contain a legitimate literal backslash. So `ASTContext.filenames` is, in practice, always either "a portable/display path" or "a placeholder" — never an arbitrary opaque string — which is the invariant the forward-slash fix relies on. **Reproducibility:** not yet re-run end-to-end on this machine — see "Environment note" below. Source-level analysis is unambiguous: `path.to_string_lossy()` on a glob-returned Windows path will contain `\`, and nothing between there and the JSON writer normalizes it. The only reason this isn't currently failing loudly in this session is an unrelated, transient native-build issue (see below) that blocked running the test at all; it is not evidence the bug is absent. @@ -44,15 +45,16 @@ Confirmed by direct inspection (not just re-reading the description): ## Proposed phases (draft) - **Phase 0 — Test** (TDD, per project policy): add a unit test that constructs an `ASTContext` (or calls `qmd::read`) with a backslash-containing filename string (no `#[cfg(windows)]` needed — the input is a literal string, not a real OS path, so the test should run identically on all platforms) and asserts the JSON writer's `files[].name` (and the streaming variant) come out forward-slash-only. Run it first, confirm it fails against current `with_filename`. -- **Phase 1 — Fix**: normalize in `ASTContext::with_filename` (`ast_context.rs:42`) via `quarto_util::to_forward_slashes(Path::new(&filename_str))`, so both writer sites and any diagnostics built from `ast_context.filenames`/`source_context` inherit the fix for free — no changes needed at the two writer call sites themselves. -- **Phase 2 — Verify**: re-run `unit_test_snapshots_json` (and the sibling `unit_test_snapshots_native`/etc., since `with_filename` is shared) to confirm no snapshot regressions; full `cargo nextest run --workspace` per project policy. -- **Phase 3 — Docs/changelog**: none expected beyond the fix itself — this is an internal writer-determinism bug, not a user-facing behavior change (unless `quarto` CLI diagnostics also show the normalized path, which is arguably a strict improvement — worth flagging to the user, not deciding unilaterally). +- **Phase 1 — Fix**: normalize in `ASTContext::with_filename` (`ast_context.rs:42`) via `quarto_util::to_forward_slashes(Path::new(&filename_str))`. Also normalize in `add_filename` (`ast_context.rs:70`) for consistency — same vector, same one-line fix, and leaving it inconsistent would silently reintroduce the bug the day a real second-file caller shows up (see design question 2, resolved below). Both writer sites and any diagnostics built from `ast_context.filenames`/`source_context` inherit the fix for free — no changes needed at the two writer call sites themselves. +- **Phase 2 — Verify**: re-run `unit_test_snapshots_json` (and the sibling `unit_test_snapshots_native`/etc., since `with_filename` is shared) to confirm no snapshot regressions; full `cargo nextest run --workspace` per project policy. Then **end-to-end CLI check** (mandatory per this repo's "End-to-end verification before declaring success" policy — unit tests alone don't count as done for a CLI-visible feature): run `cargo run -p pampa -- -t json -i ` on Windows against a small fixture, and inspect the actual `astContext.files[0].name` in the printed JSON to confirm forward slashes. Also check whether CLI diagnostic/error output (which reads the same `SourceContext`) now shows the normalized path, and record what it shows. +- **Phase 3 — Docs/changelog**: none expected beyond the fix itself — this is an internal writer-determinism bug, not a user-facing feature. The diagnostic-path side effect (see design question 4) may warrant a one-line changelog note if it changes what users see in error output. ## Open design questions for the user 1. **Normalization point.** Description proposes normalizing at `ASTContext::with_filename` ingress (single point, covers both writer sites and diagnostics). Confirm that's preferred over normalizing at each of the two writer call sites individually (which would NOT fix diagnostics that also read `ast_context.filenames`). -2. **`add_filename` (ast_context.rs:70).** Currently dead in production (only unit-tested), so no real second-file path exists yet to normalize. Leave it un-normalized for now (YAGNI) and let a future caller discover the gap, or normalize defensively there too while we're in the file? +2. **`add_filename` (ast_context.rs:70).** Recommendation above is to normalize it too, for consistency with `with_filename` — same vector, same writer, no reason for the two ingress points to diverge even though `add_filename` has no production caller yet. Confirm, or push back if there's a reason to keep it scoped to `with_filename` only. 3. **Test cfg.** Since the bug is reproducible with a literal `"tests\\snapshots\\json\\001.qmd"` string (no actual Windows path APIs involved), the regression test can run on every platform, not just Windows. Confirm that's the intent — a platform-gated test would under-cover this (CI on Linux/macOS would never catch a regression). +4. **Diagnostic/error-message paths.** `with_filename` also seeds `SourceContext`, which CLI diagnostics read. Normalizing at ingress means Windows users will start seeing forward-slash paths in `pampa`/`q2` error messages too, not just JSON output — that's a small but real, user-visible behavior change beyond "fix the JSON writer." Confirm this is desired (it's consistent with the rest of the codebase's forward-slash convention), rather than leaving it as an unplanned side effect. ## Risks / tradeoffs (draft) From d8b07b7a999f088798c9d6c897cd0b3d738cae14 Mon Sep 17 00:00:00 2001 From: Christophe Dervieux Date: Wed, 1 Jul 2026 14:56:34 +0200 Subject: [PATCH 3/6] Add quarto-cli path-normalization precedent (deepwiki + source-verified) --- ...-01-windows-json-writer-backslash-paths.md | 20 ++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md index b4aad1aa3..384d0a3af 100644 --- a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md +++ b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md @@ -56,7 +56,25 @@ Confirmed by direct inspection (not just re-reading the description): 3. **Test cfg.** Since the bug is reproducible with a literal `"tests\\snapshots\\json\\001.qmd"` string (no actual Windows path APIs involved), the regression test can run on every platform, not just Windows. Confirm that's the intent — a platform-gated test would under-cover this (CI on Linux/macOS would never catch a regression). 4. **Diagnostic/error-message paths.** `with_filename` also seeds `SourceContext`, which CLI diagnostics read. Normalizing at ingress means Windows users will start seeing forward-slash paths in `pampa`/`q2` error messages too, not just JSON output — that's a small but real, user-visible behavior change beyond "fix the JSON writer." Confirm this is desired (it's consistent with the rest of the codebase's forward-slash convention), rather than leaving it as an unplanned side effect. +## Ecosystem precedent + +Checked TypeScript Quarto (quarto-dev/quarto-cli) for how it handles this exact +problem, since it's the sibling codebase with a decade of Windows mileage. +Confirmed against source (not just DeepWiki's summary): `pathWithForwardSlashes` +in `src/core/path.ts:199` does the identical `path.replace(/\\/g, "/")`, applied +at the same class of boundary q2 needs — resource paths, project-cache keys +(`FileInformationCacheMap`), and output emission (Pandoc metadata, TOML, +generated JS for OJS). There's also a companion `normalizePath` that uppercases +Windows drive letters for consistent cache-key comparison — out of scope here +(no drive-letter-comparison need in `ASTContext`), but worth knowing exists if +a future path-identity bug surfaces. + +This doesn't change any of the open design questions below — it confirms the +proposed approach (normalize at ingress, forward slashes in output) matches the +established convention in both codebases, not just q2's own prior art +(`quarto_util::to_forward_slashes`). + ## Risks / tradeoffs (draft) -- Low risk: `to_forward_slashes` is already battle-tested elsewhere in the codebase for identical purposes (per the strand description). The change is additive (one call in one function) and doesn't touch byte offsets or the line-ending preserve policy. +- Low risk: `to_forward_slashes` is already battle-tested elsewhere in the codebase for identical purposes (per the strand description), and the same pattern is independently proven in TypeScript Quarto (see Ecosystem precedent above). The change is additive (one call in one function) and doesn't touch byte offsets or the line-ending preserve policy. - Snapshot fallout: normalizing at ingress could change filenames embedded in *other* existing snapshots beyond `json/001` if any of them currently encode backslashes on non-Windows CI (unlikely, since CI presumably runs Linux/macOS) — worth a full snapshot diff check after the fix, not just the one known-failing file. From 44b75bccee01e3809e9113c3a750abbe6149556b Mon Sep 17 00:00:00 2001 From: Christophe Dervieux Date: Wed, 1 Jul 2026 15:44:24 +0200 Subject: [PATCH 4/6] Normalize path separators in ASTContext filenames and diagnostics (bd-dff27o04) Windows callers pass backslash-separated paths into ASTContext::with_filename (from glob() results, CLI args, etc). The JSON writer and ariadne diagnostics both read that string verbatim, so output diverges from insta snapshots recorded on Unix and from the forward-slash convention used everywhere else in the codebase (quarto_util::to_forward_slashes, already used for Lua/HTML/ DocumentProfile paths). TypeScript Quarto normalizes the same way via pathWithForwardSlashes for the same reason. Normalizes at with_filename/add_filename (the ASTContext ingress point, covers the JSON writer and SourceContext-based diagnostics for free), plus a second ingress point found during end-to-end verification: main.rs's ad hoc fallback SourceContext on the hard-parse-error path, which bypassed ASTContext entirely. Left the "Missing Newline" CLI warning alone since it echoes the user's own typed path back to them rather than emitting a portable identifier. --- ...-01-windows-json-writer-backslash-paths.md | 28 ++++--- crates/pampa/src/main.rs | 7 +- crates/pampa/src/pandoc/ast_context.rs | 25 ++++++- crates/pampa/tests/integration/main.rs | 1 + .../test_diagnostic_path_normalization.rs | 73 +++++++++++++++++++ 5 files changed, 120 insertions(+), 14 deletions(-) create mode 100644 crates/pampa/tests/integration/test_diagnostic_path_normalization.rs diff --git a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md index 384d0a3af..2646d6d66 100644 --- a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md +++ b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md @@ -3,7 +3,7 @@ **Date:** 2026-07-01 **Braid:** bd-dff27o04 **Worktree:** `.worktrees/bd-dff27o04-windows-json-writer-emits` (branch `braid/bd-dff27o04-windows-json-writer-emits`, based on `main` @ `b8fb38b0`) -**Status:** Investigation — pending design alignment with user. **Do not start implementation until the user gives the go-ahead.** +**Status:** Design aligned (2026-07-01) — implementing. ## Triage verdict @@ -42,19 +42,25 @@ Confirmed by direct inspection (not just re-reading the description): **Environment note (tangential, not part of this fix):** `cargo xtask verify --skip-hub-build` failed on first attempt in this fresh worktree with `aws-lc-sys` / `crypto-common` C-compile errors under `cl.exe` (unrelated to pampa/JSON — pulled in transitively via `quarto-hub → reqwest → rustls → aws-lc-rs → aws-lc-sys`). Rebuilding `aws-lc-sys` in isolation (`cargo build -p aws-lc-sys`) succeeded immediately after, confirming this was a parallel-build resource-contention flake in the fresh worktree's `target/`, not a real regression — historical measurement logs (`claude-notes/research/measurements/baseline-debug.log`) show this exact crate building fine on this machine before. A clean re-run of `cargo xtask verify --skip-hub-build` is in flight; if it comes back green, treat this as noise. If it recurs deterministically, it's a separate strand, not part of this fix. -## Proposed phases (draft) +## Phases — completed 2026-07-01 -- **Phase 0 — Test** (TDD, per project policy): add a unit test that constructs an `ASTContext` (or calls `qmd::read`) with a backslash-containing filename string (no `#[cfg(windows)]` needed — the input is a literal string, not a real OS path, so the test should run identically on all platforms) and asserts the JSON writer's `files[].name` (and the streaming variant) come out forward-slash-only. Run it first, confirm it fails against current `with_filename`. -- **Phase 1 — Fix**: normalize in `ASTContext::with_filename` (`ast_context.rs:42`) via `quarto_util::to_forward_slashes(Path::new(&filename_str))`. Also normalize in `add_filename` (`ast_context.rs:70`) for consistency — same vector, same one-line fix, and leaving it inconsistent would silently reintroduce the bug the day a real second-file caller shows up (see design question 2, resolved below). Both writer sites and any diagnostics built from `ast_context.filenames`/`source_context` inherit the fix for free — no changes needed at the two writer call sites themselves. -- **Phase 2 — Verify**: re-run `unit_test_snapshots_json` (and the sibling `unit_test_snapshots_native`/etc., since `with_filename` is shared) to confirm no snapshot regressions; full `cargo nextest run --workspace` per project policy. Then **end-to-end CLI check** (mandatory per this repo's "End-to-end verification before declaring success" policy — unit tests alone don't count as done for a CLI-visible feature): run `cargo run -p pampa -- -t json -i ` on Windows against a small fixture, and inspect the actual `astContext.files[0].name` in the printed JSON to confirm forward slashes. Also check whether CLI diagnostic/error output (which reads the same `SourceContext`) now shows the normalized path, and record what it shows. -- **Phase 3 — Docs/changelog**: none expected beyond the fix itself — this is an internal writer-determinism bug, not a user-facing feature. The diagnostic-path side effect (see design question 4) may warrant a one-line changelog note if it changes what users see in error output. +- **Phase 0 — Test** ✅: added `test_with_filename_normalizes_backslashes` and `test_add_filename_normalizes_backslashes` in `ast_context.rs`. Confirmed RED against unmodified `with_filename`/`add_filename` before implementing. +- **Phase 1 — Fix** ✅: normalized in both `ASTContext::with_filename` and `add_filename` via `quarto_util::to_forward_slashes`. Confirmed GREEN. +- **Phase 1b — scope extension (found during E2E verification)**: end-to-end CLI testing (`cargo run -p pampa -- -t json -i `) surfaced a second, un-normalized ingress point: `main.rs:279`'s ad hoc fallback `SourceContext` (built directly from the raw CLI arg on the hard-parse-error path) feeds the same ariadne `-->` file:line label as the happy path, but bypasses `ASTContext` entirely. Confirmed with user this was in-scope (same bug class, same fix pattern) rather than deferred. Added `hard_parse_error_diagnostic_uses_forward_slashes` integration test (spawns the real `pampa` binary via `CARGO_BIN_EXE_pampa`, drives an unclosed-span hard error through a native-separator temp path) — RED confirmed via `git apply -R` on just the `main.rs` fix, then GREEN after reapplying. One iteration needed: the first version of this test asserted "no backslash anywhere in stderr," which false-positived on ariadne's OSC-8 hyperlink terminator (`ESC \`, an unrelated ANSI control byte) — tightened to check the specific path fragment instead. + - **Explicitly NOT fixed, by design**: `main.rs:217-225`'s "Missing Newline at End of File" warning, which interpolates the raw CLI arg into a plain sentence (`` File `{input_filename}` does not end with a newline ``). This isn't a portable identifier — it's echoing the user's own typed path back to them, which is standard CLI convention (matches `rustc`, `cat`, etc.). Normalizing it would be surprising, not helpful. +- **Phase 2 — Verify** ✅: full `pampa` crate suite (`cargo nextest run -p pampa --no-fail-fast`) — 8 pre-existing failures confirmed via `git stash`/baseline re-run to be unrelated CRLF byte-offset issues (catalogued Windows issue, orthogonal to this fix), zero new regressions. Full `cargo nextest run --workspace` was intentionally skipped per user direction — CI covers workspace-wide verification, and known Windows-only CRLF failures aren't fully resolved yet, so a full run adds cost without new signal; instead did a targeted grep-based cross-crate check confirming no crate outside `pampa` reads `ASTContext.filenames` as a literal disk path (Windows file APIs accept forward slashes natively regardless). E2E CLI check done for both the JSON writer (`astContext.files[0].name`) and the diagnostics path (ariadne file:line label). +- **Phase 3 — Docs/changelog**: none needed — internal writer-determinism bug, not a user-facing feature change beyond what's captured in the design questions above. -## Open design questions for the user +## Design questions — resolved 2026-07-01 -1. **Normalization point.** Description proposes normalizing at `ASTContext::with_filename` ingress (single point, covers both writer sites and diagnostics). Confirm that's preferred over normalizing at each of the two writer call sites individually (which would NOT fix diagnostics that also read `ast_context.filenames`). -2. **`add_filename` (ast_context.rs:70).** Recommendation above is to normalize it too, for consistency with `with_filename` — same vector, same writer, no reason for the two ingress points to diverge even though `add_filename` has no production caller yet. Confirm, or push back if there's a reason to keep it scoped to `with_filename` only. -3. **Test cfg.** Since the bug is reproducible with a literal `"tests\\snapshots\\json\\001.qmd"` string (no actual Windows path APIs involved), the regression test can run on every platform, not just Windows. Confirm that's the intent — a platform-gated test would under-cover this (CI on Linux/macOS would never catch a regression). -4. **Diagnostic/error-message paths.** `with_filename` also seeds `SourceContext`, which CLI diagnostics read. Normalizing at ingress means Windows users will start seeing forward-slash paths in `pampa`/`q2` error messages too, not just JSON output — that's a small but real, user-visible behavior change beyond "fix the JSON writer." Confirm this is desired (it's consistent with the rest of the codebase's forward-slash convention), rather than leaving it as an unplanned side effect. +All 4 confirmed by user, matching this plan's recommendations: + +1. **Normalization point:** `ASTContext::with_filename` ingress (single point, covers both writer sites and diagnostics). ✅ Confirmed. +2. **`add_filename` (ast_context.rs:70):** normalize it too, for consistency with `with_filename`. ✅ Confirmed. +3. **Test cfg:** regression test runs on all platforms (no `#[cfg(windows)]` gate) — reproducible via a literal backslash string, no real Windows path APIs needed. ✅ Confirmed. +4. **Diagnostic/error-message paths:** Windows users will see forward-slash paths in CLI error output too, not just JSON — accepted as an intended, desired side effect (consistent with codebase + quarto-cli convention). ✅ Confirmed. + +**Status: design aligned. Proceeding to Phase 0 (TDD).** ## Ecosystem precedent diff --git a/crates/pampa/src/main.rs b/crates/pampa/src/main.rs index bd0b54946..a5b3567b2 100644 --- a/crates/pampa/src/main.rs +++ b/crates/pampa/src/main.rs @@ -7,7 +7,9 @@ use clap::Parser; use quarto_error_reporting::DiagnosticMessageBuilder; +use quarto_util::to_forward_slashes; use std::io::{self, Read, Write}; +use std::path::Path; mod attribution; mod citeproc_filter; @@ -276,7 +278,10 @@ fn main() { } else { // Build a minimal source context for Ariadne rendering let mut source_context = quarto_source_map::SourceContext::new(); - source_context.add_file(input_filename.to_string(), Some(input.clone())); + source_context.add_file( + to_forward_slashes(Path::new(input_filename)), + Some(input.clone()), + ); for diagnostic in diagnostics { eprintln!("{}", diagnostic.to_text(Some(&source_context))); diff --git a/crates/pampa/src/pandoc/ast_context.rs b/crates/pampa/src/pandoc/ast_context.rs index 4da4b2ec2..30046ebe8 100644 --- a/crates/pampa/src/pandoc/ast_context.rs +++ b/crates/pampa/src/pandoc/ast_context.rs @@ -4,7 +4,9 @@ */ use quarto_source_map::{FileId, SourceContext}; +use quarto_util::to_forward_slashes; use std::cell::Cell; +use std::path::Path; /// Context passed through the parsing pipeline to provide information /// about the current parse operation and manage string ownership. @@ -40,7 +42,7 @@ impl ASTContext { } pub fn with_filename(filename: impl Into) -> Self { - let filename_str = filename.into(); + let filename_str = to_forward_slashes(Path::new(&filename.into())); let mut source_context = SourceContext::new(); // Add the file without content for now (content can be added later if needed) source_context.add_file(filename_str.clone(), None); @@ -68,7 +70,8 @@ impl ASTContext { /// Add a filename to the context and return its index pub fn add_filename(&mut self, filename: String) -> usize { - self.filenames.push(filename); + self.filenames + .push(to_forward_slashes(Path::new(&filename))); self.filenames.len() - 1 } @@ -124,6 +127,24 @@ mod tests { assert_eq!(ctx.filenames[0], "test.qmd"); } + #[test] + fn test_with_filename_normalizes_backslashes() { + let ctx = ASTContext::with_filename("tests\\snapshots\\json\\001.qmd"); + assert_eq!(ctx.filenames[0], "tests/snapshots/json/001.qmd"); + + // SourceContext (used for CLI diagnostics) must see the same + // normalized path, not just the `filenames` vector used by writers. + let file = ctx.source_context.get_file(FileId(0)).unwrap(); + assert_eq!(file.path, "tests/snapshots/json/001.qmd"); + } + + #[test] + fn test_add_filename_normalizes_backslashes() { + let mut ctx = ASTContext::new(); + let idx = ctx.add_filename("sub\\dir\\second.qmd".to_string()); + assert_eq!(ctx.filenames[idx], "sub/dir/second.qmd"); + } + #[test] fn test_anonymous() { let ctx = ASTContext::anonymous(); diff --git a/crates/pampa/tests/integration/main.rs b/crates/pampa/tests/integration/main.rs index 316061f31..a999b80c6 100644 --- a/crates/pampa/tests/integration/main.rs +++ b/crates/pampa/tests/integration/main.rs @@ -36,6 +36,7 @@ pub mod test_cli_input_arg; pub mod test_code_block_attributes; pub mod test_code_span; pub mod test_diagnostic_determinism; +pub mod test_diagnostic_path_normalization; pub mod test_editorial_mark_spacing; pub mod test_emphasis_opening_mark; pub mod test_error_corpus; diff --git a/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs new file mode 100644 index 000000000..ab545b3c0 --- /dev/null +++ b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs @@ -0,0 +1,73 @@ +/* + * test_diagnostic_path_normalization.rs + * Copyright (c) 2026 Posit, PBC + * + * Regression test for bd-dff27o04: file paths shown in pampa's output + * must always use forward slashes, even on Windows where CLI arguments + * and directory joins naturally produce backslash-separated paths. + * + * Two ingress points are covered: + * - ASTContext::with_filename / add_filename (unit-tested directly in + * crates/pampa/src/pandoc/ast_context.rs) + * - main.rs's own fallback SourceContext, built from the raw CLI arg + * when a *hard* parse error occurs (the `Err(diagnostics)` branch in + * main.rs, before ASTContext-based normalization ever runs) + * + * This test exercises the second ingress point end-to-end through the + * real binary, since that logic lives in main.rs and has no library + * seam to unit test directly. + */ + +use std::fs; +use std::process::Command; + +fn pampa() -> Command { + Command::new(env!("CARGO_BIN_EXE_pampa")) +} + +#[test] +fn hard_parse_error_diagnostic_uses_forward_slashes() { + let tmp = tempfile::tempdir().expect("tempdir"); + let subdir = tmp.path().join("nested").join("dir"); + fs::create_dir_all(&subdir).expect("create nested dir"); + + // An unclosed span is a hard parse error (Q-2-1), which routes through + // main.rs's `Err(diagnostics)` branch and its ad hoc fallback SourceContext. + let file = subdir.join("bad.qmd"); + fs::write(&file, "an [unclosed span\n").expect("write bad qmd"); + + // `PathBuf::join` uses the platform's native separator, so on Windows + // this path contains backslashes without us hardcoding any. + let path_str = file.to_str().unwrap(); + + let output = pampa() + .args(["-i", path_str, "-t", "json"]) + .output() + .expect("run pampa"); + + assert!( + !output.status.success(), + "expected a hard parse error (non-zero exit) for an unclosed span" + ); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains("bad.qmd"), + "expected the diagnostic to reference the input file, got:\n{}", + stderr + ); + // Check the path fragment specifically, not "any backslash in stderr" — + // ariadne's OSC-8 hyperlink terminator is the byte sequence ESC '\\', + // which contains a literal backslash unrelated to path separators and + // would otherwise produce a false positive. + assert!( + stderr.contains("nested/dir/bad.qmd"), + "expected forward-slash path in diagnostic, got:\n{}", + stderr + ); + assert!( + !stderr.contains("nested\\dir\\bad.qmd"), + "diagnostic must not contain backslash-separated path, got:\n{}", + stderr + ); +} From f5a93fa4818c81d73615de8f1dfe614d76a77648 Mon Sep 17 00:00:00 2001 From: Christophe Dervieux Date: Wed, 1 Jul 2026 15:59:46 +0200 Subject: [PATCH 5/6] Normalize source_context filenames re-added by qmd/commonmark readers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit roborev review (job 1726) caught that both readers rebuild or re-add to context.source_context using the raw filename right after ASTContext::with_filename already normalized it — so a successful parse that emits warnings still rendered ariadne diagnostics with Windows backslashes, even though the earlier fix covered the JSON writer and the hard-parse-error fallback in main.rs. qmd.rs's case was live: it fully replaces context.source_context, so the normalized entry with_filename created was discarded. commonmark.rs's case was latent (a second, unused add_file call creates a dead entry at index 1 since FileId(0) is hardcoded) but fixed anyway for consistency, since leaving raw backslashes there is a footgun if that assumption ever changes. Both now reuse context.filenames[0], the value with_filename already normalized, rather than re-deriving from the raw filename argument. --- crates/pampa/src/readers/commonmark.rs | 7 ++- crates/pampa/src/readers/qmd.rs | 8 +++- .../test_diagnostic_path_normalization.rs | 45 +++++++++++++++++-- 3 files changed, 52 insertions(+), 8 deletions(-) diff --git a/crates/pampa/src/readers/commonmark.rs b/crates/pampa/src/readers/commonmark.rs index eae995c5c..82157c47b 100644 --- a/crates/pampa/src/readers/commonmark.rs +++ b/crates/pampa/src/readers/commonmark.rs @@ -31,11 +31,14 @@ pub fn read(input: &str, filename: &str) -> (Pandoc, ASTContext) { // Parse with comrak let root = parse_document(&arena, input, &options); - // Set up source tracking + // Set up source tracking. Reuse the already-normalized filename from + // `context.filenames[0]` (set by `with_filename`) rather than the raw + // `filename` again, so this doesn't reintroduce backslashes on Windows. let mut context = ASTContext::with_filename(filename.to_string()); + let normalized_filename = context.filenames[0].clone(); context .source_context - .add_file(filename.to_string(), Some(input.to_string())); + .add_file(normalized_filename, Some(input.to_string())); let file_id = FileId(0); // First file added gets ID 0 // Create source location context for conversion diff --git a/crates/pampa/src/readers/qmd.rs b/crates/pampa/src/readers/qmd.rs index 08ebde187..34c038fe5 100644 --- a/crates/pampa/src/readers/qmd.rs +++ b/crates/pampa/src/readers/qmd.rs @@ -98,12 +98,16 @@ pub fn read( let mut context = ASTContext::with_filename(filename.to_string()); // Store parent source info for recursive parses context.parent_source_info = parent_source_info; - // Add the input content to the SourceContext for proper error rendering + // Add the input content to the SourceContext for proper error rendering. + // Reuse the already-normalized filename from `context.filenames[0]` + // (set by `with_filename`) rather than the raw `filename` again, so this + // rebuilt SourceContext doesn't reintroduce backslashes on Windows. let input_str = String::from_utf8_lossy(input_bytes).to_string(); + let normalized_filename = context.filenames[0].clone(); context.source_context = quarto_source_map::SourceContext::new(); context .source_context - .add_file(filename.to_string(), Some(input_str)); + .add_file(normalized_filename, Some(input_str)); if tree.had_parse_error() { parser.parser.set_logger(Some(Box::new(|log_type, message| { diff --git a/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs index ab545b3c0..c0eb1daf5 100644 --- a/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs +++ b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs @@ -6,18 +6,24 @@ * must always use forward slashes, even on Windows where CLI arguments * and directory joins naturally produce backslash-separated paths. * - * Two ingress points are covered: + * Ingress points covered: * - ASTContext::with_filename / add_filename (unit-tested directly in * crates/pampa/src/pandoc/ast_context.rs) * - main.rs's own fallback SourceContext, built from the raw CLI arg * when a *hard* parse error occurs (the `Err(diagnostics)` branch in * main.rs, before ASTContext-based normalization ever runs) + * - readers::qmd::read and readers::commonmark::read, which each + * re-add the raw filename to `context.source_context` right after + * `ASTContext::with_filename` normalized it (caught by roborev + * review of the initial fix, job 1726) — the warnings/success path + * for diagnostics, distinct from the hard-error fallback above. * - * This test exercises the second ingress point end-to-end through the - * real binary, since that logic lives in main.rs and has no library - * seam to unit test directly. + * The main.rs case is exercised end-to-end through the real binary, + * since that logic has no library seam to unit test directly. The + * reader cases call the library API directly. */ +use quarto_source_map::FileId; use std::fs; use std::process::Command; @@ -71,3 +77,34 @@ fn hard_parse_error_diagnostic_uses_forward_slashes() { stderr ); } + +#[test] +fn qmd_reader_source_context_uses_forward_slashes() { + let (_, context, _) = pampa::readers::qmd::read( + b"hello\n", + false, + "tests\\snapshots\\json\\001.qmd", + &mut std::io::sink(), + true, + None, + ) + .expect("parse should succeed"); + + let file = context + .source_context + .get_file(FileId(0)) + .expect("file 0 should exist"); + assert_eq!(file.path, "tests/snapshots/json/001.qmd"); +} + +#[test] +fn commonmark_reader_source_context_uses_forward_slashes() { + let (_, context) = + pampa::readers::commonmark::read("hello\n", "tests\\snapshots\\commonmark\\001.qmd"); + + let file = context + .source_context + .get_file(FileId(0)) + .expect("file 0 should exist"); + assert_eq!(file.path, "tests/snapshots/commonmark/001.qmd"); +} From 26ff5a3988547f484a80bb7194b0ab5b0b656950 Mon Sep 17 00:00:00 2001 From: Christophe Dervieux Date: Wed, 1 Jul 2026 16:09:48 +0200 Subject: [PATCH 6/6] Fix commonmark regression test to check the FileId it actually protects MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit roborev review (job 1730) caught that commonmark_reader_source_context_uses_forward_slashes asserted FileId(0), but ASTContext::with_filename populates FileId(0) internally before commonmark::read's own add_file call ever runs — the entry that call actually writes is FileId(1). The test passed regardless of whether that call normalized its filename, giving zero protection for the fix in f5a93fa4. Verified by reverting just that one line and confirming the test still passed; asserting FileId(1) instead now fails correctly when reverted. --- .../integration/test_diagnostic_path_normalization.rs | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs index c0eb1daf5..0d0263c22 100644 --- a/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs +++ b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs @@ -102,9 +102,14 @@ fn commonmark_reader_source_context_uses_forward_slashes() { let (_, context) = pampa::readers::commonmark::read("hello\n", "tests\\snapshots\\commonmark\\001.qmd"); + // FileId(0) is the entry `ASTContext::with_filename` creates internally + // (already normalized, unrelated to this reader's own fix). The fix + // this test protects is commonmark::read's own second `add_file` call, + // which lands at FileId(1) — asserting FileId(0) here would pass + // regardless of whether that second call normalizes its filename. let file = context .source_context - .get_file(FileId(0)) - .expect("file 0 should exist"); + .get_file(FileId(1)) + .expect("file 1 should exist"); assert_eq!(file.path, "tests/snapshots/commonmark/001.qmd"); }