diff --git a/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md new file mode 100644 index 000000000..2646d6d66 --- /dev/null +++ b/claude-notes/plans/2026-07-01-windows-json-writer-backslash-paths.md @@ -0,0 +1,86 @@ +# Windows: JSON writer emits backslash path separators in output (bd-dff27o04) + +**Date:** 2026-07-01 +**Braid:** bd-dff27o04 +**Worktree:** `.worktrees/bd-dff27o04-windows-json-writer-emits` (branch `braid/bd-dff27o04-windows-json-writer-emits`, based on `main` @ `b8fb38b0`) +**Status:** Design aligned (2026-07-01) — implementing. + +## Triage verdict + +**Ready to design** — root cause fully confirmed at the source level (single ingress point, single fix), verified against the actual insta snapshot mismatch, no open dependency-graph pressure blocking it. + +## Issue context + +`ASTContext::with_filename` (`crates/pampa/src/pandoc/ast_context.rs:42`) stores whatever filename string is handed to it verbatim into `self.filenames`. On Windows, callers that derive the filename from a real `Path` (e.g. `path.to_string_lossy()` in the snapshot test harness, or CLI `input_filename`) produce backslash-separated strings. The JSON writer later emits that string verbatim in two places: + +- `crates/pampa/src/writers/json.rs:1812` — `FileEntryJson.name` (struct-based, non-streaming `write` path) +- `crates/pampa/src/writers/json.rs:3967` — `w.str_value(filename)?` (streaming writer variant) + +Both read from `ast_context.filenames[idx]` with no normalization. The committed insta snapshots (created on macOS/Linux) always have forward slashes, so on Windows the `name` field diverges and the snapshot test fails. + +Originally filed under the line-ending epic (bd-eehxwr29) because it shared the `json/001` failure symptom, then detached 2026-06-26 as a distinct Windows/Linux parity issue — explicitly **not** a line-ending/byte-offset problem: normalizing `\` → `/` in a filename string doesn't touch source byte offsets, so it's orthogonal to the preserve-line-endings invariant that epic protects. + +## Dependency graph + +- **supersedes bd-238o** (closed 2026-06-26): bd-238o originally bundled 3 unported Windows fixes from quarto-markdown (native `\r` escape, CRLF test-read normalization, JSON path separator). It was split apart during the line-ending epic's decomposition: fix 1 → bd-ske10iyd, fix 2 → superseded by the `.gitattributes` LF pin (bd-mv2ggmr5, rejected the CRLF-normalize-on-read approach per PR #329), fix 3 (this one) → bd-dff27o04. +- **related bd-eehxwr29** (open epic, "Enforce preserve line-ending policy across q2"): shares a label but is explicitly a different mechanism — no blocking dependency, just shared history. +- No `blocks` edges. No incoming pressure beyond "this is the last of the three original quarto-markdown ports still open." + +## What the code looks like today + +Confirmed by direct inspection (not just re-reading the description): + +1. `crates/pampa/tests/integration/test.rs:345` — `test_snapshots_for_format` calls `readers::qmd::read(input.as_bytes(), false, &path.to_string_lossy(), ...)`. `path` comes from `glob("tests/snapshots/{format}/*.qmd")`; on Windows `glob` returns native-separator paths, so `path.to_string_lossy()` is backslash-separated (e.g. `tests\snapshots\json\001.qmd`). +2. `crates/pampa/src/readers/qmd.rs:98` — `ASTContext::with_filename(filename.to_string())` stores that raw string with no normalization. +3. `crates/pampa/src/pandoc/ast_context.rs:42-51` — `with_filename` just does `filenames: vec![filename_str]`. No other ingress point currently exists in production code — `add_filename` (`ast_context.rs:70`) is only exercised by unit tests today, no real caller adds a second file. +4. `crates/pampa/src/writers/json.rs:1801-1823` builds `AstContextJson.files: Vec` from `ast_context.filenames[idx]` directly (`name: filename.clone()`), and the streaming variant at `json.rs:3951-3967` does the same via `w.str_value(filename)?`. +5. Verified against the *actual* insta snapshot (not the stale-looking `tests/snapshots/json/001.json` fixture file, which is unrelated/unused by this test): `crates/pampa/snapshots/json/001.snap` line 6 contains `"astContext":{"files":[{"...,"name":"tests/snapshots/json/001.qmd",...}]}` — forward slashes, committed from a Unix machine. This is exactly the field built at json.rs:1812. +6. `quarto_util::to_forward_slashes(path: &Path) -> String` (`crates/quarto-util/src/path.rs:23`) already exists and is used elsewhere for this exact purpose (PR #340 Lua side, HTML resource paths, DocumentProfile, listings) — no new helper needed, just a new call site. +7. **Invariant check (all real call sites of `with_filename`/`add_filename`, checked by grep):** every production caller passes either a genuine file path (CLI `input_filename`, test-harness `path.to_string_lossy()`, Lua `readwrite.rs` file reads) or a bracketed placeholder literal with no path separators (`""`, `""`, `""`). There is no current call site that passes an opaque, non-path identifier that could contain a legitimate literal backslash. So `ASTContext.filenames` is, in practice, always either "a portable/display path" or "a placeholder" — never an arbitrary opaque string — which is the invariant the forward-slash fix relies on. + +**Reproducibility:** not yet re-run end-to-end on this machine — see "Environment note" below. Source-level analysis is unambiguous: `path.to_string_lossy()` on a glob-returned Windows path will contain `\`, and nothing between there and the JSON writer normalizes it. The only reason this isn't currently failing loudly in this session is an unrelated, transient native-build issue (see below) that blocked running the test at all; it is not evidence the bug is absent. + +**Environment note (tangential, not part of this fix):** `cargo xtask verify --skip-hub-build` failed on first attempt in this fresh worktree with `aws-lc-sys` / `crypto-common` C-compile errors under `cl.exe` (unrelated to pampa/JSON — pulled in transitively via `quarto-hub → reqwest → rustls → aws-lc-rs → aws-lc-sys`). Rebuilding `aws-lc-sys` in isolation (`cargo build -p aws-lc-sys`) succeeded immediately after, confirming this was a parallel-build resource-contention flake in the fresh worktree's `target/`, not a real regression — historical measurement logs (`claude-notes/research/measurements/baseline-debug.log`) show this exact crate building fine on this machine before. A clean re-run of `cargo xtask verify --skip-hub-build` is in flight; if it comes back green, treat this as noise. If it recurs deterministically, it's a separate strand, not part of this fix. + +## Phases — completed 2026-07-01 + +- **Phase 0 — Test** ✅: added `test_with_filename_normalizes_backslashes` and `test_add_filename_normalizes_backslashes` in `ast_context.rs`. Confirmed RED against unmodified `with_filename`/`add_filename` before implementing. +- **Phase 1 — Fix** ✅: normalized in both `ASTContext::with_filename` and `add_filename` via `quarto_util::to_forward_slashes`. Confirmed GREEN. +- **Phase 1b — scope extension (found during E2E verification)**: end-to-end CLI testing (`cargo run -p pampa -- -t json -i `) surfaced a second, un-normalized ingress point: `main.rs:279`'s ad hoc fallback `SourceContext` (built directly from the raw CLI arg on the hard-parse-error path) feeds the same ariadne `-->` file:line label as the happy path, but bypasses `ASTContext` entirely. Confirmed with user this was in-scope (same bug class, same fix pattern) rather than deferred. Added `hard_parse_error_diagnostic_uses_forward_slashes` integration test (spawns the real `pampa` binary via `CARGO_BIN_EXE_pampa`, drives an unclosed-span hard error through a native-separator temp path) — RED confirmed via `git apply -R` on just the `main.rs` fix, then GREEN after reapplying. One iteration needed: the first version of this test asserted "no backslash anywhere in stderr," which false-positived on ariadne's OSC-8 hyperlink terminator (`ESC \`, an unrelated ANSI control byte) — tightened to check the specific path fragment instead. + - **Explicitly NOT fixed, by design**: `main.rs:217-225`'s "Missing Newline at End of File" warning, which interpolates the raw CLI arg into a plain sentence (`` File `{input_filename}` does not end with a newline ``). This isn't a portable identifier — it's echoing the user's own typed path back to them, which is standard CLI convention (matches `rustc`, `cat`, etc.). Normalizing it would be surprising, not helpful. +- **Phase 2 — Verify** ✅: full `pampa` crate suite (`cargo nextest run -p pampa --no-fail-fast`) — 8 pre-existing failures confirmed via `git stash`/baseline re-run to be unrelated CRLF byte-offset issues (catalogued Windows issue, orthogonal to this fix), zero new regressions. Full `cargo nextest run --workspace` was intentionally skipped per user direction — CI covers workspace-wide verification, and known Windows-only CRLF failures aren't fully resolved yet, so a full run adds cost without new signal; instead did a targeted grep-based cross-crate check confirming no crate outside `pampa` reads `ASTContext.filenames` as a literal disk path (Windows file APIs accept forward slashes natively regardless). E2E CLI check done for both the JSON writer (`astContext.files[0].name`) and the diagnostics path (ariadne file:line label). +- **Phase 3 — Docs/changelog**: none needed — internal writer-determinism bug, not a user-facing feature change beyond what's captured in the design questions above. + +## Design questions — resolved 2026-07-01 + +All 4 confirmed by user, matching this plan's recommendations: + +1. **Normalization point:** `ASTContext::with_filename` ingress (single point, covers both writer sites and diagnostics). ✅ Confirmed. +2. **`add_filename` (ast_context.rs:70):** normalize it too, for consistency with `with_filename`. ✅ Confirmed. +3. **Test cfg:** regression test runs on all platforms (no `#[cfg(windows)]` gate) — reproducible via a literal backslash string, no real Windows path APIs needed. ✅ Confirmed. +4. **Diagnostic/error-message paths:** Windows users will see forward-slash paths in CLI error output too, not just JSON — accepted as an intended, desired side effect (consistent with codebase + quarto-cli convention). ✅ Confirmed. + +**Status: design aligned. Proceeding to Phase 0 (TDD).** + +## Ecosystem precedent + +Checked TypeScript Quarto (quarto-dev/quarto-cli) for how it handles this exact +problem, since it's the sibling codebase with a decade of Windows mileage. +Confirmed against source (not just DeepWiki's summary): `pathWithForwardSlashes` +in `src/core/path.ts:199` does the identical `path.replace(/\\/g, "/")`, applied +at the same class of boundary q2 needs — resource paths, project-cache keys +(`FileInformationCacheMap`), and output emission (Pandoc metadata, TOML, +generated JS for OJS). There's also a companion `normalizePath` that uppercases +Windows drive letters for consistent cache-key comparison — out of scope here +(no drive-letter-comparison need in `ASTContext`), but worth knowing exists if +a future path-identity bug surfaces. + +This doesn't change any of the open design questions below — it confirms the +proposed approach (normalize at ingress, forward slashes in output) matches the +established convention in both codebases, not just q2's own prior art +(`quarto_util::to_forward_slashes`). + +## Risks / tradeoffs (draft) + +- Low risk: `to_forward_slashes` is already battle-tested elsewhere in the codebase for identical purposes (per the strand description), and the same pattern is independently proven in TypeScript Quarto (see Ecosystem precedent above). The change is additive (one call in one function) and doesn't touch byte offsets or the line-ending preserve policy. +- Snapshot fallout: normalizing at ingress could change filenames embedded in *other* existing snapshots beyond `json/001` if any of them currently encode backslashes on non-Windows CI (unlikely, since CI presumably runs Linux/macOS) — worth a full snapshot diff check after the fix, not just the one known-failing file. diff --git a/crates/pampa/src/main.rs b/crates/pampa/src/main.rs index bd0b54946..a5b3567b2 100644 --- a/crates/pampa/src/main.rs +++ b/crates/pampa/src/main.rs @@ -7,7 +7,9 @@ use clap::Parser; use quarto_error_reporting::DiagnosticMessageBuilder; +use quarto_util::to_forward_slashes; use std::io::{self, Read, Write}; +use std::path::Path; mod attribution; mod citeproc_filter; @@ -276,7 +278,10 @@ fn main() { } else { // Build a minimal source context for Ariadne rendering let mut source_context = quarto_source_map::SourceContext::new(); - source_context.add_file(input_filename.to_string(), Some(input.clone())); + source_context.add_file( + to_forward_slashes(Path::new(input_filename)), + Some(input.clone()), + ); for diagnostic in diagnostics { eprintln!("{}", diagnostic.to_text(Some(&source_context))); diff --git a/crates/pampa/src/pandoc/ast_context.rs b/crates/pampa/src/pandoc/ast_context.rs index 4da4b2ec2..30046ebe8 100644 --- a/crates/pampa/src/pandoc/ast_context.rs +++ b/crates/pampa/src/pandoc/ast_context.rs @@ -4,7 +4,9 @@ */ use quarto_source_map::{FileId, SourceContext}; +use quarto_util::to_forward_slashes; use std::cell::Cell; +use std::path::Path; /// Context passed through the parsing pipeline to provide information /// about the current parse operation and manage string ownership. @@ -40,7 +42,7 @@ impl ASTContext { } pub fn with_filename(filename: impl Into) -> Self { - let filename_str = filename.into(); + let filename_str = to_forward_slashes(Path::new(&filename.into())); let mut source_context = SourceContext::new(); // Add the file without content for now (content can be added later if needed) source_context.add_file(filename_str.clone(), None); @@ -68,7 +70,8 @@ impl ASTContext { /// Add a filename to the context and return its index pub fn add_filename(&mut self, filename: String) -> usize { - self.filenames.push(filename); + self.filenames + .push(to_forward_slashes(Path::new(&filename))); self.filenames.len() - 1 } @@ -124,6 +127,24 @@ mod tests { assert_eq!(ctx.filenames[0], "test.qmd"); } + #[test] + fn test_with_filename_normalizes_backslashes() { + let ctx = ASTContext::with_filename("tests\\snapshots\\json\\001.qmd"); + assert_eq!(ctx.filenames[0], "tests/snapshots/json/001.qmd"); + + // SourceContext (used for CLI diagnostics) must see the same + // normalized path, not just the `filenames` vector used by writers. + let file = ctx.source_context.get_file(FileId(0)).unwrap(); + assert_eq!(file.path, "tests/snapshots/json/001.qmd"); + } + + #[test] + fn test_add_filename_normalizes_backslashes() { + let mut ctx = ASTContext::new(); + let idx = ctx.add_filename("sub\\dir\\second.qmd".to_string()); + assert_eq!(ctx.filenames[idx], "sub/dir/second.qmd"); + } + #[test] fn test_anonymous() { let ctx = ASTContext::anonymous(); diff --git a/crates/pampa/src/readers/commonmark.rs b/crates/pampa/src/readers/commonmark.rs index eae995c5c..82157c47b 100644 --- a/crates/pampa/src/readers/commonmark.rs +++ b/crates/pampa/src/readers/commonmark.rs @@ -31,11 +31,14 @@ pub fn read(input: &str, filename: &str) -> (Pandoc, ASTContext) { // Parse with comrak let root = parse_document(&arena, input, &options); - // Set up source tracking + // Set up source tracking. Reuse the already-normalized filename from + // `context.filenames[0]` (set by `with_filename`) rather than the raw + // `filename` again, so this doesn't reintroduce backslashes on Windows. let mut context = ASTContext::with_filename(filename.to_string()); + let normalized_filename = context.filenames[0].clone(); context .source_context - .add_file(filename.to_string(), Some(input.to_string())); + .add_file(normalized_filename, Some(input.to_string())); let file_id = FileId(0); // First file added gets ID 0 // Create source location context for conversion diff --git a/crates/pampa/src/readers/qmd.rs b/crates/pampa/src/readers/qmd.rs index 08ebde187..34c038fe5 100644 --- a/crates/pampa/src/readers/qmd.rs +++ b/crates/pampa/src/readers/qmd.rs @@ -98,12 +98,16 @@ pub fn read( let mut context = ASTContext::with_filename(filename.to_string()); // Store parent source info for recursive parses context.parent_source_info = parent_source_info; - // Add the input content to the SourceContext for proper error rendering + // Add the input content to the SourceContext for proper error rendering. + // Reuse the already-normalized filename from `context.filenames[0]` + // (set by `with_filename`) rather than the raw `filename` again, so this + // rebuilt SourceContext doesn't reintroduce backslashes on Windows. let input_str = String::from_utf8_lossy(input_bytes).to_string(); + let normalized_filename = context.filenames[0].clone(); context.source_context = quarto_source_map::SourceContext::new(); context .source_context - .add_file(filename.to_string(), Some(input_str)); + .add_file(normalized_filename, Some(input_str)); if tree.had_parse_error() { parser.parser.set_logger(Some(Box::new(|log_type, message| { diff --git a/crates/pampa/tests/integration/main.rs b/crates/pampa/tests/integration/main.rs index 316061f31..a999b80c6 100644 --- a/crates/pampa/tests/integration/main.rs +++ b/crates/pampa/tests/integration/main.rs @@ -36,6 +36,7 @@ pub mod test_cli_input_arg; pub mod test_code_block_attributes; pub mod test_code_span; pub mod test_diagnostic_determinism; +pub mod test_diagnostic_path_normalization; pub mod test_editorial_mark_spacing; pub mod test_emphasis_opening_mark; pub mod test_error_corpus; diff --git a/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs new file mode 100644 index 000000000..0d0263c22 --- /dev/null +++ b/crates/pampa/tests/integration/test_diagnostic_path_normalization.rs @@ -0,0 +1,115 @@ +/* + * test_diagnostic_path_normalization.rs + * Copyright (c) 2026 Posit, PBC + * + * Regression test for bd-dff27o04: file paths shown in pampa's output + * must always use forward slashes, even on Windows where CLI arguments + * and directory joins naturally produce backslash-separated paths. + * + * Ingress points covered: + * - ASTContext::with_filename / add_filename (unit-tested directly in + * crates/pampa/src/pandoc/ast_context.rs) + * - main.rs's own fallback SourceContext, built from the raw CLI arg + * when a *hard* parse error occurs (the `Err(diagnostics)` branch in + * main.rs, before ASTContext-based normalization ever runs) + * - readers::qmd::read and readers::commonmark::read, which each + * re-add the raw filename to `context.source_context` right after + * `ASTContext::with_filename` normalized it (caught by roborev + * review of the initial fix, job 1726) — the warnings/success path + * for diagnostics, distinct from the hard-error fallback above. + * + * The main.rs case is exercised end-to-end through the real binary, + * since that logic has no library seam to unit test directly. The + * reader cases call the library API directly. + */ + +use quarto_source_map::FileId; +use std::fs; +use std::process::Command; + +fn pampa() -> Command { + Command::new(env!("CARGO_BIN_EXE_pampa")) +} + +#[test] +fn hard_parse_error_diagnostic_uses_forward_slashes() { + let tmp = tempfile::tempdir().expect("tempdir"); + let subdir = tmp.path().join("nested").join("dir"); + fs::create_dir_all(&subdir).expect("create nested dir"); + + // An unclosed span is a hard parse error (Q-2-1), which routes through + // main.rs's `Err(diagnostics)` branch and its ad hoc fallback SourceContext. + let file = subdir.join("bad.qmd"); + fs::write(&file, "an [unclosed span\n").expect("write bad qmd"); + + // `PathBuf::join` uses the platform's native separator, so on Windows + // this path contains backslashes without us hardcoding any. + let path_str = file.to_str().unwrap(); + + let output = pampa() + .args(["-i", path_str, "-t", "json"]) + .output() + .expect("run pampa"); + + assert!( + !output.status.success(), + "expected a hard parse error (non-zero exit) for an unclosed span" + ); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains("bad.qmd"), + "expected the diagnostic to reference the input file, got:\n{}", + stderr + ); + // Check the path fragment specifically, not "any backslash in stderr" — + // ariadne's OSC-8 hyperlink terminator is the byte sequence ESC '\\', + // which contains a literal backslash unrelated to path separators and + // would otherwise produce a false positive. + assert!( + stderr.contains("nested/dir/bad.qmd"), + "expected forward-slash path in diagnostic, got:\n{}", + stderr + ); + assert!( + !stderr.contains("nested\\dir\\bad.qmd"), + "diagnostic must not contain backslash-separated path, got:\n{}", + stderr + ); +} + +#[test] +fn qmd_reader_source_context_uses_forward_slashes() { + let (_, context, _) = pampa::readers::qmd::read( + b"hello\n", + false, + "tests\\snapshots\\json\\001.qmd", + &mut std::io::sink(), + true, + None, + ) + .expect("parse should succeed"); + + let file = context + .source_context + .get_file(FileId(0)) + .expect("file 0 should exist"); + assert_eq!(file.path, "tests/snapshots/json/001.qmd"); +} + +#[test] +fn commonmark_reader_source_context_uses_forward_slashes() { + let (_, context) = + pampa::readers::commonmark::read("hello\n", "tests\\snapshots\\commonmark\\001.qmd"); + + // FileId(0) is the entry `ASTContext::with_filename` creates internally + // (already normalized, unrelated to this reader's own fix). The fix + // this test protects is commonmark::read's own second `add_file` call, + // which lands at FileId(1) — asserting FileId(0) here would pass + // regardless of whether that second call normalizes its filename. + let file = context + .source_context + .get_file(FileId(1)) + .expect("file 1 should exist"); + assert_eq!(file.path, "tests/snapshots/commonmark/001.qmd"); +}