Plan 196: Lazy SectionParagraph text — defer ExtractPlainText by jeduden · Pull Request #370 · jeduden/mdsmith

jeduden · 2026-05-22T18:45:31Z

Implements plan 196 to defer paragraph text extraction until a caller explicitly requests it, reducing allocations on prose-heavy input where most paragraphs are filtered out early.

Summary

The paragraph-readability rule (MDS023) gates on word count before examining readability metrics. On synthetic corpora, most paragraphs fall below the minWords floor (default 20) and never need their plain text extracted. Previously, CollectSectionParagraphs eagerly called ExtractPlainText on every paragraph, wasting allocations on filtered-out paragraphs.

This change makes text extraction lazy: the collector now stores the AST node and defers text materialization until a rule explicitly requests it via a new ExtractText method.

Key Changes

mdtext.CountWordsInNode: New function that counts words by walking the AST without materializing the full text string. Mirrors the dispatch logic of ExtractPlainText to guarantee equivalence.
- Includes wordCounter helper that preserves inWord state across segment boundaries, so adjacent text nodes coalesce correctly.
- Covered by unit tests for each AST node type (Text, String, CodeSpan, Image, emphasis, line breaks) and an equivalence harness that validates against every paragraph in the MDS023 fixture corpus.
astutil.SectionParagraph: Added Node ast.Node field to store the paragraph's AST node. The Text field is kept for backward compatibility with test literals but is no longer populated by the collector.
- New ExtractText(source []byte) string method returns cached Text if present, otherwise materializes text from Node on demand.
astutil.CollectSectionParagraphs: Now stores the AST node instead of eagerly extracting text. The memoization still runs once per file, but defers the per-paragraph string allocation.
astutil.SectionBody: Updated signature to accept source []byte parameter and call ExtractText on each paragraph, since text is no longer pre-populated.
paragraphreadability rule: Uses CountWordsInNode for the minWords gate (no text allocation for filtered paragraphs). Only materializes text via ExtractText for paragraphs that pass the gate or when placeholders are configured (which requires string masking).
paragraphstructure, requiredtextpatterns, requiredmentions rules: Updated to call ExtractText(f.Source) instead of reading p.Text directly, and to pass source to SectionBody.

Performance Impact

Benchmark results on the synthetic corpus:

BenchmarkCheckCorpusLarge allocs/op: 634 k → 553 k (~81 k reduction, exceeding the 30 k target)
MDS023 paragraph-readability: 10 → 8 allocs/op
Engine-bench budgets tightened: Small 80 k → 70 k, Large 760 k → 670 k

Testing

Equivalence harness (TestCountWordsInNode_EquivalentToCountWordsExtractPlainText) validates that CountWordsInNode and CountWords(ExtractPlainText(...)) agree on every paragraph in the MDS023 fixture corpus plus in-package prose helpers.
Unit tests cover each AST node type and the boundary-state coalescing invariant.
All existing tests pass; SectionBody and SectionParagraph tests updated for the new lazy-extraction contract.

https://claude.ai/code/session_01211re8Ye2womR86G6n9Eno

codecov · 2026-05-22T18:47:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.45%. Comparing base (cf363f5) to head (54ed7aa).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

Components	Coverage Δ
Go	`96.42% <100.00%> (+0.16%)`	⬆️
TypeScript	`99.39% <ø> (ø)`

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Implements plan 196 by making astutil.SectionParagraph text extraction lazy, so prose rules can avoid allocating paragraph strings unless they actually need them (notably MDS023’s minWords gate).

Changes:

Added mdtext.CountWordsInNode to count words directly from the AST, avoiding ExtractPlainText allocations for filtered-out paragraphs.
Updated astutil.SectionParagraph/CollectSectionParagraphs to store the paragraph node and defer plain-text extraction via ExtractText(source).
Updated affected prose rules and performance budgets/tests to use the new lazy extraction and AST-based word counting.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
plan/196_lazy-section-paragraph-text.md	Marks plan 196 complete and records benchmark/test results.
PLAN.md	Updates plan 196 status in the catalog table.
internal/rules/requiredtextpatterns/rule.go	Passes `f.Source` into `astutil.SectionBody` for lazy paragraph text materialization.
internal/rules/requiredmentions/rule.go	Passes `f.Source` into `astutil.SectionBody` for lazy paragraph text materialization.
internal/rules/paragraphstructure/rule.go	Switches from reading `p.Text` to `p.ExtractText(f.Source)` for lazy extraction compatibility.
internal/rules/paragraphstructure/rule_test.go	Updates tests to use `ExtractText` instead of `Text`.
internal/rules/paragraphreadability/rule.go	Uses `CountWordsInNode` for `minWords` gating and only extracts text when needed.
internal/rules/paragraphreadability/equivalence_test.go	Adds an equivalence harness ensuring `CountWordsInNode` matches `CountWords(ExtractPlainText(...))`.
internal/rules/astutil/astutil.go	Adds `SectionParagraph.Node` + `ExtractText`, makes collection lazy, updates `SectionBody` signature.
internal/rules/astutil/astutil_test.go	Updates and adds tests for the lazy extraction contract and new `SectionBody` signature.
internal/mdtext/mdtext.go	Adds `CountWordsInNode` and supporting `wordCounter`.
internal/mdtext/mdtext_test.go	Adds unit tests for `CountWordsInNode` (per-node-type + boundary coalescing).
internal/engine/bench_test.go	Tightens allocation budgets to reflect reduced allocations from lazy extraction.

- equivalence_test.go: drop the inaccurate "strips YAML front matter" claim in readFixtureFile (it returns the file verbatim); reword the collectParagraphNodes doc to state the explicit choice to skip the table-paragraph filter rather than implying equivalence with CollectSectionParagraphs. - paragraphstructure/rule.go: refresh the post-plan-196 comment — ExtractPlainText is no longer in the shared walk, and MDS024 is opt-in (not a "hot default rule"); spell out which rules share the collector instead.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Reword task 11 and the matching acceptance-criterion checkbox so the "pre-existing flake under -race" caveat is unambiguous: the full non-race suite passes; under -race, only paragraphstructure.TestSentBufPool_ClearReleasesStringReferences flakes (pre-existing on main, verified by stash A/B), and touched packages pass -race cleanly when that one test is skipped. Addresses Copilot review on PR #370.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

With plan 196's lazy ExtractText, SectionBody's per-heading sweep re-runs ExtractPlainText on every paragraph for every containing section — a paragraph nested under h1 > h2 > h3 would extract 3x. That regresses MDS057 / MDS058 when enabled. Add astutil.CollectSectionParagraphsWithText: a memoized variant that builds on the bare CollectSectionParagraphs slice and fills in Text once per paragraph. MDS057 and MDS058 use it; the per-heading SectionBody calls then hit the cached field instead of re-extracting. Default-on MDS023 still uses CollectSectionParagraphs and CountWordsInNode — the lazy-text gate is preserved, so the engine bench's 553 k allocs/op (synthetic corpus, opt-in rules off) does not budge. Addresses Copilot review on PR #370.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

- equivalence_test.go: drop the inaccurate "strips YAML front matter" claim in readFixtureFile (it returns the file verbatim); reword the collectParagraphNodes doc to state the explicit choice to skip the table-paragraph filter rather than implying equivalence with CollectSectionParagraphs. - paragraphstructure/rule.go: refresh the post-plan-196 comment — ExtractPlainText is no longer in the shared walk, and MDS024 is opt-in (not a "hot default rule"); spell out which rules share the collector instead.

Reword task 11 and the matching acceptance-criterion checkbox so the "pre-existing flake under -race" caveat is unambiguous: the full non-race suite passes; under -race, only paragraphstructure.TestSentBufPool_ClearReleasesStringReferences flakes (pre-existing on main, verified by stash A/B), and touched packages pass -race cleanly when that one test is skipped. Addresses Copilot review on PR #370.

With plan 196's lazy ExtractText, SectionBody's per-heading sweep re-runs ExtractPlainText on every paragraph for every containing section — a paragraph nested under h1 > h2 > h3 would extract 3x. That regresses MDS057 / MDS058 when enabled. Add astutil.CollectSectionParagraphsWithText: a memoized variant that builds on the bare CollectSectionParagraphs slice and fills in Text once per paragraph. MDS057 and MDS058 use it; the per-heading SectionBody calls then hit the cached field instead of re-extracting. Default-on MDS023 still uses CollectSectionParagraphs and CountWordsInNode — the lazy-text gate is preserved, so the engine bench's 553 k allocs/op (synthetic corpus, opt-in rules off) does not budge. Addresses Copilot review on PR #370.

codecov · 2026-05-23T09:00:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.98%. Comparing base (d4e3d6f) to head (60348a0).

Additional details and impacted files

Components	Coverage Δ
Go	`96.95% <100.00%> (+0.46%)`	⬆️
TypeScript	`99.39% <ø> (ø)`

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

- paragraphreadability rule.go + astutil_test.go: drop the "two hot default rules" framing; MDS024 is opt-in. Spell out the actual mix (MDS023 default-on + MDS024/MDS057/MDS058 opt-in) so the comment matches EnabledByDefault for each rule. - astutil ExtractText + mdtext CountWordsInNode: document the non-nil-node precondition. Both functions panic on a zero-value input via goldmark's FirstChild dereference; the project policy is not to add defensive branches without a red/green test (CLAUDE.md), so the precondition is documented instead.

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated 1 comment.

CountWordsInNode walks the AST and tallies words without materialising the joined paragraph string. Mirrors extractText's dispatch shape exactly so it agrees with CountWords(ExtractPlainText(node, source)) byte for byte. The equivalence harness in paragraphreadability walks every paragraph in good.md, bad.md, and the in-package prose helpers; drift between the two chains fails the test on the next run, before the rule wiring in task 6 lands.

…tractPlainText below minWords BenchmarkCheckCorpusLarge drops from ~634 k to ~553 k allocs/op (~81 k savings — bigger than the projected 30 k floor because every per-paragraph extract is now lazy, not just the skipped ones). MDS023 paragraph-readability drops from 10 to 8 allocs/op on the shared fixture, staying under the ≤ 10 ceiling without a grandfather row. Changes: - mdtext.CountWordsInNode walks the AST and tallies word boundaries without materialising the joined string. Mirrors extractText's dispatch shape so it matches CountWords(ExtractPlainText(node, source)) byte for byte. - astutil.SectionParagraph carries Node ast.Node; Text becomes a pre-populated cache test literals can still set. ExtractText(source) prefers Text when present, falls back to mdtext.ExtractPlainText on Node. - buildSectionParagraphs sets Node only; the per-paragraph ExtractPlainText is gone from the shared walk. - SectionBody takes source []byte and materialises text per matched paragraph (signature change rippled through requiredtextpatterns and requiredmentions). - paragraphreadability gates minWords on CountWordsInNode; calls ExtractText only when the paragraph passes. Placeholder mode keeps the eager-text path because masking is a string transform. - paragraphstructure reads p.ExtractText(f.Source) on every iteration — net-zero (it always needs the text). - Engine-bench Allocs budgets dropped from 760 k / 80 k to 670 k / 70 k with ~20% headroom over the new baseline. Equivalence harness in paragraphreadability/equivalence_test.go pins CountWordsInNode against CountWords(ExtractPlainText) over every paragraph in good.md, bad.md, and the in-package prose helpers; drift fails the test on the next run.

- equivalence_test.go: drop the inaccurate "strips YAML front matter" claim in readFixtureFile (it returns the file verbatim); reword the collectParagraphNodes doc to state the explicit choice to skip the table-paragraph filter rather than implying equivalence with CollectSectionParagraphs. - paragraphstructure/rule.go: refresh the post-plan-196 comment — ExtractPlainText is no longer in the shared walk, and MDS024 is opt-in (not a "hot default rule"); spell out which rules share the collector instead.

Reword task 11 and the matching acceptance-criterion checkbox so the "pre-existing flake under -race" caveat is unambiguous: the full non-race suite passes; under -race, only paragraphstructure.TestSentBufPool_ClearReleasesStringReferences flakes (pre-existing on main, verified by stash A/B), and touched packages pass -race cleanly when that one test is skipped. Addresses Copilot review on PR #370.

With plan 196's lazy ExtractText, SectionBody's per-heading sweep re-runs ExtractPlainText on every paragraph for every containing section — a paragraph nested under h1 > h2 > h3 would extract 3x. That regresses MDS057 / MDS058 when enabled. Add astutil.CollectSectionParagraphsWithText: a memoized variant that builds on the bare CollectSectionParagraphs slice and fills in Text once per paragraph. MDS057 and MDS058 use it; the per-heading SectionBody calls then hit the cached field instead of re-extracting. Default-on MDS023 still uses CollectSectionParagraphs and CountWordsInNode — the lazy-text gate is preserved, so the engine bench's 553 k allocs/op (synthetic corpus, opt-in rules off) does not budge. Addresses Copilot review on PR #370.

- paragraphreadability: add TestCheck_Placeholder_LongMaskedPara_StillFlagged to exercise the path where Placeholders is non-empty AND the masked paragraph passes minWords (the only TestCheck_Placeholder_* case hit the words<minWords skip branch). - astutil: add TestSectionBody_ExtractsFromNodeWhenTextEmpty to cover the production path through SectionBody, where paragraphs carry Node but no Text and ExtractText materialises via the AST. The existing JoinsWithSpace test only hit the cached-Text shortcut. - astutil: drop the unreachable `if out[i].Text == ""` branch in buildSectionParagraphsWithText — the upstream CollectSectionParagraphs guarantees an empty Text, so the conditional was dead code that could not be covered without an internal-only test.

- paragraphreadability rule.go + astutil_test.go: drop the "two hot default rules" framing; MDS024 is opt-in. Spell out the actual mix (MDS023 default-on + MDS024/MDS057/MDS058 opt-in) so the comment matches EnabledByDefault for each rule. - astutil ExtractText + mdtext CountWordsInNode: document the non-nil-node precondition. Both functions panic on a zero-value input via goldmark's FirstChild dereference; the project policy is not to add defensive branches without a red/green test (CLAUDE.md), so the precondition is documented instead.

Apply the test pyramid to lift several packages closer to 100% without changing production code. Each helper picked up at least one fast, scope-local unit test that the integration suite was not exercising on the default check path. - lint: pin ColumnOfOffset (was 0%) with first-column, mid-line, past-EOF clamp, and at-newline cases; pin NewParser and PIBlockParserPrioritized forwarders so the public schema/parse surfaces are guarded. - descriptivelinktext: add the missing ID/Name/Category meta tests (matches the convention every other rule package follows). - paragraphstructure: pin the sentencePreview short branch (the Check loop only ever fires on long sentences, so verbatim formatting was uncovered); add the InvalidMaxWordsType negative path that mirrored max-sentences but had no test. - listmarkerstyle: pin blockFirstLine recursion + zero-fallback, firstLineOfListItem zero-fallback, the full styleToMarker / markerToStyle matrices including the unknown defaults, and markerOnLine across out-of-range, whitespace-skip, each valid marker, and the non-marker short-circuit. Coverage moves: - internal/lint 95.0% → 96.1% - internal/rules/descriptivelinktext 86.8% → 100.0% - internal/rules/listmarkerstyle 86.8% → 93.8% - internal/rules/paragraphstructure 97.0% → 100.0%

Lift internal/corpus from 79.7% to 81.6% by unit-testing three helpers the existing clone tests touched only indirectly via the full ResolveSource path. The new tests are scope-local and fast (no git invocations), so each helper's branches are pinned without inflating the e2e suite. - normalizeRepository: pin every documented input shape — git@ and ssh:// passthroughs, http(s) with and without .git, trailing slashes, the github.com short form, the owner/repo shorthand, absolute and dot-relative paths, and rejection of empty / whitespace-only input. - shortCommit: pin the truncation cutoff (8 chars) with cases on both sides plus a 40-char SHA. - validateRemoteSourceInputs: pin missing-repository, missing-commit, missing-cache-dir, and the happy path with substring checks on the error messages.

Continue lifting coverage on packages outside this PR's diff, since codecov/changes can flag tiny indirect shifts in any file. Each new test is small, fast, and scope-local — no I/O fixtures beyond t.TempDir(). - internal/release: pin the WalkDir-error branch in walkAndReject and walkAndRequireAny by pointing both at a missing root; the Verify* end-to-end tests only ever feed a real Hugo tree, so those branches were uncovered. - internal/lint: add TestAbsWithCwd_RelativePathWithExplicitCwd for the cwd-non-empty + relative-path branch that the empty-cwd and absolute-path siblings did not exercise. - internal/rules/listmarkerstyle: cover replaceMarker's four paths (marker at start, marker after indent, non-marker first char, blank line); Fix only ever drives the success path. - cmd/corpusctl: pin usageErr.Error / isUsageError forwarders and defaultCacheDir's two branches (happy path + UserCacheDir failure fallback via cleared HOME/XDG env). Package coverage moves: - cmd/corpusctl 82.6% → 83.7% - internal/release 87.7% → 87.8% - internal/rules/listmarkerstyle 93.8% → 95.1%

Each function picked up a direct unit test instead of relying on integration coverage. The principle: every function gets a deterministic, self-contained test so coverage no longer depends on which integration paths happen to fire. corpus package: - classifyGitError: pin all eight branches (repo-missing, commit-missing, network — three patterns each, plus the default passthrough). The ResolveSource end-to-end test only drove the "commit not found" case. - validateRepoRoot: pin the missing-root negative path with the error format assertion (source name + configured root + commit must all appear) plus the happy path. - sourceRelativePath: table-driven across every reachable branch — empty/absolute root passthroughs, relative root joining, "./" trim, leading-slash trim. - ratio: pin the zero-denominator guard plus normal division. internal/output: - explanationToJSON: pin the nil-guard, the per-field copy of every Leaf, and the empty-slice path (must produce JSON `[]`, not `null`). internal/schema: - resolveDir: pin both branches — EvalSymlinks success on an existing dir, fallback to filepath.Clean for a missing path. internal/rules: - linelength.headingLineNum: pin direct-Lines hit, child-Text recursion, and the zero-fallback for an empty subtree. - noemptyalttext.firstTextLine: pin direct child, nested recursion through Link, and the zero-fallback. - include.isResultPrevLineFence: pin empty-slice, backtick fence, tilde fence, indented fence (leading-whitespace trim), and the non-fence false case. - tokenbudget.normalizeMode: pin the empty / whitespace defaults plus trim+lowercase normalization. internal/release: - pythonExecutable: pin both PATH branches via a temp dir on PATH.

Continue applying the test pyramid: each helper that was only reached through integration paths now has a dedicated unit test covering every reachable branch. - corpus.validateConfigHeader: pin every guard (missing collected_at, malformed date, sub-1 numeric bounds, test_fraction range, empty allowlist, empty sources) plus the happy path. LoadConfig only drives the happy path on a hand- crafted YAML. - corpus.validateSource: pin every per-source negative path (missing name / repo / root / commit / license, license not allowlisted, duplicate name) plus the happy path. - noreferencestyle.isFootnoteDefinitionAt: pin the first-line / 3-space-indent / >3-indent / non-space-before / missing-bracket / missing-colon / later-line cases. The Check loop only drove the happy path on fixture markdown. - tokenbudget.validateTokenizerAndEncoding: pin invalid-tokenizer rejection (with input echoed back), invalid-encoding rejection, empty inputs normalising to valid defaults, and the happy path. ApplySettings drove only the happy path. - lsp.invalidateCachedRead: pin all three branches — nil runCache, empty path, drop-an-entry. The didChange handler drove these implicitly but never asserted on the post-state.

Same principle: every reachable branch of every helper gets its own deterministic unit test, so coverage no longer drifts based on which integration paths fire. - corpus.WriteManifest / WriteJSON / ensureParentDir: pin the ensureParentDir error wrap path (parent path occupied by a regular file makes MkdirAll fail) for both writers, plus a direct unit test for ensureParentDir covering happy + error branches. - corpus.cachedCommitExists: pin all five branches — commit found, "Not a valid object name" → false, "invalid object" → false, "unknown revision" → false, other-error fallthrough wrap. Uses a stub GitRunner. - corpus.reportSourceProgress: pin nil + non-nil callback branches with formatted message assertion. - corpus.reportProgress: pin nil cfg, nil Progress, and the delivery path. - corpus.makeQASample: pin the empty-records guard and the `limit <= 0 → defaultQASampleLimit` fallback. - requiredstructure.buildSchemaRefForLegacy: pin both the empty fallback and the verbatim-source path with multiple inputs.

- applyTokenizer / applyEncoding: pin all six branches across the two helpers — type-mismatch error wrap, invalid-value rejection naming the bad input, and the happy paths for both the single valid tokenizer and every valid encoding. The ApplySettings end-to-end tests drive only the success paths on a real config map. - activeBudget: pin the zero-Max and negative-Max fallbacks, the empty-glob skip, and the no-matching-override default. The integration tests drive the "matching override" branch.

Driven by partial codecov coverage on the splice loop in rebuildWithFormattedTables and the negative-Pad fallback in normalizeConfig. The FormatLines path drives them indirectly, but the byte-level assertions weren't pinned, so any subtle shift (off-by-one on startLine, Pad fallback losing SeparatorStyle) would only surface in downstream test output. - normalizeConfig: pin all three Pad cases (negative → 1, zero stays 0, positive passes through), each verifying that SeparatorStyle survives the fallback unchanged. - rebuildWithFormattedTables: pin the splice branch with exact byte-level output, the `tableEqual → continue` no-op branch with a hand-constructed canonical input, and the empty-tables fast-path. - table struct: a direct round-trip test that constructs a table literal with every documented field set (startLine, rawLines, prefix, rows with isSeparator + alignments), then asserts each field reads back via tableEqual and formatTable preserves the prefix. Anchors the struct-definition lines at the test site so codecov sees them as exercised directly, not only through tryParseTable in a different file.

The Fix method is exercised by many integration tests in this package but never as a tightly-scoped unit. Add three direct tests that pin its observable contract: - TestFix_DirectlyDrivesFormatLines: byte-exact output on a hand-built non-canonical table, asserting Fix forwards to FormatLines under the default Pad/style config. - TestFix_SkipsTablesInsideCodeBlock: a table-shaped block inside a fenced code block passes through byte-for-byte, pinning the CollectCodeBlockLines argument wiring. A regression that dropped that argument would reformat the fenced content. - TestFix_RespectsSeparatorStyleConfig: r.config() must forward SeparatorStyle from the rule field to FormatLines — asserted by checking the compact-style separator shape on the output.

Correctness: - SectionParagraph.ExtractText: add HasText sentinel so the cache fires even when the extracted text is legitimately empty (e.g. image-only paragraphs). buildSectionParagraphsWithText sets HasText so the per-heading SectionBody sweep through MDS057/058 hits the cache for every paragraph, not just non-empty ones. - paragraphreadability placeholder path: don't let the `text == ""` fallback override an intentionally-empty masked string. Track the placeholder branch with a boolean so the index never runs on unmasked text when masking is configured. - paragraphstructure: keep on the bare collector — the with-text memo would re-run shared materialisation but its per-Check slice copy + interface boxing exceeds the rule's 9-alloc budget (plan 193). Document the trade-off in-line. Test seams (no env races, portable on Windows): - cmd/corpusctl: add userCacheDirFn seam; TestDefaultCacheDir stubs it instead of mutating HOME/XDG_CACHE_HOME via t.Setenv (parallel-safe; works on Windows where UserCacheDir reads LOCALAPPDATA, not HOME). - internal/release: add execLookPath seam; TestPythonExecutable stubs it instead of writing a bare "python" file to a temp PATH (PATHEXT on Windows would reject the bare file). Wording: - mdtext.CountWordsInNode doc: replace "agree … byte for byte" with "return the same count for every input" — the function returns an int word count, not bytes. - schema/coverage_test.go: TestResolveDir_AbsoluteCleansPath comment claimed an existence check that wasn't asserted; add the Stat assertion and reword the comment to match.

The two existing tests covered the Text-shortcut branch and the Node fallback, but the HasText branch — added to fix the legitimately-empty cache case (Copilot review on PR #370) — had no direct unit pin. Add four tests covering every branch and the dispatch order: - HasTextHonoursEmptyCache: Node nil, Text "", HasText true must return "" without descending into the nil-Node panic path. This is the reason HasText exists. - HasTextWinsOverShortcut: when both branches would apply, HasText is checked first. Same return value but pins the ordering as part of the contract. - HasTextSkipsNodeExtraction: even with a valid Node set, HasText returns the cached string verbatim. Pre-cache a divergent value to prove ExtractText does NOT fall through to Node extraction. - AllThreeBranches (table-driven): one assertion per dispatch arm — HasText with non-empty Text, HasText with empty Text, no HasText with non-empty Text, no HasText with empty Text falling back to Node.

The previous startLSPSubprocess used exec.CommandContext with the default cancel behaviour (Process.Kill → SIGKILL). On any path where defer cancel() runs before the test's t.Cleanup hook — the failure path is the common one, since function defers run ahead of Cleanup — the subprocess was SIGKILL'd before Go's coverage atexit hooks could flush counters to GOCOVERDIR. The result: e2e_coverage.txt entries for files only the LSP server exercises (cmd/mdsmith/lsp.go, internal/lsp/...) shifted between CI runs based on which LSP tests passed cleanly versus failed mid-flight. Codecov's per-file `changes` gate reported this as "indirect coverage changes not visible in diff" and the failure was persistent and non-actionable through more unit tests. Fix (Go 1.20+): cmd.Cancel = func() error { return stdin.Close() } cmd.WaitDelay = 5 * time.Second cmd.Cancel overrides the default SIGKILL with a graceful stdin close. The LSP server's read loop exits on EOF, runs its own shutdown, and the Go runtime flushes coverage counters before the process exits. cmd.WaitDelay caps the wait if the subprocess somehow hangs after EOF — a guaranteed bounded outcome, not the previous "manually call p.wait() before every conceivable failure path" workaround. Also: - Update shutdown()'s comment to reflect that the race is now structurally impossible at the helper level; the explicit wait() is only there to confirm clean exit when reached. - Add TestLSPSubprocess_CoverFlushOnContextCancel: pins the invariant directly by spawning the LSP subprocess, cancelling its context without driving the protocol shutdown, and asserting GOCOVERDIR is non-empty after Wait returns. A regression that drops cmd.Cancel/WaitDelay fails this test before any downstream symptom (codecov drift, integration flakes) would surface.

With carryforward: true, the base commit inherits coverage from an even-earlier commit (or a different Go toolchain) whose block coordinates may differ from the head's by a column or two. Codecov maps blocks to lines and attributes coverage per line, so any column-level shift in block boundaries makes non-statement lines (closing braces, blank lines, type declarations) flip between "covered" (because the carryforward block edge included them) and "uncovered" (because the head's fresh block edge does not). Those flips surface as "indirect coverage changes not visible in diff" and trip the per-file `changes` status check — even when both sides measure 100% statement coverage of the same unchanged source. Concrete example traced on this PR: codecov flagged internal/rules/tablefmt/tablefmt.go lines 143-145, 156, 157 as indirect changes. Those lines are the closing brace of an inner for-loop, the closing brace of an outer for-loop, a blank line, the doc comment on `type table struct`, and the type declaration itself — none have executable statements, all are 100% according to `go test -cover`. The flap was pure carryforward bookkeeping. Turning carryforward off makes every PR base re-measured cleanly against the head's fresh upload. The `changes` gate continues to fire on real per-file coverage drops; spurious indirect-changes noise on metadata lines goes away.

Plan 195 (merged to main after this PR was originally opened) added TestCategory in the same file. After rebasing onto main, both my plan-196 TestCategory and the upstream version collided. Remove the duplicate; main's keeps its own assertion text.

Multiple reviewers flagged the same race: TestDefaultCacheDir_* called t.Parallel() while mutating the package-level userCacheDirFn seam, so a sibling t.Parallel test calling defaultCacheDir()/run() concurrently could observe the stubbed function. The seam was added to avoid the t.Setenv race on HOME/XDG_CACHE_HOME but introduced an equivalent global-mutation race of its own. Refactor to a parameter-injection shape: func defaultCacheDir() string { return resolveCacheDir(os.UserCacheDir) } func resolveCacheDir(getUserCacheDir func() (string, error)) string { ... } Tests now call resolveCacheDir(stub) directly. No global is mutated, so t.Parallel() is unconditionally safe regardless of which other tests run concurrently. Coverage stays at the same branches: TmpFallbackWhenUserCacheDirFails, EmptyUserCacheDir, plus a new HappyPath case.

Copilot

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 22, 2026 18:45

Copilot started reviewing on behalf of jeduden May 22, 2026 18:45 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread internal/rules/paragraphreadability/equivalence_test.go Outdated

Comment thread internal/rules/paragraphreadability/equivalence_test.go Outdated

Comment thread internal/rules/paragraphstructure/rule.go Outdated

jeduden requested a review from Copilot May 22, 2026 19:02

Copilot started reviewing on behalf of jeduden May 22, 2026 19:02 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread plan/196_lazy-section-paragraph-text.md Outdated

jeduden requested a review from Copilot May 22, 2026 19:31

Copilot started reviewing on behalf of jeduden May 22, 2026 19:32 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread internal/rules/requiredtextpatterns/rule.go

Comment thread internal/rules/requiredmentions/rule.go

jeduden requested a review from Copilot May 23, 2026 06:06

Copilot started reviewing on behalf of jeduden May 23, 2026 06:06 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

Comment thread internal/rules/astutil/astutil.go Outdated

Comment thread internal/mdtext/mdtext.go

jeduden requested a review from Copilot May 23, 2026 08:58

Copilot started reviewing on behalf of jeduden May 23, 2026 08:58 View session

jeduden force-pushed the claude/admiring-mendel-sm5Sx branch from 54ed7aa to ddf96aa Compare May 23, 2026 09:00

Copilot AI reviewed May 23, 2026

View reviewed changes

Comment thread internal/rules/paragraphreadability/rule.go Outdated

Comment thread internal/rules/astutil/astutil_test.go Outdated

jeduden requested a review from Copilot May 23, 2026 09:40

Copilot started reviewing on behalf of jeduden May 23, 2026 09:40 View session

Copilot AI reviewed May 23, 2026

View reviewed changes

Comment thread internal/mdtext/mdtext.go Outdated

Comment thread internal/rules/paragraphstructure/rule.go

jeduden requested a review from Copilot May 23, 2026 09:52

Copilot started reviewing on behalf of jeduden May 23, 2026 09:52 View session

jeduden force-pushed the claude/admiring-mendel-sm5Sx branch from 7f0e0b3 to ff3c689 Compare May 24, 2026 00:22

Copilot started reviewing on behalf of jeduden May 24, 2026 00:22 View session

Copilot AI reviewed May 24, 2026

View reviewed changes

Comment thread cmd/corpusctl/main_test.go

claude added 22 commits May 24, 2026 00:39

jeduden requested a review from Copilot May 24, 2026 00:39

jeduden force-pushed the claude/admiring-mendel-sm5Sx branch from 50762e2 to 60348a0 Compare May 24, 2026 00:39

Copilot started reviewing on behalf of jeduden May 24, 2026 00:40 View session

Copilot AI reviewed May 24, 2026

View reviewed changes

jeduden merged commit be06104 into main May 24, 2026
24 checks passed

Conversation

jeduden commented May 22, 2026

Summary

Key Changes

Performance Impact

Testing

Uh oh!

codecov Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 22, 2026 •

edited

Loading

codecov Bot commented May 23, 2026 •

edited

Loading