Skip to content

refactor: collapse four CsvReader row classes into one CsvLine<TPolicy> (#132)#133

Merged
stevehansen merged 2 commits into
masterfrom
refactor/csvline-row-unification
Jun 23, 2026
Merged

refactor: collapse four CsvReader row classes into one CsvLine<TPolicy> (#132)#133
stevehansen merged 2 commits into
masterfrom
refactor/csvline-row-unification

Conversation

@stevehansen

Copy link
Copy Markdown
Owner

Implements #132 — the row-side sequel to #118.

What

The four near-identical row classes the engine factories produced — ReadLine, ReadLineSpan, ReadLineSpanOptimized, ReadLineFromMemory (~465 LOC) — collapse into one internal sealed class CsvLine<TPolicy> (184 LOC). They owned the same concept (lazy split → validate → trim → lookup-with-policy) and differed only in backing store and which split/trim they called. That seam is now a value-type strategy TPolicy : struct, ITrimSplit:

  • DefaultTrimSplit — all targets (the ns2.0, net8 string, and FromMemory paths)
  • OptimizedTrimSplit — net8-only, holds the CsvMemoryOptions for the ArrayPool path

Both are JIT-devirtualized exactly like the existing IRowFactory structs — no virtual dispatch, AOT/trim-safe.

On net8 the single class implements the union of all three frozen interfaces; the Headers/Values/Raw/this[] members that collide only by return type (string vs ReadOnlyMemory<char>) are disambiguated by explicit interface implementation. On netstandard2.0 it implements only ICsvLine + DefaultTrimSplit. IEnumerable<out T> covariance adapts the closed generic to each public return type — no wrapper, no cast.

Internal-only — public API frozen

No diff to ICsvLine / ICsvLineSpan / ICsvLineFromMemory or any CsvReader static method. The three previously-duplicated error strings are now single-copy with byte-identical runtime output.

Three hardening fixes (from the adversarial design review)

  1. Empty value uses "".AsMemory(), not default! — on ns2.0 (MemoryText=string) default is null and would silently regress ReturnEmptyForMissingColumn from "" to null.
  2. parsedValues stays internal so GetBlock pre-seeds sub-line values via the object initializer (otherwise the row re-splits a synthesized Raw and yields wrong columns).
  3. Raw zero-alloc parity — original line cached in rawString; Raw => rawString ?? rawMemory.ToString() keeps Raw allocation-free on the TextReader paths while the memory paths still materialize on demand.

Verification

  • Build: clean on net8.0 / netstandard2.0 / net9.0, 0 errors, 0 new warnings (the pre-existing options.Splitter CS8602s are unchanged — confirmed against the baseline tree).
  • Tests: 193/193 passing. Adds Csv.Tests/RowUnificationTests.cs (10 tests) pinning all three fixes, incl. a MemoryMarshal.TryGetString + Assert.AreSame zero-copy aliasing check for FIX 3. Existing EngineUnificationTests cross-path identity assertions now verify a structural guarantee rather than papering over four hand-written classes.
  • Review (code quality): Looks-good — member-by-member behavior parity vs the four originals confirmed; no Blocking/Should-fix.
  • Verify (spec conformance): Conforms on all 14 requirements.

Also includes docs/Architecture-Deepening-Candidates.md — the exploration write-up that produced this RFC.

🤖 Generated with Claude Code

…y> (#132)

Sequel to #118 (engine-loop unification). The four near-identical row
types the factories produced — ReadLine, ReadLineSpan,
ReadLineSpanOptimized, ReadLineFromMemory (~465 LOC) — are replaced by a
single MemoryText-backed `internal sealed class CsvLine<TPolicy>` (184
LOC). They owned the same concept (lazy split -> validate -> trim ->
lookup-with-policy) and differed only in storage and which split/trim
they called; that seam is now the value-type strategy
`TPolicy : struct, ITrimSplit` (DefaultTrimSplit on all targets,
OptimizedTrimSplit on net8 for the ArrayPool path), JIT-devirtualized
like the existing IRowFactory structs.

On net8 the one class carries the union of all three frozen interfaces;
the indexers/Headers/Values/Raw that collide only by return type are
disambiguated via explicit interface implementation. ns2.0 compiles only
ICsvLine + DefaultTrimSplit. IEnumerable<out T> covariance adapts the
closed generic to each public return type with no wrapper or cast.

Internal-only: public API frozen (no diff to ICsvLine/ICsvLineSpan/
ICsvLineFromMemory or any CsvReader static method). The three duplicated
error strings are now single-copy with byte-identical output.

Three hardening fixes from the adversarial design review:
- Empty value uses `"".AsMemory()`, not `default!` — on ns2.0
  (MemoryText=string) `default` is null and would silently regress
  ReturnEmptyForMissingColumn from "" to null.
- parsedValues stays internal so GetBlock can pre-seed sub-line values
  via the object initializer (else the row re-splits a synthesized Raw).
- Original line string cached in rawString; `Raw => rawString ??
  rawMemory.ToString()` keeps Raw zero-alloc on the TextReader paths.

Adds Csv.Tests/RowUnificationTests.cs (10 tests) pinning the three fixes.
Full suite 193/193; clean build on net8.0/netstandard2.0/net9.0 with zero
new warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LLZ5zrGZVo2xi6rw6w4z3f

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request unifies four near-identical row classes (ReadLine, ReadLineSpan, ReadLineSpanOptimized, and ReadLineFromMemory) into a single generic class CsvLine implementing ICsvLine, ICsvLineSpan, and ICsvLineFromMemory. It introduces the ITrimSplit interface along with DefaultTrimSplit and OptimizedTrimSplit structs to encapsulate splitting and trimming behaviors. Additionally, a new test suite (RowUnificationTests.cs) has been added to verify the unified row implementation, and documentation (Architecture-Deepening-Candidates.md) has been provided to detail the architectural reasoning. No review comments were provided, so there is no feedback to address.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

The main Csv.Tests project targets net9.0 only, so the netstandard2.0
behavioral path of CsvLine<TPolicy> -- where MemoryText == string and the
ReturnEmptyForMissingColumn "" vs null contract lives -- was verified by
inspection alone. netstandard2.0 is not runnable, so this adds a net48 test
project: net48 is compatible with netstandard2.0 but not net8/net9, so the
Csv reference resolves to the ns2.0 asset.

It links RowUnificationTests.cs (its #if-guarded Span/Memory cases drop out
under net48, leaving the string-path FIX 1/2/3 tests) and adds a small
contract suite, including a check that ICsvLineSpan (net8+ only) is absent
from the loaded assembly -- proving the tests hit the ns2.0 build.

8 tests pass on net48; the existing 193 net9.0 tests are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LLZ5zrGZVo2xi6rw6w4z3f
@stevehansen

Copy link
Copy Markdown
Owner Author

/argus review

@argus-pr argus-pr Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@argus-pr review — 5d81ac8 · model mistral-medium-3-5

Summary — This PR refactors the CSV reader by consolidating four near-identical row classes into a single generic CsvLine&lt;TPolicy&gt; class, reducing code duplication from ~465 to ~210 lines. The refactoring uses a strategy pattern with ITrimSplit to handle different splitting behaviors and explicit interface implementation to support multiple interfaces on .NET 8.0+. Three hardening fixes address edge cases: proper empty value handling for netstandard2.0, maintaining GetBlock's pre-seeded values, and preserving zero-allocation behavior for Raw property on TextReader paths.

Found 2 inline finding(s).

Severity Posted Suppressed
CRITICAL 0 0
HIGH 0 0
MEDIUM 2 0
LOW 0 3
Walkthrough (8 files)
File Change
Csv/CsvReader.cs Modified (+75/-267)
docs/Architecture-Deepening-Candidates.md Added (+150/-0)
Csv/CsvReader.FromMemory.cs Modified (+1/-90)
Csv.sln Modified (+37/-3)
Csv/CsvReader.Engine.cs Modified (+12/-16)
Csv.Tests/RowUnificationTests.cs Added (+212/-0)
Csv.Tests.NetStandard/NetStandardContractTests.cs Added (+65/-0)
Csv.Tests.NetStandard/Csv.Tests.NetStandard.csproj Added (+42/-0)

🤖 Reviewed by @argus-pr (AI, model mistral-medium-3-5) — automated, may be wrong. Reply @argus-pr help for commands.

Comment thread Csv/CsvReader.cs
Comment thread Csv/CsvReader.cs
@stevehansen stevehansen merged commit a9dda51 into master Jun 23, 2026
3 checks passed
@stevehansen stevehansen deleted the refactor/csvline-row-unification branch June 23, 2026 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Collapse the four CsvReader row classes into one JIT-devirtualized CsvLine<TPolicy>

1 participant