You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Sequel to #118. That RFC unified the four read loops behind one Enumerate<TSource, TFactory, TRow>; the TRow row types those factories produce are still four near-duplicate classes:
All four own the same concept: lazy RawFields = Split(raw) → validate column count → lazy ParsedValues = Trim(rawFields) → access by name/index under the ReturnEmptyForMissingColumn policy. They differ only in (a) backing store (string vs ReadOnlyMemory<char>) and (b) which split/trim they call (Split/Trim vs SplitLineOptimized/TrimOptimized). ReadLineSpan and ReadLineSpanOptimized are ~140 lines each and differ essentially only in storage representation.
The duplication is concrete and drift-prone — three error-message string literals are copy-pasted 3–4× each:
A fix to the missing-column message or the count-validation logic must land in four places. This is exactly the erosion #118 warned about ("a fifth row type would force the maintainer to copy the loop a fifth time and re-prove all the invariants").
It is a deepening opportunity in Ousterhout's sense: four shallow modules where one deep module belongs. The interface ("a parsed row: headers, values, access by name/index") is small; the implementation (lazy split → validate → trim → lookup-with-policy → identical errors) is large; the only thing that should vary is the storage/strategy seam.
Test cost today:EngineUnificationTests.cs (675 lines, 20 tests) and IssuesTests#114 exist largely to assert that all five read paths emit identical Headers/Values/Raw. That cross-path identity is currently a test-enforced hope; one row type makes it a structural guarantee.
Proposed Interface
Internal-only. The three public interfaces (ICsvLine, ICsvLineSpan, ICsvLineFromMemory) and every CsvReader/CsvWriter static method are unchanged. This is the hybrid chosen after a parallel design exploration of four divergent shapes (see docs/Architecture-Deepening-Candidates.md): a single MemoryText-backed generic row (the "zero-alloc-first" spine) hardened with the allocation/empty-value discipline that the "common-caller-first" two-tier design got right.
// The ONLY axis that genuinely varies on the memory side: pooled vs plain split/trim.internalinterfaceITrimSplit{IList<MemoryText>Split(MemoryTextraw,CsvOptionsoptions,int?capacity);MemoryText[]Trim(IList<MemoryText>fields,CsvOptionsoptions);}internalreadonlystructDefaultTrimSplit:ITrimSplit// ns2.0 + the net8 string/memory paths{publicIList<MemoryText>Split(MemoryTextraw,CsvOptionso,int?cap)=>CsvReader.SplitLine(raw,o,cap);publicMemoryText[]Trim(IList<MemoryText>f,CsvOptionso)=>CsvReader.Trim(f,o);}
#if NET8_0_OR_GREATERinternalreadonlystructOptimizedTrimSplit:ITrimSplit// the ArrayPool / CsvMemoryOptions path{privatereadonlyCsvMemoryOptionsmem;publicOptimizedTrimSplit(CsvMemoryOptionsmem)=>this.mem=mem;publicIList<MemoryText>Split(MemoryTextraw,CsvOptionso,int?cap)=>CsvReader.SplitLineOptimized(raw,o,mem,cap);publicMemoryText[]Trim(IList<MemoryText>f,CsvOptionso)=>CsvReader.TrimOptimized(f,o,mem);}
#endif
// The ONE row. Backing store is always MemoryText (= ReadOnlyMemory<char> on net8, = string on ns2.0// via the existing alias). On net8 it carries the UNION of all three frozen interfaces; the two// this[string]/this[int] indexers differ only by return type, disambiguated by EXPLICIT interface// implementation — legal C#, and the load-bearing trick that makes one type serve all paths.internalsealedclassCsvLine<TPolicy>:ICsvLine
#if NET8_0_OR_GREATER
,ICsvLineSpan,ICsvLineFromMemory
#endif
whereTPolicy:struct,ITrimSplit{privatereadonlyDictionary<string,int>headerLookup;privatereadonlyCsvOptionsoptions;privatereadonlyTPolicypolicy;// struct field -> JIT-specialized, no virtual dispatchprivatereadonlyMemoryText[]headers;privatereadonlyMemoryTextrawMemory;// the single backing fieldprivatereadonlystring?rawString;// FIX 3: original string when the source has oneinternalIList<MemoryText>?rawFields;// settable by the engine (multiline pre-split)internalMemoryText[]?parsedValues;// FIX 2: stays INTERNAL so GetBlock can pre-seedprivatestaticreadonlyMemoryTextEmpty="".AsMemory();// FIX 1: "" on ns2.0, ROM.Empty on net8// string Raw returns the cached original on the TextReader paths (zero-alloc), materializes only// on the ROM-backed optimized path — matching today's per-path behavior exactly.publicstringRaw=>rawString??rawMemory.ToString();// The single copy of the by-name lookup + missing-column policy + the three error strings.privateMemoryTextGet(stringname){if(!headerLookup.TryGetValue(name,outvari)){if(options.ReturnEmptyForMissingColumn)returnEmpty;// FIX 1: not `default!`thrownewArgumentOutOfRangeException(nameof(name),name,$"Header '{name}' does not exist. Expected one of {string.Join("; ",StringHeaders())}");}try{returnParsedValues[i];}catch(IndexOutOfRangeException){thrownewInvalidOperationException($"Invalid row, missing {name} header, expected {headers.Length} columns, got {ParsedValues.Length} columns.");}}// ICsvLine string projections call Get(name).AsString(); ICsvLineSpan returns Get(name)/.Span;// ICsvLineFromMemory (explicit) returns Get(name) — one resolver, three faces.}
Usage — factories retarget; public bodies unchanged
internalreadonlystructStringRowFactory:IRowFactory<CsvLine<DefaultTrimSplit>>{/* string source: rawString = the line */}
#if NET8_0_OR_GREATERinternalreadonlystructSpanRowFactory:IRowFactory<CsvLine<DefaultTrimSplit>>{/* string source -> ICsvLineSpan */}internalreadonlystructMemoryRowFactory:IRowFactory<CsvLine<DefaultTrimSplit>>{/* ROM source -> ICsvLineFromMemory */}internalreadonlystructOptimizedRowFactory:IRowFactory<CsvLine<OptimizedTrimSplit>>{/* ROM verbatim, NO ToString */}
#endif
// Enumerate returns IEnumerable<CsvLine<...>>; IEnumerable<out T> covariance assigns it to// IEnumerable<ICsvLine> / <ICsvLineSpan> / <ICsvLineFromMemory> with no wrapper and no Cast,// because CsvLine<...> implements all three. Public Read* method bodies are byte-for-byte unchanged.// GetBlock (CsvReader.cs:445) — the one internal construction site beyond the factories:// new CsvLine<DefaultTrimSplit>(headers, map, index, raw, default, options) { parsedValues = values }
The three corrections (each pins a defect the adversarial review found in the raw design)
Empty value (ns2.0 correctness). Replace return default! with static readonly MemoryText Empty = "".AsMemory(). default! is string.Empty on net8 but null on ns2.0 (where MemoryText = string), silently changing ReturnEmptyForMissingColumn from "" to null. "".AsMemory() is correct on both targets. (Tests run net9-only, so this would otherwise ship undetected.)
GetBlock. Keep parsedValuesinternal and preserve the { parsedValues = values } initializer at CsvReader.cs:445; otherwise the pre-extracted sub-line values are discarded and the row re-splits a synthesized Raw, producing wrong columns.
Raw identity / allocation parity. Carry the engine-supplied original line string in rawString; Raw => rawString ?? rawMemory.ToString(). Without this, a uniformly MemoryText-backed row re-materializes Raw via new string(span) on the Read/ReadAsSpanTextReader paths, regressing today's zero-alloc Raw. The optimized factory passes no rawString, so that path stays ROM-verbatim.
Dependency Strategy
In-process — merged directly; no ports, adapters, or stand-ins. The split/trim variation (SplitLine/Trim vs SplitLineOptimized/TrimOptimized) and its CsvMemoryOptions are wired via the injected value-type strategyTPolicy : struct, ITrimSplit — not a runtime branch and not a captured delegate. CsvMemoryOptions lives only inside OptimizedTrimSplit's field, so non-pooled rows carry zero extra state (parity with today, where only ReadLineSpanOptimized holds it). The existing free helpers SplitLine/Trim/SplitLineOptimized/TrimOptimized/TrimMemory keep their signatures; the strategies just call them.
Testing Strategy
New boundary tests to add:
A ns2.0-target assertion that ReturnEmptyForMissingColumn returns "" (not null) — closes the coverage gap that would hide FIX 1. (Either add ns2.0 to the test TFM, or a targeted unit test.)
A GetBlock test asserting pre-seeded sub-line values survive and are not re-split — guards FIX 2.
An allocation tripwire that line.Raw on Read/ReadAsSpan over a TextReader does not allocate — guards FIX 3 (extends the existing PerformanceTests allocation-comparison style).
Existing tests to keep (now load-bearing differently):EngineUnificationTests.cs already asserts identical Headers/Values/Raw across all five paths at the public boundary — it stays, but its cross-path duplication now verifies the unification rather than papering over four hand-written classes. No test needs deletion; none construct a row class directly.
Test environment: none — pure in-process computation.
Implementation Recommendations
Owns: the lazy split → validate → trim pipeline; the missing-column / short-row policy; the three error strings (now single-copy); the storage representation.
Hides: whether the raw line is a cached string, a string-view, or a never-materialized ReadOnlyMemory<char> slice; the pooled-vs-plain split/trim choice (behind ITrimSplit); the dual-interface explicit implementation.
Exposes: the three frozen interfaces, unchanged.
Migration: internal-only. Delete ReadLineSpan, ReadLineSpanOptimized, ReadLineFromMemory; rename ReadLine → CsvLine<DefaultTrimSplit>; retarget the four IRowFactory structs to the closed generic; update the single GetBlock call site. No public consumer changes. Expected ≈465 → ≈210 internal LOC.
Performance / AOT: no virtual dispatch added — TPolicy : struct, ITrimSplit is JIT-specialized and inlined exactly like the existing IRowFactory structs, and Enumerate's constraints are unchanged. No reflection; the two closed instantiations (CsvLine<DefaultTrimSplit>, CsvLine<OptimizedTrimSplit>) are trim-rooted by the public Read* methods. The optimized path keeps its ROM-verbatim backing, SplitLineOptimized/TrimOptimized, and one-heap-object-per-row profile.
Accepted behavioral note: on net8, a Read()/ReadAsSpan() row now also satisfies ICsvLineFromMemory at runtime (one type implements all three). Internal-only; the declared public return types are unchanged. A consumer relying on a negative cast would observe a change — documented as accepted, consistent with RFC: Unify the four CsvReader read loops behind one JIT-devirtualized engine #118's "same concrete type per path" consequence.
Comment warranted: explicit-interface implementation is load-bearing here (two this[string] differing only by return type on one class) — one short comment should explain why both exist.
🤖 Generated with Claude Code — produced by an /improve-codebase-architecture exploration (Candidate 1 of docs/Architecture-Deepening-Candidates.md), with the interface chosen via a 4-design parallel exploration + adversarial judging.
Problem
Sequel to #118. That RFC unified the four read loops behind one
Enumerate<TSource, TFactory, TRow>; theTRowrow types those factories produce are still four near-duplicate classes:ReadLine—CsvReader.cs:448(~98 LOC,string-backed,ICsvLine)ReadLineSpan—CsvReader.cs:549(~140 LOC,string-backed,ICsvLineSpan)ReadLineSpanOptimized—CsvReader.cs:690(~143 LOC,ReadOnlyMemory<char>-backed +CsvMemoryOptionsArrayPool,ICsvLineSpan)ReadLineFromMemory—CsvReader.FromMemory.cs:22(~86 LOC,ReadOnlyMemory<char>-backed,ICsvLineFromMemory)All four own the same concept: lazy
RawFields = Split(raw)→ validate column count → lazyParsedValues = Trim(rawFields)→ access by name/index under theReturnEmptyForMissingColumnpolicy. They differ only in (a) backing store (stringvsReadOnlyMemory<char>) and (b) which split/trim they call (Split/TrimvsSplitLineOptimized/TrimOptimized).ReadLineSpanandReadLineSpanOptimizedare ~140 lines each and differ essentially only in storage representation.The duplication is concrete and drift-prone — three error-message string literals are copy-pasted 3–4× each:
"Expected {Headers.Length}, got {raw.Count} columns."(:506,:601,:745, FromMemory:68)"Header '{name}' does not exist. Expected one of …"(:525,:621,:765, FromMemory:87)"Invalid row, missing {name} header, expected … columns, got … columns."(:534,:630,:774, FromMemory:96)A fix to the missing-column message or the count-validation logic must land in four places. This is exactly the erosion #118 warned about ("a fifth row type would force the maintainer to copy the loop a fifth time and re-prove all the invariants").
It is a deepening opportunity in Ousterhout's sense: four shallow modules where one deep module belongs. The interface ("a parsed row: headers, values, access by name/index") is small; the implementation (lazy split → validate → trim → lookup-with-policy → identical errors) is large; the only thing that should vary is the storage/strategy seam.
Test cost today:
EngineUnificationTests.cs(675 lines, 20 tests) andIssuesTests#114 exist largely to assert that all five read paths emit identicalHeaders/Values/Raw. That cross-path identity is currently a test-enforced hope; one row type makes it a structural guarantee.Proposed Interface
Internal-only. The three public interfaces (
ICsvLine,ICsvLineSpan,ICsvLineFromMemory) and everyCsvReader/CsvWriterstatic method are unchanged. This is the hybrid chosen after a parallel design exploration of four divergent shapes (seedocs/Architecture-Deepening-Candidates.md): a singleMemoryText-backed generic row (the "zero-alloc-first" spine) hardened with the allocation/empty-value discipline that the "common-caller-first" two-tier design got right.Usage — factories retarget; public bodies unchanged
The three corrections (each pins a defect the adversarial review found in the raw design)
return default!withstatic readonly MemoryText Empty = "".AsMemory().default!isstring.Emptyon net8 butnullon ns2.0 (whereMemoryText=string), silently changingReturnEmptyForMissingColumnfrom""tonull."".AsMemory()is correct on both targets. (Tests run net9-only, so this would otherwise ship undetected.)GetBlock. KeepparsedValuesinternal and preserve the{ parsedValues = values }initializer atCsvReader.cs:445; otherwise the pre-extracted sub-line values are discarded and the row re-splits a synthesizedRaw, producing wrong columns.Rawidentity / allocation parity. Carry the engine-supplied original linestringinrawString;Raw => rawString ?? rawMemory.ToString(). Without this, a uniformlyMemoryText-backed row re-materializesRawvianew string(span)on theRead/ReadAsSpanTextReaderpaths, regressing today's zero-allocRaw. The optimized factory passes norawString, so that path stays ROM-verbatim.Dependency Strategy
In-process — merged directly; no ports, adapters, or stand-ins. The split/trim variation (
SplitLine/TrimvsSplitLineOptimized/TrimOptimized) and itsCsvMemoryOptionsare wired via the injected value-type strategyTPolicy : struct, ITrimSplit— not a runtime branch and not a captured delegate.CsvMemoryOptionslives only insideOptimizedTrimSplit's field, so non-pooled rows carry zero extra state (parity with today, where onlyReadLineSpanOptimizedholds it). The existing free helpersSplitLine/Trim/SplitLineOptimized/TrimOptimized/TrimMemorykeep their signatures; the strategies just call them.Testing Strategy
ReturnEmptyForMissingColumnreturns""(notnull) — closes the coverage gap that would hide FIX 1. (Either add ns2.0 to the test TFM, or a targeted unit test.)GetBlocktest asserting pre-seeded sub-line values survive and are not re-split — guards FIX 2.line.RawonRead/ReadAsSpanover aTextReaderdoes not allocate — guards FIX 3 (extends the existingPerformanceTestsallocation-comparison style).EngineUnificationTests.csalready asserts identicalHeaders/Values/Rawacross all five paths at the public boundary — it stays, but its cross-path duplication now verifies the unification rather than papering over four hand-written classes. No test needs deletion; none construct a row class directly.Implementation Recommendations
split → validate → trimpipeline; the missing-column / short-row policy; the three error strings (now single-copy); the storage representation.string, astring-view, or a never-materializedReadOnlyMemory<char>slice; the pooled-vs-plain split/trim choice (behindITrimSplit); the dual-interface explicit implementation.ReadLineSpan,ReadLineSpanOptimized,ReadLineFromMemory; renameReadLine→CsvLine<DefaultTrimSplit>; retarget the fourIRowFactorystructs to the closed generic; update the singleGetBlockcall site. No public consumer changes. Expected ≈465 → ≈210 internal LOC.TPolicy : struct, ITrimSplitis JIT-specialized and inlined exactly like the existingIRowFactorystructs, andEnumerate's constraints are unchanged. No reflection; the two closed instantiations (CsvLine<DefaultTrimSplit>,CsvLine<OptimizedTrimSplit>) are trim-rooted by the publicRead*methods. The optimized path keeps its ROM-verbatim backing,SplitLineOptimized/TrimOptimized, and one-heap-object-per-row profile.Read()/ReadAsSpan()row now also satisfiesICsvLineFromMemoryat runtime (one type implements all three). Internal-only; the declared public return types are unchanged. A consumer relying on a negative cast would observe a change — documented as accepted, consistent with RFC: Unify the four CsvReader read loops behind one JIT-devirtualized engine #118's "same concrete type per path" consequence.this[string]differing only by return type on one class) — one short comment should explain why both exist.🤖 Generated with Claude Code — produced by an
/improve-codebase-architectureexploration (Candidate 1 ofdocs/Architecture-Deepening-Candidates.md), with the interface chosen via a 4-design parallel exploration + adversarial judging.