Stream data table rows into context files and render each context once by krlittle · Pull Request #20 · openrewrite/rewrite-prethink

krlittle · 2026-06-01T19:43:45Z

Summary

ExportContext now renders each context once per cycle and caches the rendered output on its accumulator. generate() and getVisitor() reuse that output instead of re-aggregating the same data tables for every visited context file and again in the forced second export cycle. Reads per table drop from 2 × (F + 2) to 2, where F is the number of context CSVs.
ExportContext now streams rows straight from the store into the CSV writer one row at a time instead of first collecting the whole table into a List. Column headers come from DataTable.getType(), so no rows are needed up front.
Output is byte-identical: only the redundant re-reads and the per-render row buffering are removed. The cached value is the finished output string (which becomes the generated PlainText and must exist anyway), never the table's rows.
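
For readers who want a concrete picture of the render-once behavior, here is a minimal sketch. It is not the code in this PR: RenderCache, getOrRender, and startCycle are hypothetical names, and clearing the cache at the start of each cycle is an assumption about how "once per cycle" could be achieved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the accumulator's cache; names are illustrative only.
final class RenderCache {
    // Holds the finished output string per context, never the table rows.
    private final Map<String, String> renderedByContext = new LinkedHashMap<>();
    private int renderCount = 0;

    // Renders a context the first time it is requested and reuses the result afterwards.
    String getOrRender(String contextName, Function<String, String> renderer) {
        return renderedByContext.computeIfAbsent(contextName, name -> {
            renderCount++;
            return renderer.apply(name);
        });
    }

    // Assumption: the cache is reset between cycles so each cycle renders once.
    void startCycle() {
        renderedByContext.clear();
    }

    int renderCount() {
        return renderCount;
    }
}
```

In this sketch both generate() and getVisitor() would call getOrRender with the same context name, so the renderer runs only for the first caller in a given cycle.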

Problem

ExportContext reads its referenced data tables back to write .moderne/context/*.csv. Previously it re-read and re-rendered each table once in generate(), once for every context CSV visited in getVisitor(), and again in the forced second cycle: roughly 2 × (F + 2) full reads per table, or about 42 for a repo emitting 19 context CSVs. Each read materialized the whole table into a List (via aggregateMatchingTables) and built the entire CSV as one String. For large repositories (one row per method or class), both CPU time and the export phase's peak memory scale with table size, and that cost was repeated dozens of times per repo.
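
The read-count estimate above can be checked with a few lines of arithmetic. ReadCounts is a hypothetical helper written only for this illustration; it encodes the PR's own formula and is not part of rewrite-prethink.

```java
// Hypothetical helper: models the per-table read counts quoted in this description.
final class ReadCounts {
    // Before this change: roughly 2 × (F + 2) full reads per table for F context CSVs.
    static int before(int contextCsvs) {
        return 2 * (contextCsvs + 2);
    }

    // After this change: one read per export cycle, with two cycles per run.
    static int after() {
        return 2;
    }
}
```

For F = 19, before(19) evaluates to 42, against 2 after the change.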

Solution

Render each context a single time per cycle, cache the resulting CSV/markdown on the accumulator, and have generate()/getVisitor() read from that cache. Build each CSV by streaming getRows(...) directly into the writer one row at a time rather than buffering a List. This pairs with rewrite#7858 ("Stream CsvDataTableStore.getRows from disk lazily instead of buffering the whole table"), which makes CsvDataTableStore.getRows itself parse lazily. Once that is released, the store side streams too, and a whole table is never held in memory at any layer.
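
The streaming half of the change can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: CsvStreaming and its methods are hypothetical, the rows argument stands in for whatever getRows(...) returns (assumed here to behave like a java.util.stream.Stream), and the quoting rules are a generic CSV convention rather than the project's actual writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch: write a header line, then one CSV line per row as it arrives.
final class CsvStreaming {
    // Headers are known up front, so no row has to be read before writing starts.
    static void writeCsv(List<String> headers, Stream<List<String>> rows, Writer out) {
        writeLine(headers, out);
        // try-with-resources closes the stream, which matters if it is backed by a file.
        try (Stream<List<String>> r = rows) {
            r.forEach(row -> writeLine(row, out));
        }
    }

    // Convenience wrapper for small inputs and tests.
    static String toCsv(List<String> headers, Stream<List<String>> rows) {
        StringWriter out = new StringWriter();
        writeCsv(headers, rows, out);
        return out.toString();
    }

    private static void writeLine(List<String> cells, Writer out) {
        try {
            for (int i = 0; i < cells.size(); i++) {
                if (i > 0) {
                    out.write(",");
                }
                out.write(escape(cells.get(i)));
            }
            out.write("\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed quoting: wrap cells containing a comma, quote, or line break; double inner quotes.
    private static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n") || cell.contains("\r")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }
}
```

Only one row is referenced at a time inside forEach, so peak memory in this sketch depends on the widest row rather than on the number of rows, provided the supplied stream is itself lazy.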

Test plan

Existing ExportContextTest passes unchanged, including the exact CSV-row and markdown content assertions in aggregatesRowsFromMultipleInstancesOfSameDataTable and the cycle-trigger regression test, which confirms byte-identical output
Adds aggregatesEachReferencedTableExactlyOncePerRun, which installs a counting DataTableStore and asserts each referenced table is read once per export cycle (2 total), not once per visited context file
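
The counting-store idea behind the new test can be sketched with a small decorator. RowSource and CountingRowSource are hypothetical, simplified stand-ins; the real DataTableStore interface has a different shape, and the actual test may count reads differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical, simplified stand-in for a data table store.
interface RowSource {
    Stream<List<String>> getRows(String tableName);
}

// Wraps another source and records how many times each table is read.
final class CountingRowSource implements RowSource {
    private final RowSource delegate;
    private final Map<String, Integer> reads = new HashMap<>();

    CountingRowSource(RowSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public Stream<List<String>> getRows(String tableName) {
        // Count the request itself, regardless of how many rows are later consumed.
        reads.merge(tableName, 1, Integer::sum);
        return delegate.getRows(tableName);
    }

    int readsOf(String tableName) {
        return reads.getOrDefault(tableName, 0);
    }
}
```

A test built on this pattern would run the export and then assert that readsOf(...) equals 2 for each referenced table, one read per cycle.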

Stream data table rows into context files and render each context once

d217348

moderne-meeseeks Bot added this to OpenRewrite Jun 1, 2026

moderne-meeseeks Bot assigned krlittle Jun 1, 2026

github-project-automation Bot moved this to In Progress in OpenRewrite Jun 1, 2026

krlittle mentioned this pull request Jun 1, 2026

Render each context once per cycle instead of re-reading per visited file #21

Draft

2 tasks

Tighten comments to single lines

51fb8e4

krlittle wants to merge 2 commits into main from prethink-ExportContext-stream-and-memoize

krlittle commented Jun 1, 2026 (edited by moderne-meeseeks Bot)
