Skip to content

damacy rewrite PR-1: file split + prefetcher producer#121

Open
nclack wants to merge 9 commits into
worktree-prefetchfrom
damacy-rewrite
Open

damacy rewrite PR-1: file split + prefetcher producer#121
nclack wants to merge 9 commits into
worktree-prefetchfrom
damacy-rewrite

Conversation

@nclack
Copy link
Copy Markdown
Owner

@nclack nclack commented May 23, 2026

Reviewer: read dev/damacy_rewrite_pr1.md first. It's the load-bearing artifact — the function-by-function plan, the open questions, and the rationale for every decision.

This is PR-1 of a multi-PR rewrite. PR-2 deletes the legacy zarr caches and rewires the planner; PR-3 updates Python tests + docs. PR-1 changes who produces samples for the planner, but the planner itself still uses the legacy zarr_meta_cache / zarr_shard_cache so this lands without touching decode.

Glossary

  • damacy.c — the orchestrator TU, formerly ~1.2 kloc, now split into 5 (lifecycle / push / plan / pop / scheduler) plus damacy_internal.h.
  • prefetcher — the metadata-prefetch subsystem (PR Metadata prefetch layer #120) that runs on its own worker, popping samples from the lookahead and driving three caches.
  • plan_reserve / plan_run / plan_commit — the three planning phases inside damacy_scheduler_step for one batch.
  • lookahead — bounded FIFO of pushed samples that have not yet been consumed by the prefetcher.
  • array_meta_cache / shard_index_cache / chunk_layout_cache — the three prefetch_cache instances the prefetcher fills, keyed by uri / (uri, shard_coord) / uri respectively.
  • batch_id — monotonic ordinal assigned at push time (pushed_samples / batch_size); the prefetcher groups in-flight samples by this key, and plan_reserve only consumes terminal-state slots whose batch_id matches the next-to-plan batch.

Summary

Splits monolithic damacy.c into focused TUs (commits 2–4: pure moves, no behavior change) then rewires the data flow so the prefetcher is the producer for plan_reserve (commits 5–7). The old path drained samples straight from the lookahead inside plan_reserve; the new path pops prefetcher_ready slots that have already cleared the array-meta / shard-index / chunk-layout fetches, and the scheduler gate becomes prefetcher_batch_full_ready(...) instead of lookahead.size >= batch_size.

Where to look first after the plan doc: src/damacy_plan.c (the new plan_reserve — that's the load-bearing behavioral change). Then src/damacy_lifecycle.c for construction order (dedicated io_queue + 3 caches + 3 fetchers + prefetcher) and src/damacy_scheduler.c for the gate swap.

Non-obvious decisions worth a second look:

  1. Plan commits 7 + 8 are merged (damacy: producer is the prefetcher). The plan suggested splitting "rewire plan/scheduler" from "rewire pop/flush idle predicates" but the boundary isn't realizable — once prefetcher_start runs, the worker drains the lookahead immediately, so damacy_pop's old lookahead.size < batch_size AGAIN gate fires before the prefetcher can finish. Both predicates must flip together.

  2. plan_reserve suppresses DAMACY_NOTFOUND from the prefetcher's error path (see src/damacy_plan.c near the staging loop). The legacy planner independently handles missing chunks via the array's fill_value; if both layers errored on NOTFOUND, every shard-deleted test would regress. PR-2 will revisit when the planner consumes prefetch handles directly.

  3. chunk_layout returns success/NULL for non-blosc codecs instead of DAMACY_DECODE. The pre-existing code comment said "non-blosc arrays fall back to worst-case caps" but the code errored; aligning the code with the comment. Test renamed from test_non_blosc_codec_errorstest_non_blosc_codec_yields_no_layout.

  4. Prefetcher pop preserves push order via a new admit_seq field on struct prefetcher_slot. The original scan-and-take loop in pop_terminal_slot_locked would return whichever slot finished its fetches first, swapping intra-batch sample order; admit_seq makes pop deterministic.

  5. Three Python tests are pytest.skip'd in python/tests/test_damacy.py. They assert push-time URI / dtype / per-array-rank errors; those checks now surface at pop time. PR-3 rewrites them.

Test plan

  • Exercised the prefetcher pipeline end-to-end via the existing C corpus — test_damacy, test_damacy_blosc, test_damacy_caps, test_damacy_ctx (31/31 passing on every commit in the series).
  • test_multi_batch is the load-bearing one for the admission-order fix; without admit_seq it shows mixed-quadrant data corruption.
  • test_missing_shard_fills exercises the NOTFOUND-suppression path in plan_reserve (the shard file is unlink'd so the prefetcher errors but the planner emits fill).

@codecov
Copy link
Copy Markdown

codecov Bot commented May 23, 2026

Codecov Report

❌ Patch coverage is 72.11949% with 196 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.75%. Comparing base (23d95f0) to head (f7bd5fa).

Files with missing lines Patch % Lines
src/damacy_pop.c 54.05% 64 Missing and 21 partials ⚠️
src/damacy_lifecycle.c 79.37% 33 Missing and 20 partials ⚠️
src/damacy_plan.c 75.42% 19 Missing and 10 partials ⚠️
src/damacy_push.c 63.04% 10 Missing and 7 partials ⚠️
src/damacy_scheduler.c 79.62% 5 Missing and 6 partials ⚠️
src/prefetch/prefetcher.c 97.56% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                  Coverage Diff                  @@
##           worktree-prefetch     #121      +/-   ##
=====================================================
+ Coverage              58.21%   58.75%   +0.54%     
=====================================================
  Files                     56       60       +4     
  Lines                   7948     8016      +68     
  Branches                1396     1400       +4     
=====================================================
+ Hits                    4627     4710      +83     
+ Misses                  2746     2743       -3     
+ Partials                 575      563      -12     
Flag Coverage Δ
unittests 58.75% <72.11%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/prefetch/chunk_layout.c 65.07% <100.00%> (ø)
src/prefetch/prefetcher.c 81.74% <97.56%> (+3.93%) ⬆️
src/damacy_scheduler.c 79.62% <79.62%> (ø)
src/damacy_push.c 63.04% <63.04%> (ø)
src/damacy_plan.c 75.42% <75.42%> (ø)
src/damacy_lifecycle.c 79.37% <79.37%> (ø)
src/damacy_pop.c 54.05% <54.05%> (ø)

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant