Coverage round 6 + 1 src/ bug fix (vec insert_at)#211
Merged
Conversation
The shift loop at vec.c:571-585 was vestigial — left over from when nulls were tracked via a separate bitmap. Today every vec type that accepts HAS_NULLS encodes its null marker IN the payload via a type-specific sentinel (NULL_I64 / NULL_I32 / NULL_F64 / NaN / etc., see sentinel_is_null + ray_vec_is_null comment "Sentinels are the sole source of truth"). The memmove at line 559 already moved every payload byte, including those sentinels, to its new slot. The shift loop then read ray_vec_is_null(v, i) on a slot whose memory had just become the *next* slot's data, mis-identified the moved sentinel as still sitting at the old index, and called ray_vec_set_null_checked(v, i+1) which wrote NULL_* on top of the real (correctly-moved) value at i+1. Reproduced by test_vec_insert_at_shift_nulls: build [10, null, 30], insert 99 at index 1; expected [10, 99, null, 30], actual [10, 99, null, null(clobbered)]. Fix: delete the loop. Updated the regression test (which had been asserting the buggy output to make a green check) to assert [10, 99, null, 30] correctly — fails before this fix, passes after. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/io/csv.c (81.39% regions / 79.82% lines after round 1) — pushes further by exercising paths the csv agent's round-2 attempt couldn't reach due to an API-overload abort. - Large-CSV round-trip (10000 rows) — triggers csv_parse_fn parallel dispatch (csv.c:902) and build_row_offsets / build_row_offsets_limited realloc paths (>initial_est rows). - Quoted field with embedded newline — slow path through scan_field_quoted + build_row_offsets. - GUID round-trip — csv_write_guid + fast_guid. - .csv.splayed CSV → splayed dir — csv_splayed_writer_open / append / close (csv.c:1834+). - Trailing-comma-no-newline + empty middle cells — boundary paths in csv_parse_serial and build_row_offsets_limited. - Full temporal round-trip (DATE/TIME/TIMESTAMP) — csv_write_date / csv_write_time / csv_write_timestamp. - Long string fields — string-pool growth in csv_intern_strings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/mem/arena.c: 81.11% → 93.33% region coverage. 9 new C tests in test/test_arena.c exercising: - tiny chunk size clamp (ray_arena_new chunk_size < 256) - NULL/overflow guards in ray_arena_alloc / ray_arena_reserve / ray_arena_total_used / ray_arena_reset - ray_arena_reserve bytes==0 early-return - ray_arena_reserve oversize bump path - multi-chunk loop in ray_arena_total_used 6 remaining missed regions are all OS-level allocation failure paths (ray_sys_alloc → NULL) — structurally unreachable in tests without fault injection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/core/block.c: 65.38% → 84.62% region coverage. 8 new C tests in test/test_block.c covering: - ray_block_size RAY_LIST / RAY_DICT branches - ray_block_size RAY_SEL full computation + nrows=0 + nrows<0 paths - ray_block_size out-of-range type guard (t >= RAY_TYPE_COUNT) - ray_block_copy on LIST / SEL blocks 8 remaining regions are all unreachable: ray_alloc weak stub (replaced at link time by buddy allocator), OOM cleanup in ray_block_copy / ray_retain_owned_refs (requires fault injection). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/lang/eval.c: 80.10% → 88.20% region coverage. New file test/rfl/hof/eval_coverage2.rfl covers: - atomic_map_unary boxed-list fallback (non-numeric per-elem fn) - SYM W8/W16/W64 fast paths in atomic_map_binary_op (==/!= on SYM vecs of varying widths) - SYM null-atom path (null sym vs non-null SYM vec) - SYM general path with nulls (0Ns entries in SYM vec) - numeric_atom_i64 I32/I16/BOOL branches (affine sum with 1i, 2h, true) - affine_sum_cache hit path (double call on same vector) - let with lazy value / if with lazy condition (eval-time materialise) - error propagation in boxed-list atomic_map_unary (per-elem raise) - ray_cond_fn n<2 / ray_let_fn type-error guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/ops/collection.c: 80.90% → 85.07% region coverage. 6 new files (cov2-cov7) covering: - distinct_sort_cmp default branch (F32 typed via CSV) — bypass count_distinct idiom rewriter by materialising distinct first - parted_to_flat_vec STR branch via parted-STR distinct - ray_find_fn collection-val error cleanup path - atomic_map_* corner cases - list/dict iteration edge cases Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/ops/linkop.c: 81.03% → 93.57% regions, 79.57% → 95.70% lines. New file test/rfl/linkop/coverage.rfl covers: - ray_col_link_fn error paths (non-sym target, non-vec int_vec) - ray_link_attach with HAS_INDEX already set (index-first then link) - ray_link_detach with HAS_INDEX still set - ray_link_deref with I32 link column (as 'I32 row-id vec) - Negative rid in link column → null result - Null in target column propagates to deref result - Null-sentinel fill paths for F64 / I32 / DATE / TIME / I16 Documented unreachable: ray_link_attach NULL/error-vec guard (caller already checks RAY_IS_ERR), slice-attach guard (no public RFL slice), negative target_sym_id guard (interned IDs always ≥0), sym_dict propagation (latent dead code — sym_dict field never initialized non-NULL anywhere; in-flight feature inherited from teide). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/ops/traverse.c: 76.69% → 76.79% (+3 regions; remaining 721 are mostly OOM guards documented as structurally unreachable). Extended/created: - traverse_coverage.rfl: exec_expand direction==1/2, exec_dijkstra with explicit dst (early-exit) and dst==src, exec_k_shortest dup detection, exec_var_expand direction==2 on K4/Star - traverse_weighted.rfl: weighted graph algorithms — Dijkstra-dst, k-shortest dup, MST rank-union, random walk, var_expand dir 1/2, expand dir 0/1/2, OOB dijkstra error - graph_algos_advanced.rfl: sampled betweenness/closeness, self-loop DFS, K3 cluster coefficient, 10-hop shortest path, k-shortest mid swap, connected components, multi-source var_expand Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/ops/group.c (72.45%): pushes toward higher coverage via: - group_key_types.rfl: multi-key GROUP BY combinations (I64+SYM, BOOL+I32, etc), sentinel handling per type, all-null groups, finalize-nulls paths - group_parallel_aggs.rfl: parallel dispatch (≥100k rows), aggregator combos (sum/avg/min/max/count/first/last in same query), holistic aggs (med, var/stddev) over groups, row-form fast paths Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/ops/temporal.c: 72.15% → 76.89% region coverage. Extended cross_cast_period.rfl, dag_extract_trunc.rfl, extract.rfl with deeper edge cases. Documented hard RFL ceiling at 76.89% — remaining 337 regions need either src/ changes (add MINUTE/HOUR/YEAR trunc bindings, add EPOCH field dispatch) or C-level tests: - RAY_EXTRACT_EPOCH never registered as DAG field - ray_temporal_trunc_from_sym maps only date→DAY and time→SECOND; no RFL syntax invokes MINUTE/HOUR/YEAR truncation - r<0 mathematical impossibility branches in DATE/TIME (always ≥0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The query agent's round-5 run ran for 9h (2278 tool uses) before aborting on API overload, but did write 551+ lines of additional tests across 5 files in test/rfl/query/. Net coverage delta on query.c is included in the round-5 baseline. Files touched: - query_dag_agg_coverage.rfl (+89 lines) - query_emit_filter_coverage.rfl (+63 lines) - query_evalgroup_coverage.rfl (+162 lines) - query_sort_take_coverage.rfl (+12 lines) - query_update_coverage.rfl (+239 lines: insert into non-TABLE error paths at query.c:9057-9079, 9121-9194) All assertions are passing post-fix. No BUG/xfail markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Round 6 coverage push since #210 merged, plus one more real bug surfaced and fixed via TDD.
Coverage delta (vs PR #210 baseline):
Per-file standout deltas:
src/ops/linkop.csrc/mem/arena.csrc/lang/eval.csrc/core/block.csrc/vec/vec.csrc/ops/temporal.csrc/ops/collection.csrc/io/csv.c(round 2)src/table/sym.cBug fix (TDD: failing test → src/ fix → passing test)
fix(vec): drop dead null-shift loop inray_vec_insert_at— commitd5bdaf02ray_vec_insert_atclobbered the value at slot N+1 after a null-shift loop wrote NULL_* sentinels on top of correctly-memmoved data.The shift loop at
vec.c:571-585was vestigial — left over from when nulls were tracked via a separate bitmap. Today every vec type that accepts HAS_NULLS encodes its null marker IN the payload via a type-specific sentinel (NULL_I64 / NULL_I32 / NULL_F64 / NaN / etc.).The memmove on line 559 already moved every payload byte — including those sentinels — to its new slot. The shift loop then read
ray_vec_is_null(v, i)on a slot whose memory had just become the next slot's data, mis-identified the moved sentinel as still at the old index, and calledset_null_checked(v, i+1)which wrote NULL_* on top of the real (correctly-moved) value.Reproduced by
test_vec_insert_at_shift_nulls: build[10, null, 30], insert 99 at index 1; expected[10, 99, null, 30], actual[10, 99, null, null(clobbered)].Fix: delete the loop. Updated the regression test (the vec agent had asserted the buggy output, route-around) to assert
[10, 99, null, 30]correctly — fails before this fix, passes after.Test commits (11 in this PR)
fix(vec)— Bug 4 fix described abovetest(csv)round 2 — parallel parse, splayed load, edge casestest(arena)— +12.22 pp via OOM/overflow guard coverage (9 C tests)test(block)— +19.24 pp via type-arm + null-bitmap coverage (8 C tests)test(eval)round 2 — +8.10 pp via SYM W8/W16, affine cache, lazytest(collection)— +4.17 pp via type arms + parted-flat + find errors (6 RFL files)test(linkop)— +12.54 pp regions / +16.13 pp lines via attach/detach/dereftest(graph)round 2 — diverse algorithm shapestest(group)— multi-key + parallel aggregator pathstest(temporal)round 2 — +4.74 pp; documents hard RFL ceiling at 76.89%test(query)late additions — 5 files +551 lines (query agent's post-API-overload work)Test plan
make clean && make test(debug, ASan+UBSan): 2657 of 2659 PASS, 0 failed (2 pre-existing skips)_probes/, no hidden xfailsrc/test-only de-staticing, no internal headers added for testsmake coveragemeasured per-file deltas aboveLatent dead infrastructure documented (not a bug, future cleanup)
sym_dict(aray_t*union field for per-column local sym dict) — declared inray_t, propagated in 6 sites across eval/sort/collection/linkop/rerank, and used as a sizing hint viaray_sym_dict_width()— but never initialized to a non-NULL value anywhere. Git archeology (teide → rayforce → rayforce2 → current) confirms this is an in-flight feature started in teide, the propagation infrastructure was extended in subsequent rebases, but the allocation/construction site (likely in CSV ingest near the existingray_sym_dict_width()call incsv.c) was never written. Future work: either implement the constructor (potential 2-4× storage/SIMD win on high-cardinality categorical SYM columns) or remove the dead propagation.Files still below 80% regions (next-round candidates)
src/ops/expr.c62.84% — RFL ceiling (DAG constant-fold, narrow DIV bails, fused evaluator I64/F64 registers only)src/ops/temporal.c76.89% — needs RFL bindings for MINUTE/HOUR/YEAR trunc or EPOCH extractsrc/ops/traverse.c76.79% — OOM guards (~200 lines),exec_astarand SCC need RFL bindingssrc/ops/journal.c77.10% — defensive OOM guardssrc/store/splay.c77.88% — sym_save bulk gate, OOM, RAY_CSV_TRACE env-gated tracesrc/ops/group.c72.45% — large file, needs dedicated big-effort roundsrc/core/block.c84.62% — at this round, near ceiling🤖 Generated with Claude Code