Coverage round 6 + 1 src/ bug fix (vec insert_at) by ser-vasilich · Pull Request #211 · RayforceDB/rayforce

ser-vasilich · 2026-05-22T08:01:36Z

Summary

Round 6 coverage push since #210 merged, plus one more real bug surfaced and fixed via TDD.

Coverage delta (vs PR #210 baseline):

Metric	Before	After	Δ
Functions	97.73% (47 missed)	97.83% (44)	−3 missed
Lines	85.74%	87.06%	+1.32 pp
Regions	80.79%	81.48%	+0.69 pp (−486 missed)
Branches	63.59%	64.34%	+0.75 pp

Per-file standout deltas:

File	Before	After	Δ
`src/ops/linkop.c`	81.03%	93.57%	+12.54 pp (lines +16.13)
`src/mem/arena.c`	81.11%	93.33%	+12.22 pp
`src/lang/eval.c`	80.10%	88.20%	+8.10 pp
`src/core/block.c`	65.38%	84.62%	+19.24 pp
`src/vec/vec.c`	81.86%	86.98%	+5.12 pp
`src/ops/temporal.c`	72.15%	76.89%	+4.74 pp (RFL ceiling)
`src/ops/collection.c`	80.90%	85.07%	+4.17 pp
`src/io/csv.c` (round 2)	81.39%	(push further)	additional paths
`src/table/sym.c`	78.17%	80.19%	+2.02 pp

Bug fix (TDD: failing test → src/ fix → passing test)

`fix(vec)`: drop dead null-shift loop in `ray_vec_insert_at` — commit `d5bdaf02`

ray_vec_insert_at clobbered the value at slot N+1 after a null-shift loop wrote NULL_* sentinels on top of correctly-memmoved data.

The shift loop at vec.c:571-585 was vestigial — left over from when nulls were tracked via a separate bitmap. Today every vec type that accepts HAS_NULLS encodes its null marker IN the payload via a type-specific sentinel (NULL_I64 / NULL_I32 / NULL_F64 / NaN / etc.).

The memmove on line 559 already moved every payload byte — including those sentinels — to its new slot. The shift loop then read ray_vec_is_null(v, i) on a slot whose memory had just become the next slot's data, mis-identified the moved sentinel as still at the old index, and called set_null_checked(v, i+1) which wrote NULL_* on top of the real (correctly-moved) value.

Reproduced by test_vec_insert_at_shift_nulls: build [10, null, 30], insert 99 at index 1; expected [10, 99, null, 30], actual [10, 99, null, null(clobbered)].

Fix: delete the loop. Updated the regression test (the vec agent had asserted the buggy output, route-around) to assert [10, 99, null, 30] correctly — fails before this fix, passes after.

Test commits (11 in this PR)

fix(vec) — Bug 4 fix described above
test(csv) round 2 — parallel parse, splayed load, edge cases
test(arena) — +12.22 pp via OOM/overflow guard coverage (9 C tests)
test(block) — +19.24 pp via type-arm + null-bitmap coverage (8 C tests)
test(eval) round 2 — +8.10 pp via SYM W8/W16, affine cache, lazy
test(collection) — +4.17 pp via type arms + parted-flat + find errors (6 RFL files)
test(linkop) — +12.54 pp regions / +16.13 pp lines via attach/detach/deref
test(graph) round 2 — diverse algorithm shapes
test(group) — multi-key + parallel aggregator paths
test(temporal) round 2 — +4.74 pp; documents hard RFL ceiling at 76.89%
test(query) late additions — 5 files +551 lines (query agent's post-API-overload work)

Test plan

make clean && make test (debug, ASan+UBSan): 2657 of 2659 PASS, 0 failed (2 pre-existing skips)
Bug 4 has TDD test that fails before the fix and passes after
No _probes/, no hidden xfail
No src/ test-only de-staticing, no internal headers added for tests
make coverage measured per-file deltas above

Latent dead infrastructure documented (not a bug, future cleanup)

sym_dict (a ray_t* union field for per-column local sym dict) — declared in ray_t, propagated in 6 sites across eval/sort/collection/linkop/rerank, and used as a sizing hint via ray_sym_dict_width() — but never initialized to a non-NULL value anywhere. Git archeology (teide → rayforce → rayforce2 → current) confirms this is an in-flight feature started in teide, the propagation infrastructure was extended in subsequent rebases, but the allocation/construction site (likely in CSV ingest near the existing ray_sym_dict_width() call in csv.c) was never written. Future work: either implement the constructor (potential 2-4× storage/SIMD win on high-cardinality categorical SYM columns) or remove the dead propagation.

Files still below 80% regions (next-round candidates)

src/ops/expr.c 62.84% — RFL ceiling (DAG constant-fold, narrow DIV bails, fused evaluator I64/F64 registers only)
src/ops/temporal.c 76.89% — needs RFL bindings for MINUTE/HOUR/YEAR trunc or EPOCH extract
src/ops/traverse.c 76.79% — OOM guards (~200 lines), exec_astar and SCC need RFL bindings
src/ops/journal.c 77.10% — defensive OOM guards
src/store/splay.c 77.88% — sym_save bulk gate, OOM, RAY_CSV_TRACE env-gated trace
src/ops/group.c 72.45% — large file, needs dedicated big-effort round
src/core/block.c 84.62% — at this round, near ceiling

🤖 Generated with Claude Code

The shift loop at vec.c:571-585 was vestigial — left over from when nulls were tracked via a separate bitmap. Today every vec type that accepts HAS_NULLS encodes its null marker IN the payload via a type-specific sentinel (NULL_I64 / NULL_I32 / NULL_F64 / NaN / etc., see sentinel_is_null + ray_vec_is_null comment "Sentinels are the sole source of truth"). The memmove at line 559 already moved every payload byte, including those sentinels, to its new slot. The shift loop then read ray_vec_is_null(v, i) on a slot whose memory had just become the *next* slot's data, mis-identified the moved sentinel as still sitting at the old index, and called ray_vec_set_null_checked(v, i+1) which wrote NULL_* on top of the real (correctly-moved) value at i+1. Reproduced by test_vec_insert_at_shift_nulls: build [10, null, 30], insert 99 at index 1; expected [10, 99, null, 30], actual [10, 99, null, null(clobbered)]. Fix: delete the loop. Updated the regression test (which had been asserting the buggy output to make a green check) to assert [10, 99, null, 30] correctly — fails before this fix, passes after. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/io/csv.c (81.39% regions / 79.82% lines after round 1) — pushes further by exercising paths the csv agent's round-2 attempt couldn't reach due to an API-overload abort. - Large-CSV round-trip (10000 rows) — triggers csv_parse_fn parallel dispatch (csv.c:902) and build_row_offsets / build_row_offsets_limited realloc paths (>initial_est rows). - Quoted field with embedded newline — slow path through scan_field_quoted + build_row_offsets. - GUID round-trip — csv_write_guid + fast_guid. - .csv.splayed CSV → splayed dir — csv_splayed_writer_open / append / close (csv.c:1834+). - Trailing-comma-no-newline + empty middle cells — boundary paths in csv_parse_serial and build_row_offsets_limited. - Full temporal round-trip (DATE/TIME/TIMESTAMP) — csv_write_date / csv_write_time / csv_write_timestamp. - Long string fields — string-pool growth in csv_intern_strings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/mem/arena.c: 81.11% → 93.33% region coverage. 9 new C tests in test/test_arena.c exercising: - tiny chunk size clamp (ray_arena_new chunk_size < 256) - NULL/overflow guards in ray_arena_alloc / ray_arena_reserve / ray_arena_total_used / ray_arena_reset - ray_arena_reserve bytes==0 early-return - ray_arena_reserve oversize bump path - multi-chunk loop in ray_arena_total_used 6 remaining missed regions are all OS-level allocation failure paths (ray_sys_alloc → NULL) — structurally unreachable in tests without fault injection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/core/block.c: 65.38% → 84.62% region coverage. 8 new C tests in test/test_block.c covering: - ray_block_size RAY_LIST / RAY_DICT branches - ray_block_size RAY_SEL full computation + nrows=0 + nrows<0 paths - ray_block_size out-of-range type guard (t >= RAY_TYPE_COUNT) - ray_block_copy on LIST / SEL blocks 8 remaining regions are all unreachable: ray_alloc weak stub (replaced at link time by buddy allocator), OOM cleanup in ray_block_copy / ray_retain_owned_refs (requires fault injection). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/lang/eval.c: 80.10% → 88.20% region coverage. New file test/rfl/hof/eval_coverage2.rfl covers: - atomic_map_unary boxed-list fallback (non-numeric per-elem fn) - SYM W8/W16/W64 fast paths in atomic_map_binary_op (==/!= on SYM vecs of varying widths) - SYM null-atom path (null sym vs non-null SYM vec) - SYM general path with nulls (0Ns entries in SYM vec) - numeric_atom_i64 I32/I16/BOOL branches (affine sum with 1i, 2h, true) - affine_sum_cache hit path (double call on same vector) - let with lazy value / if with lazy condition (eval-time materialise) - error propagation in boxed-list atomic_map_unary (per-elem raise) - ray_cond_fn n<2 / ray_let_fn type-error guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/ops/collection.c: 80.90% → 85.07% region coverage. 6 new files (cov2-cov7) covering: - distinct_sort_cmp default branch (F32 typed via CSV) — bypass count_distinct idiom rewriter by materialising distinct first - parted_to_flat_vec STR branch via parted-STR distinct - ray_find_fn collection-val error cleanup path - atomic_map_* corner cases - list/dict iteration edge cases Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/ops/linkop.c: 81.03% → 93.57% regions, 79.57% → 95.70% lines. New file test/rfl/linkop/coverage.rfl covers: - ray_col_link_fn error paths (non-sym target, non-vec int_vec) - ray_link_attach with HAS_INDEX already set (index-first then link) - ray_link_detach with HAS_INDEX still set - ray_link_deref with I32 link column (as 'I32 row-id vec) - Negative rid in link column → null result - Null in target column propagates to deref result - Null-sentinel fill paths for F64 / I32 / DATE / TIME / I16 Documented unreachable: ray_link_attach NULL/error-vec guard (caller already checks RAY_IS_ERR), slice-attach guard (no public RFL slice), negative target_sym_id guard (interned IDs always ≥0), sym_dict propagation (latent dead code — sym_dict field never initialized non-NULL anywhere; in-flight feature inherited from teide). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/ops/traverse.c: 76.69% → 76.79% (+3 regions; remaining 721 are mostly OOM guards documented as structurally unreachable). Extended/created: - traverse_coverage.rfl: exec_expand direction==1/2, exec_dijkstra with explicit dst (early-exit) and dst==src, exec_k_shortest dup detection, exec_var_expand direction==2 on K4/Star - traverse_weighted.rfl: weighted graph algorithms — Dijkstra-dst, k-shortest dup, MST rank-union, random walk, var_expand dir 1/2, expand dir 0/1/2, OOB dijkstra error - graph_algos_advanced.rfl: sampled betweenness/closeness, self-loop DFS, K3 cluster coefficient, 10-hop shortest path, k-shortest mid swap, connected components, multi-source var_expand Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/ops/group.c (72.45%): pushes toward higher coverage via: - group_key_types.rfl: multi-key GROUP BY combinations (I64+SYM, BOOL+I32, etc), sentinel handling per type, all-null groups, finalize-nulls paths - group_parallel_aggs.rfl: parallel dispatch (≥100k rows), aggregator combos (sum/avg/min/max/count/first/last in same query), holistic aggs (med, var/stddev) over groups, row-form fast paths Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/ops/temporal.c: 72.15% → 76.89% region coverage. Extended cross_cast_period.rfl, dag_extract_trunc.rfl, extract.rfl with deeper edge cases. Documented hard RFL ceiling at 76.89% — remaining 337 regions need either src/ changes (add MINUTE/HOUR/YEAR trunc bindings, add EPOCH field dispatch) or C-level tests: - RAY_EXTRACT_EPOCH never registered as DAG field - ray_temporal_trunc_from_sym maps only date→DAY and time→SECOND; no RFL syntax invokes MINUTE/HOUR/YEAR truncation - r<0 mathematical impossibility branches in DATE/TIME (always ≥0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The query agent's round-5 run ran for 9h (2278 tool uses) before aborting on API overload, but did write 551+ lines of additional tests across 5 files in test/rfl/query/. Net coverage delta on query.c is included in the round-5 baseline. Files touched: - query_dag_agg_coverage.rfl (+89 lines) - query_emit_filter_coverage.rfl (+63 lines) - query_evalgroup_coverage.rfl (+162 lines) - query_sort_take_coverage.rfl (+12 lines) - query_update_coverage.rfl (+239 lines: insert into non-TABLE error paths at query.c:9057-9079, 9121-9194) All assertions are passing post-fix. No BUG/xfail markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ser-vasilich and others added 11 commits May 21, 2026 22:49

singaraiona merged commit ce6a342 into master May 22, 2026
4 checks passed

ser-vasilich mentioned this pull request May 23, 2026

src/ fixes (BOOL CAST nulls, fused_group LIKE, emit-filter ordering) + cleanups + coverage round 9-10 #213

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Coverage round 6 + 1 src/ bug fix (vec insert_at)#211

Coverage round 6 + 1 src/ bug fix (vec insert_at)#211
singaraiona merged 11 commits into
masterfrom
tests/coverage-round-5

ser-vasilich commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ser-vasilich commented May 22, 2026

Summary

Bug fix (TDD: failing test → src/ fix → passing test)

fix(vec): drop dead null-shift loop in ray_vec_insert_at — commit d5bdaf02

Test commits (11 in this PR)

Test plan

Latent dead infrastructure documented (not a bug, future cleanup)

Files still below 80% regions (next-round candidates)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`fix(vec)`: drop dead null-shift loop in `ray_vec_insert_at` — commit `d5bdaf02`