Skip to content

xlstream-eval: fix COUNTIF operator re-fold to sum row counts, not count buckets#189

Merged
cilladev merged 2 commits into
mainfrom
fix/wildcard-blank
Jun 9, 2026
Merged

xlstream-eval: fix COUNTIF operator re-fold to sum row counts, not count buckets#189
cilladev merged 2 commits into
mainfrom
fix/wildcard-blank

Conversation

@cilladev

@cilladev cilladev commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Fixes: #174

Problem

COUNTIF/COUNTIFS with operator or wildcard criteria counted matching buckets (unique values) instead of matching rows. The try_operator_criteria function re-folded already-finished per-bucket values using AggKind::Count, which counts feeds (one per bucket) instead of summing stored row counts. Example: COUNTIF(A:A,"*") with 4 text cells across 3 unique values returned 3 instead of 4.

This is the #174 re-aggregation bug. It was intended to be included in PR #188 but the prelude.rs change was lost when the workspace was deleted and re-cloned.

Fix

try_operator_criteria now finishes with AggKind::Sum for Count kind — matching how the prelude builder stores row counts (each bucket's value is the sum of 1.0-per-row feeds, finished with AggKind::Sum). The finish_kind match now explicitly mirrors prelude_plan.rs line 1195 for all aggregate kinds.

Changes

  • crates/xlstream-eval/src/prelude.rs: try_operator_criteria remaps Count → Sum for finish_kind, with explicit Average → Average arm mirroring the builder.
  • Conformance fixture issue-161-wildcard-blank.xlsx: 100 data rows + header, 15 Excel-populated formula cells covering wildcards (*, a*, *a*, ???-??????), text comparisons (>b, <banana, >=cherry, <=apple), exact match, and non-blank (<>). Verified against Excel on all 15 cells.

Known limitation (not this PR)

AVERAGEIF with operator criteria computes average-of-per-bucket-averages, which is mathematically wrong (should be weighted by count). Both the prelude builder and the re-fold finish with Average — the coupling is now explicit in the match. Fixing this requires storing (sum, count) pairs per bucket, which is architectural.

Scope note

COUNTIF(A:A,"") (blank count for whole columns) returns 30 (scanned rows) vs Excel's 1,048,505 (all 1M+ rows). This is a separate issue — the streaming model doesn't scan unoccupied rows. Excluded from the fixture. The wildcard-blank fix itself (wildcards matching only Value::Text) was PR #185.

Test plan

  • Conformance fixture created (Excel-populated) and passes (15/15, 0 mismatches)
  • cargo test -p xlstream-eval passes (2291 lib, no regressions)
  • make check passes

Closes #174

@cilladev cilladev merged commit af7d728 into main Jun 9, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

COUNTIF/COUNTIFS operator criteria counts unique matching values, not all rows

1 participant