xlstream-eval: fix COUNTIF operator re-fold to sum row counts, not count buckets#189
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes: #174
Problem
COUNTIF/COUNTIFS with operator or wildcard criteria counted matching buckets (unique values) instead of matching rows. The
try_operator_criteriafunction re-folded already-finished per-bucket values usingAggKind::Count, which counts feeds (one per bucket) instead of summing stored row counts. Example:COUNTIF(A:A,"*")with 4 text cells across 3 unique values returned 3 instead of 4.This is the #174 re-aggregation bug. It was intended to be included in PR #188 but the
prelude.rschange was lost when the workspace was deleted and re-cloned.Fix
try_operator_criterianow finishes withAggKind::SumforCountkind — matching how the prelude builder stores row counts (each bucket's value is the sum of 1.0-per-row feeds, finished withAggKind::Sum). Thefinish_kindmatch now explicitly mirrorsprelude_plan.rsline 1195 for all aggregate kinds.Changes
crates/xlstream-eval/src/prelude.rs:try_operator_criteriaremapsCount → Sumforfinish_kind, with explicitAverage → Averagearm mirroring the builder.issue-161-wildcard-blank.xlsx: 100 data rows + header, 15 Excel-populated formula cells covering wildcards (*,a*,*a*,???-??????), text comparisons (>b,<banana,>=cherry,<=apple), exact match, and non-blank (<>). Verified against Excel on all 15 cells.Known limitation (not this PR)
AVERAGEIF with operator criteria computes average-of-per-bucket-averages, which is mathematically wrong (should be weighted by count). Both the prelude builder and the re-fold finish with
Average— the coupling is now explicit in the match. Fixing this requires storing (sum, count) pairs per bucket, which is architectural.Scope note
COUNTIF(A:A,"")(blank count for whole columns) returns 30 (scanned rows) vs Excel's 1,048,505 (all 1M+ rows). This is a separate issue — the streaming model doesn't scan unoccupied rows. Excluded from the fixture. The wildcard-blank fix itself (wildcards matching onlyValue::Text) was PR #185.Test plan
cargo test -p xlstream-evalpasses (2291 lib, no regressions)make checkpassesCloses #174