Fix signed/unsigned long overflow on sum() using Neumaier summation#21871
Fix signed/unsigned long overflow on sum() using Neumaier summation#21871aasom143 wants to merge 1 commit into
Conversation
PR Reviewer Guide 🔍(Review updated until commit 12562ab)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 12562ab Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 577faaa
Suggestions up to commit ea148b3
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #21871 +/- ##
============================================
- Coverage 73.51% 73.45% -0.07%
- Complexity 75582 75584 +2
============================================
Files 6034 6034
Lines 342661 342661
Branches 49294 49294
============================================
- Hits 251918 251696 -222
- Misses 70712 70974 +262
+ Partials 20031 19991 -40 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Persistent review updated to latest commit 577faaa |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit f058a69.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
Replace DataFusion's built-in SUM with a custom UDAF that casts all numeric inputs to f64 except the batch and uses Neumaier's improved Kahan compensated summation algorithm. This prevents i64 wrapping overflow when summing large values. Key changes: - New custom sum UDAF (udaf/sum.rs) returning Float64 - Neumaier compensated summation for cross-batch precision - Vectorized GroupsAccumulator for GROUP BY queries (zero perf regression) - Arrow SIMD-optimized intra-batch sum + Neumaier for cross-batch merge Signed-off-by: Somesh Gupta <someshgupta987@gmail.com>
f058a69 to
12562ab
Compare
|
Persistent review updated to latest commit 12562ab |
Replace DataFusion's built-in SUM with a custom UDAF that casts all numeric inputs to f64 and uses Neumaier's improved Kahan compensated summation algorithm. This prevents i64 wrapping overflow when summing large values.
Key changes:
Performed the performance benchmarking and perf numbers are comparable without this change.
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.