Add comprehensive Dask performance benchmark tests #1049
Conversation
This commit adds a complete benchmark suite for measuring and optimizing Dask performance in OG-Core, with particular focus on Windows performance issues.

New files:
- tests/test_dask_benchmarks.py: Mock benchmark tests with synthetic workloads
- tests/test_real_txfunc_benchmarks.py: Real-world tax function benchmarks
- tests/run_benchmarks.py: Automated benchmark runner with reporting
- tests/BENCHMARK_README.md: Comprehensive documentation and usage guide
- pytest.ini: Updated with benchmark test markers

Key features:
- Platform-specific optimization tests (Windows, macOS, Linux)
- Memory usage and compute time benchmarking
- Baseline establishment and performance regression detection
- Comparison of different Dask schedulers and client configurations
- Real tax function estimation performance measurement
- Automated identification of optimal Dask settings per platform

Benefits:
- Establishes performance baselines before optimization work
- Identifies Windows-specific Dask performance bottlenecks
- Provides automated regression detection for future changes
- Enables data-driven optimization decisions
- Supports continuous performance monitoring

Usage:

```
python tests/run_benchmarks.py                     # Run all benchmarks
python tests/run_benchmarks.py --quick             # Quick benchmarks only
python tests/run_benchmarks.py --save-baseline     # Save performance baseline
python tests/run_benchmarks.py --compare-baseline  # Compare against baseline
```

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Overview
This PR adds a comprehensive benchmark suite for measuring and optimizing Dask performance in OG-Core, with particular focus on Windows performance issues. The suite establishes performance baselines, provides automated regression detection, and enables data-driven optimization decisions through platform-specific testing.
- Mock benchmark tests with synthetic workloads that mimic tax function patterns (see the sketch after this list)
- Real tax function benchmarks using actual `txfunc.tax_func_estimate` calls
- Platform-specific optimization detection with automated configuration recommendations
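As a rough illustration of the mock-benchmark idea, a synthetic, CPU-bound workload can be timed under different Dask schedulers as sketched below. The function names and workload shape here are hypothetical, not taken from tests/test_dask_benchmarks.py:

```python
import time

import numpy as np
from dask import compute, delayed


def synthetic_tax_workload(seed, n=50_000):
    """CPU-bound stand-in for one tax-function estimation cell (illustrative)."""
    rng = np.random.default_rng(seed)
    income = rng.lognormal(mean=10.0, sigma=0.8, size=n)
    rates = np.tanh(income / income.mean())
    # A few vectorized passes to mimic iterative, nonlinear estimation work
    for _ in range(20):
        rates = 0.5 * (rates + np.sqrt(np.abs(np.sin(rates * income / 1e5))))
    return float(rates.mean())


def time_scheduler(scheduler, n_tasks=8):
    """Wall-clock seconds to run n_tasks synthetic workloads under a scheduler."""
    lazy = [delayed(synthetic_tax_workload)(i) for i in range(n_tasks)]
    start = time.perf_counter()
    compute(*lazy, scheduler=scheduler)
    return time.perf_counter() - start


if __name__ == "__main__":
    for sched in ("threads", "processes"):
        print(f"{sched:>9}: {time_scheduler(sched):.2f} s")
```

The gap between the threaded and process-based runs, especially on Windows, is the kind of difference this suite is designed to surface.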
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_real_txfunc_benchmarks.py | Real-world tax function benchmarks with realistic data generation and platform-specific testing |
| tests/test_dask_benchmarks.py | Mock benchmark framework with synthetic workloads and comprehensive performance measurement utilities |
| tests/run_benchmarks.py | Automated benchmark runner with baseline comparison and reporting capabilities |
| tests/BENCHMARK_README.md | Comprehensive documentation covering usage, interpretation, and troubleshooting |
| pytest.ini | Updated with benchmark-specific test markers for easy test selection |
Some of the initial benchmarking results:

✅ Key Results Summary

Platform: macOS (Darwin), which behaves similarly to Linux for Dask

📊 Performance Analysis
- Threading vs. multiprocessing performance gap
- Memory efficiency

🔍 Key Findings
🎯 Recommendations Based on Results

For OG-Core optimization, instead of:

```python
results = compute(*lazy_values, scheduler=dask.multiprocessing.get)
```

use:

```python
results = compute(*lazy_values, scheduler="threads")
```

or select the scheduler per platform:

```python
if platform.system() == "Windows":
    scheduler = "threads"  # 93x faster!
else:
    scheduler = dask.multiprocessing.get  # OK on Unix/macOS
```
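A related pattern the benchmarks speak to is the distributed client configuration (the PR compares client configurations as well as bare schedulers). The sketch below is illustrative only; the worker counts and the thread/process split are assumptions, not results from this PR:

```python
import platform

from dask.distributed import Client


def make_client(num_workers=4):
    """Create a dask.distributed Client with platform-aware defaults (illustrative)."""
    if platform.system() == "Windows":
        # Thread-based workers avoid the heavy process start-up cost on Windows
        return Client(processes=False, n_workers=1, threads_per_worker=num_workers)
    # Process-based workers generally perform well on Linux/macOS
    return Client(n_workers=num_workers, threads_per_worker=1)
```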
This commit fixes the real tax function benchmark tests that were failing with a 'market_income' KeyError.

Changes:
- Added missing 'market_income' column to generated test data
- Added all required columns that txfunc expects: year, total_tax_liab, payroll_tax_liab, weight, mtr_labinc, mtr_capinc
- Fixed variable naming inconsistency (error_msg vs. error_message)
- Increased sample sizes for more realistic benchmarking
- Disabled age-specific estimation for faster benchmark execution
- Added missing pytest markers ('real', 'platform') to pytest.ini
The real tax function benchmarks now successfully run and can measure
actual OG-Core performance with different Dask configurations.
Test results show ~25s execution time for real tax function estimation,
providing valuable baseline data for optimization efforts.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
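For context, the commit above lists the columns txfunc expects in the generated test data. A minimal sketch of such synthetic micro data might look like the following; the distributions and magnitudes are invented for illustration, and the real tests may require additional columns:

```python
import numpy as np
import pandas as pd


def make_synthetic_micro_data(n=10_000, year=2030, seed=0):
    """Toy micro-data frame carrying the columns listed in the commit message."""
    rng = np.random.default_rng(seed)
    market_income = rng.lognormal(mean=10.5, sigma=0.9, size=n)
    return pd.DataFrame(
        {
            "year": year,
            "market_income": market_income,
            # Rough stand-ins for tax liabilities and marginal rates
            "total_tax_liab": 0.20 * market_income,
            "payroll_tax_liab": 0.07 * market_income,
            "mtr_labinc": np.clip(rng.normal(0.25, 0.05, size=n), 0.0, 0.6),
            "mtr_capinc": np.clip(rng.normal(0.20, 0.05, size=n), 0.0, 0.6),
            "weight": rng.uniform(0.5, 1.5, size=n),
        }
    )
```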
This commit fixes the run_benchmarks.py script to properly run both mock and real benchmark test files, addressing the errors reported when running `--save-baseline` with real benchmarks.

Changes:
- Modified run_benchmark_tests() to include both test_dask_benchmarks.py and test_real_txfunc_benchmarks.py
- Previously only mock benchmarks were run, missing the real txfunc tests
- Cleaned up old, confusing benchmark results to avoid stale error messages

Test results:
- All 13 benchmark tests now pass successfully (7 mock + 6 real)
- Full benchmark suite runs in ~4:42 minutes
- Successfully saved 45 benchmark results as baseline
- Both mock and real benchmarks working with all Dask configurations

Performance insights from real benchmarks:
- Real tax function estimation: 22-44 seconds (baseline performance)
- Mock benchmarks: 0.024 seconds (for regression testing)
- Threaded scheduler remains fastest for all configurations
- Platform-specific optimization tests working correctly

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
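The run_benchmark_tests() name comes from the commit above, but the body below is only a guess at how running both files through pytest could be wired up; it is not the actual implementation in tests/run_benchmarks.py:

```python
import subprocess
import sys

# Both benchmark test files, per the commit above
BENCHMARK_TEST_FILES = [
    "tests/test_dask_benchmarks.py",
    "tests/test_real_txfunc_benchmarks.py",
]


def run_benchmark_tests(extra_args=None):
    """Run the mock and real benchmark test files with pytest; return True on success."""
    cmd = [sys.executable, "-m", "pytest", *BENCHMARK_TEST_FILES, "-v"]
    if extra_args:
        cmd.extend(extra_args)
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__":
    sys.exit(0 if run_benchmark_tests() else 1)
```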
To compare performance improvements after optimizations, save a baseline with `python tests/run_benchmarks.py --save-baseline` before making changes, then re-run with `--compare-baseline` afterward.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #1049      +/-   ##
==========================================
+ Coverage   72.61%   72.63%   +0.02%
==========================================
  Files          20       20
  Lines        5076     5080       +4
==========================================
+ Hits         3686     3690       +4
  Misses       1390     1390
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jdebacker. I have reviewed this PR and am ready to merge it as soon as you review and merge my PR to your branch. Technically, my PR to your branch just updates the OG-Core version number and black formats one of the new test files. But I also made some of the changes that Copilot suggested in this thread and committed them directly to your branch, so you will probably want to pull from your remote branch before merging my PR. Let me know if you have any questions. For the other Copilot-suggested changes that I didn't make, I opened issues (Issue #1051 and Issue #1050) so that we can easily address them later.
…marks Update version, black formatting, and small Copilot revisions
@jdebacker. Looks great. Merging.