
Conversation

@jdebacker
Member

This commit adds a complete benchmark suite for measuring and optimizing Dask performance in OG-Core, with particular focus on Windows performance issues.

New files:

  • tests/test_dask_benchmarks.py: Mock benchmark tests with synthetic workloads
  • tests/test_real_txfunc_benchmarks.py: Real-world tax function benchmarks
  • tests/run_benchmarks.py: Automated benchmark runner with reporting
  • tests/BENCHMARK_README.md: Comprehensive documentation and usage guide
  • pytest.ini: Updated with benchmark test markers

Key features:

  • Platform-specific optimization tests (Windows, macOS, Linux)
  • Memory usage and compute time benchmarking
  • Baseline establishment and performance regression detection
  • Comparison of different Dask schedulers and client configurations
  • Real tax function estimation performance measurement
  • Automated identification of optimal Dask settings per platform

Benefits:

  • Establishes performance baselines before optimization work
  • Identifies Windows-specific Dask performance bottlenecks
  • Provides automated regression detection for future changes
  • Enables data-driven optimization decisions
  • Supports continuous performance monitoring

Usage:
python tests/run_benchmarks.py # Run all benchmarks
python tests/run_benchmarks.py --quick # Quick benchmarks only
python tests/run_benchmarks.py --save-baseline # Save performance baseline
python tests/run_benchmarks.py --compare-baseline # Compare against baseline
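
The same tests can also be selected directly with pytest via the markers registered in pytest.ini; for example (assuming the 'real' and 'platform' markers mentioned later in this thread are attached to the corresponding tests):

pytest tests/test_real_txfunc_benchmarks.py -m real # Real tax function benchmarks only
pytest tests/test_dask_benchmarks.py -m platform # Platform-specific optimization tests only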

🤖 Generated with Claude Code

@jdebacker requested a review from Copilot on August 22, 2025 16:42
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a comprehensive benchmark suite for measuring and optimizing Dask performance in OG-Core, with particular focus on Windows performance issues. The suite establishes performance baselines, provides automated regression detection, and enables data-driven optimization decisions through platform-specific testing.

  • Mock benchmark tests with synthetic workloads that mimic tax function patterns
  • Real tax function benchmarks using actual txfunc.tax_func_estimate calls
  • Platform-specific optimization detection with automated configuration recommendations

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:
  • tests/test_real_txfunc_benchmarks.py: Real-world tax function benchmarks with realistic data generation and platform-specific testing
  • tests/test_dask_benchmarks.py: Mock benchmark framework with synthetic workloads and comprehensive performance measurement utilities
  • tests/run_benchmarks.py: Automated benchmark runner with baseline comparison and reporting capabilities
  • tests/BENCHMARK_README.md: Comprehensive documentation covering usage, interpretation, and troubleshooting
  • pytest.ini: Updated with benchmark-specific test markers for easy test selection
Comments suppressed due to low confidence (1)


@jdebacker
Member Author

Some of the initial benchmarking results:

✅ Key Results Summary

Platform: macOS (Darwin) - which behaves similarly to Linux for Dask
performance patterns

📊 Performance Analysis

Threading vs Multiprocessing Performance Gap

  • Threaded scheduler: ~0.024s average ⚡️ (FAST)
  • Multiprocessing scheduler: ~2.241s average 🐌 (93x SLOWER!)
  • Distributed processes: ~1.742s average (72x SLOWER than threads)

Memory Efficiency

  • Threaded: ~309.7MB average peak memory
  • Multiprocessing: ~301.7MB average peak memory
  • Distributed: ~317MB average peak memory

🔍 Key Findings

  1. Massive Performance Gap: Multiprocessing is 93x slower than threading
    on macOS
    - This confirms the serialization overhead issue we identified
    - Windows would likely show even worse multiprocessing performance
  2. Threading is Optimal: The threaded scheduler consistently outperforms
    all other options
    - Best compute time: ~0.024s
    - Reasonable memory usage: ~310MB
    - Most reliable across different test scenarios
  3. Distributed Client Overhead: Even distributed clients add some
    overhead vs direct threading
    - Distributed threaded: ~0.026s (slightly slower than direct threading)
    - Still much better than multiprocessing approaches
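
For reference, the scheduler gap reported above can be reproduced with a minimal timing loop along these lines. This is only a sketch with a made-up synthetic_workload; the numbers above come from the actual suite in tests/test_dask_benchmarks.py:

import time
from dask import compute, delayed

def synthetic_workload(i):
    # Stand-in task; the real suite uses workloads that mimic tax function patterns.
    return sum(x ** 0.5 for x in range(200_000 + i))

if __name__ == "__main__":
    lazy_values = [delayed(synthetic_workload)(i) for i in range(16)]
    for scheduler in ("threads", "processes"):
        start = time.perf_counter()
        compute(*lazy_values, scheduler=scheduler)
        print(f"{scheduler}: {time.perf_counter() - start:.3f}s")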

@jdebacker
Member Author

jdebacker commented Aug 22, 2025

🎯 Recommendations Based on Results

For OG-Core Optimization:

  1. Immediate Win: Switch Windows users to threaded scheduler

Instead of:

results = compute(*lazy_values, scheduler=dask.multiprocessing.get)

Use:

results = compute(*lazy_values, scheduler="threads")

  2. Platform Detection: Implement platform-aware Dask configuration

if platform.system() == "Windows":
    scheduler = "threads"  # 93x faster!
else:
    scheduler = dask.multiprocessing.get  # OK on Unix/macOS

  3. Memory Optimization: All schedulers show reasonable memory usage
    (~300-320MB)
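
In OG-Core this check would most naturally live in a small helper that builds the Dask client. The sketch below is illustrative only; make_client and its defaults are hypothetical, not code from this PR:

import platform
from distributed import Client

def make_client(num_workers=4):
    # Hypothetical helper: prefer thread-based execution on Windows, where
    # process-based schedulers pay a large serialization/startup penalty.
    if platform.system() == "Windows":
        return Client(processes=False, threads_per_worker=num_workers)
    return Client(n_workers=num_workers, threads_per_worker=1)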

jdebacker and others added 2 commits August 22, 2025 14:19
This commit fixes the real tax function benchmark tests that were failing
with "'market_income'" KeyError by:

Changes:
- Added missing 'market_income' column to generated test data
- Added all required columns that txfunc expects: year, total_tax_liab,
  payroll_tax_liab, weight, mtr_labinc, mtr_capinc
- Fixed variable naming inconsistency (error_msg vs error_message)
- Increased sample sizes for more realistic benchmarking
- Disabled age-specific estimation for faster benchmark execution
- Added missing pytest markers ('real', 'platform') to pytest.ini

The real tax function benchmarks now successfully run and can measure
actual OG-Core performance with different Dask configurations.

Test results show ~25s execution time for real tax function estimation,
providing valuable baseline data for optimization efforts.
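
For reference, the generated micro data needs at least the columns listed above. A minimal sketch (column names are those named in this commit; the values are arbitrary placeholders, and the real generator in tests/test_real_txfunc_benchmarks.py draws more realistic distributions):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
micro_data = pd.DataFrame({
    "year": np.full(n, 2030),
    "market_income": rng.lognormal(10.5, 0.8, n),
    "total_tax_liab": rng.lognormal(8.5, 0.8, n),
    "payroll_tax_liab": rng.lognormal(7.5, 0.8, n),
    "weight": rng.uniform(0.5, 1.5, n),
    "mtr_labinc": rng.uniform(0.0, 0.5, n),
    "mtr_capinc": rng.uniform(0.0, 0.5, n),
})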

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes the run_benchmarks.py script to properly run both
mock and real benchmark test files, addressing the errors reported
when running --save-baseline with-real-benchmarks.

Changes:
- Modified run_benchmark_tests() to include both test_dask_benchmarks.py
  and test_real_txfunc_benchmarks.py
- Previously was only running mock benchmarks, missing real txfunc tests
- Cleaned up old confusing benchmark results to avoid stale error messages

Test results:
- All 13 benchmark tests now pass successfully (7 mock + 6 real)
- Full benchmark suite runs in ~4:42 minutes
- Successfully saved 45 benchmark results as baseline
- Both mock and real benchmarks working with all Dask configurations

Performance insights from real benchmarks:
- Real tax function estimation: 22-44 seconds (baseline performance)
- Mock benchmarks: 0.024 seconds (for regression testing)
- Threaded scheduler remains fastest for all configurations
- Platform-specific optimization tests working correctly
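
Conceptually, the run_benchmark_tests() change amounts to pointing the runner at both test files. A rough sketch, not the actual implementation in tests/run_benchmarks.py:

import pytest

def run_benchmark_tests(extra_args=None):
    # Run both the mock and the real benchmark test files in one pytest session.
    test_files = [
        "tests/test_dask_benchmarks.py",
        "tests/test_real_txfunc_benchmarks.py",
    ]
    return pytest.main(test_files + ["-v"] + (extra_args or []))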

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@jdebacker
Member Author

To compare performance improvements after optimizations:
python run_benchmarks.py --compare-baseline with-real-benchmarks

@codecov-commenter

codecov-commenter commented Aug 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.63%. Comparing base (ef6695e) to head (689bb83).

Additional details and impacted files


@@            Coverage Diff             @@
##           master    #1049      +/-   ##
==========================================
+ Coverage   72.61%   72.63%   +0.02%     
==========================================
  Files          20       20              
  Lines        5076     5080       +4     
==========================================
+ Hits         3686     3690       +4     
  Misses       1390     1390              
Flag        Coverage Δ
unittests   72.63% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines   Coverage Δ
ogcore/__init__.py         100.00% <100.00%> (ø)
ogcore/household.py        88.08% <100.00%> (+0.20%) ⬆️

rickecon and others added 3 commits August 26, 2025 01:24
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@rickecon marked this pull request as ready for review on August 26, 2025 07:01
@rickecon
Member

rickecon commented Aug 26, 2025

@jdebacker. I have reviewed this PR and am ready to merge it as soon as you review and merge my PR to your branch. That PR just updates the OG-Core version number and applies black formatting to one of the new test files. I also made some of the changes Copilot suggested in this thread and committed them directly to your branch, so you will probably want to pull from your remote branch before merging my PR. Let me know if you have any questions.

For any of the other Copilot-suggested changes that I didn't make, I opened issues (Issue #1051 and Issue #1050) so that we can easily address them later.

@rickecon
Member

@jdebacker. Looks great. Merging.

@rickecon merged commit 938073b into PSLmodels:master on Aug 28, 2025
8 checks passed