Conversation
updates:
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.11.0](psf/black-pre-commit-mirror@25.1.0...25.11.0)
- [github.com/pycqa/isort: 6.0.1 → 7.0.0](PyCQA/isort@6.0.1...7.0.0)
…e computation of difference of two proportions confidence intervals using wang method
Pull request overview
This pull request fixes edge case errors in the Wang method for computing confidence intervals of the difference between two proportions, particularly when n_total equals ref_total. The fix includes corrections to the step size calculation in grid refinement, changes to plateau detection logic, and updates to tests to accommodate minor numerical differences.
Changes:
- Fixed step size calculation in the `_prob2step` and `_prob2steplmin` functions to use the correct denominator `(grid_one - 1)` instead of `grid_one` (see the sketch after this list)
- Simplified plateau detection from tolerance-based to exact equality comparison
- Updated tests with a two-tier tolerance system (strict 1e-4, loose 1e-2) to handle edge cases
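To make the denominator fix concrete, here is a minimal standalone sketch (not the package's actual `_prob2step` code; the function name is illustrative), assuming `p0` is an evenly spaced probability grid with `grid_one` points:

```python
import numpy as np

def grid_step(p0: np.ndarray, grid_one: int) -> float:
    # A grid of grid_one points spanning [p0[0], p0[-1]] has grid_one - 1
    # intervals, so the spacing is the range divided by (grid_one - 1).
    # Dividing by grid_one (the previous behaviour) underestimates the step.
    denom = (grid_one - 1) if grid_one > 1 else 1  # guard the degenerate one-point grid
    return (p0[-1] - p0[0]) / denom

p0 = np.linspace(0.0, 1.0, 5)      # actual spacing is 0.25
print(grid_step(p0, grid_one=5))   # 0.25 (the old denominator would give 0.2)
```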
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| diff_binom_confint/_specials/_wang.py | Core implementation fixes: step size calculation, plateau detection, verbose parameter threading, stable sorting, and code cleanup |
| test/test_specials.py | Added fallback test logic with dual tolerance levels, warning messages for accuracy tracking, and imports for warnings module |
| CHANGELOG.rst | Documented the bug fix for edge cases in the wang method |
| .devcontainer/devcontainer.json | Added post-create command for dev environment setup |
    # mansum = sumofprob.min()
    # atol = 1e-14 * (abs(mansum) if mansum != 0 else 1.0)
    # plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
    plateau_idx = np.where(sumofprob == sumofprob.min())[0]
Using exact equality (==) for floating-point comparison could miss values that are numerically very close to the minimum due to floating-point precision issues. The commented-out code used np.isclose with a small tolerance to handle this. Consider whether exact equality is appropriate here, or if the tolerance-based comparison should be retained to handle numerical precision edge cases robustly.
Suggested change:

    - # mansum = sumofprob.min()
    - # atol = 1e-14 * (abs(mansum) if mansum != 0 else 1.0)
    - # plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
    - plateau_idx = np.where(sumofprob == sumofprob.min())[0]
    + mansum = sumofprob.min()
    + atol = 1e-14 * (abs(mansum) if mansum != 0 else 1.0)
    + plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
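A small standalone illustration (not taken from the package) of the precision issue this comment describes: two sums that are mathematically equal can differ by about one ULP depending on accumulation order, so exact equality picks up only part of the plateau while the tolerance-based comparison catches both points.

```python
import numpy as np

# The same total accumulated in two different orders differs by ~1e-16.
sumofprob = np.array([0.1 + 0.2 + 0.3, 0.3 + 0.2 + 0.1, 0.7])

print(np.where(sumofprob == sumofprob.min())[0])                        # [1]
mansum = sumofprob.min()
atol = 1e-14 * (abs(mansum) if mansum != 0 else 1.0)
print(np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0])  # [0 1]
```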
    kk1 = kk

    output = [val.item() if isinstance(val, np.generic) else val for val in output]
    # output = [val.item() if isinstance(val, np.generic) else val for val in output]
The commented-out code should be removed rather than left in the codebase. This appears to be the old implementation that was replaced. Leaving commented code in production reduces readability.
Suggested change:

    - # output = [val.item() if isinstance(val, np.generic) else val for val in output]
    if not np.isclose((r_lb, r_ub), (lb, ub), atol=ERR_LIMIT_STRICT).all():
        warnings.warn(
            f"Strict test failed for {n_positive = }, {n_total = }, {ref_positive = }, {ref_total = },"
            f"R result: {r_lb, r_ub}, Python result: {lb, ub}. falling back to loose test.",
Missing space after comma in the warning message. The message should read "...{ref_total = }, R result:" instead of "...{ref_total = },R result:" for better readability.
| f"R result: {r_lb, r_ub}, Python result: {lb, ub}. falling back to loose test.", | |
| f" R result: {r_lb, r_ub}, Python result: {lb, ub}. falling back to loose test.", |
test/test_specials.py (Outdated)
| f"Strict test failed for one-sided lower {n_positive = }, {n_total = }, {ref_positive = }, {ref_total = }," | ||
| f"R result: {r_lb, r_ub}, Python result: {lb, ub}. falling back to loose test.", |
Missing space after comma in the warning message. The message should read "...{ref_total = }, R result:" instead of "...{ref_total = },R result:" for better readability.
    lowerb = max(p0[0], p0[rightmost] - stepv) + delta
    upperb = min(p0[-1], p0[leftmost] + stepv) - delta
There is a potential issue where lowerb could be greater than upperb after the calculations on lines 513-514. This happens because both lowerb and upperb are computed independently with + delta and - delta adjustments. When p0[rightmost] - stepv + delta > p0[leftmost] + stepv - delta, the second call to np.linspace(lowerb, upperb, grid_two) on line 516 would create a reversed array or fail if bounds checking is strict. Consider adding a check to ensure lowerb <= upperb, or swap them if necessary.
Suggested change:

    - lowerb = max(p0[0], p0[rightmost] - stepv) + delta
    - upperb = min(p0[-1], p0[leftmost] + stepv) - delta
    + raw_lowerb = max(p0[0], p0[rightmost] - stepv) + delta
    + raw_upperb = min(p0[-1], p0[leftmost] + stepv) - delta
    + if raw_lowerb <= raw_upperb:
    +     lowerb, upperb = raw_lowerb, raw_upperb
    + else:
    +     # Ensure bounds are ordered for linspace; swap if necessary
    +     lowerb, upperb = raw_upperb, raw_lowerb
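A quick standalone check (not from the package) of the behaviour the comment warns about: `np.linspace` accepts reversed bounds without error and silently returns a decreasing grid, which downstream code that assumes an increasing grid may mishandle. The bound values below are hypothetical.

```python
import numpy as np

grid_two = 5
lowerb, upperb = 0.30, 0.10  # hypothetical case where lowerb > upperb

print(np.linspace(lowerb, upperb, grid_two))  # [0.3  0.25 0.2  0.15 0.1 ] -- decreasing
print(np.linspace(upperb, lowerb, grid_two))  # [0.1  0.15 0.2  0.25 0.3 ] -- increasing, as expected
```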
    ERR_LIMIT_STRICT = 1e-4
    ERR_LIMIT_LOOSE = 1e-2
The fallback to a loose tolerance of 1e-2 (ERR_LIMIT_LOOSE) represents a 100-fold increase from the strict tolerance of 1e-4 (ERR_LIMIT_STRICT). This could mask significant accuracy regressions. Consider whether this tolerance is appropriate, or if a smaller intermediate tolerance (e.g., 1e-3) would be more suitable. Additionally, consider whether tests that fall back to the loose tolerance should be logged or tracked separately to monitor accuracy degradation.
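One way to read this suggestion is a tiered comparison that records which tolerance a case needed, so accuracy degradation stays visible in the test logs. A minimal sketch under those assumptions (the intermediate tier and the helper name are hypothetical, not part of the test suite):

```python
import warnings
import numpy as np

ERR_LIMIT_STRICT = 1e-4
ERR_LIMIT_INTERMEDIATE = 1e-3  # hypothetical middle tier floated in the review
ERR_LIMIT_LOOSE = 1e-2

def assert_close_tiered(py_result, r_result):
    # Try the tolerances from strictest to loosest and warn whenever a
    # non-strict tier was required, so fallbacks can be tracked over time.
    for label, tol in (("strict", ERR_LIMIT_STRICT),
                       ("intermediate", ERR_LIMIT_INTERMEDIATE),
                       ("loose", ERR_LIMIT_LOOSE)):
        if np.isclose(py_result, r_result, atol=tol).all():
            if label != "strict":
                warnings.warn(f"Result only matches R within the {label} tolerance ({tol}).")
            return
    raise AssertionError(f"Python result {py_result} differs from R result {r_result} beyond {ERR_LIMIT_LOOSE}.")
```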
    # mansum = sumofprob.max()
    # atol = 1e-14 * (mansum if mansum > 0 else 1.0)
    # plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
The commented-out code should be removed rather than left in the codebase. These lines appear to be the old implementation that was replaced with the simpler plateau detection logic on line 476. Leaving commented code in production reduces readability and can cause confusion.
Suggested change:

    - # mansum = sumofprob.max()
    - # atol = 1e-14 * (mansum if mansum > 0 else 1.0)
    - # plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
    # mansum = sumofprob.min()
    # atol = 1e-14 * (abs(mansum) if mansum != 0 else 1.0)
    # plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
The commented-out code should be removed rather than left in the codebase. These lines appear to be the old implementation that was replaced with the simpler plateau detection logic on line 507. Leaving commented code in production reduces readability and can cause confusion.
Suggested change:

    - # mansum = sumofprob.min()
    - # atol = 1e-14 * (abs(mansum) if mansum != 0 else 1.0)
    - # plateau_idx = np.where(np.isclose(sumofprob, mansum, rtol=0.0, atol=atol))[0]
| f"Strict test failed for one-sided upper {n_positive = }, {n_total = }, {ref_positive = }, {ref_total = }," | ||
| f"R result: {r_lb, r_ub}, Python result: {lb, ub}. falling back to loose test.", |
Missing space after comma in the warning message. The message should read "...{ref_total = }, R result:" instead of "...{ref_total = },R result:" for better readability.
    denom = (grid_one - 1) if grid_one > 1 else 1
    stepv = (p0[-1] - p0[0]) / denom
    lowerb = max(p0[0], p0[rightmost] - stepv) + delta
    upperb = min(p0[-1], p0[leftmost] + stepv) - delta
There is a potential issue where lowerb could be greater than upperb after the calculations on lines 482-483. This happens because both lowerb and upperb are computed independently with + delta and - delta adjustments. When p0[rightmost] - stepv + delta > p0[leftmost] + stepv - delta, the second call to np.linspace(lowerb, upperb, grid_two) on line 485 would create a reversed array or fail if bounds checking is strict. Consider adding a check to ensure lowerb <= upperb, or swap them if necessary.
Suggested change:

    - upperb = min(p0[-1], p0[leftmost] + stepv) - delta
    + upperb = min(p0[-1], p0[leftmost] + stepv) - delta
    + if lowerb > upperb:
    +     lowerb, upperb = upperb, lowerb
Fix errors in edge cases (mainly when `n_total` equals `ref_total`) in the computation of the difference of two proportions confidence intervals using the `wang` method.