Skip to content

Slice the system operator to the active set once per outer step (fixes #72)#73

Merged
tschm merged 1 commit into
mainfrom
fix/presliced-operators
Jul 3, 2026
Merged

Slice the system operator to the active set once per outer step (fixes #72)#73
tschm merged 1 commit into
mainfrom
fix/presliced-operators

Conversation

@tschm

@tschm tschm commented Jul 3, 2026

Copy link
Copy Markdown
Member

Summary

#71 moved the active-set restriction inside the CG matvec: full-universe operators plus `apply_free(active_idx, v)` per iteration re-gather `X[:, active]` on every iteration (two fancy-indexed copies of the 1213x494 data per matvec). Iteration counts were unchanged, wall clock regressed ~8x with shrinkage and ~16x without (Jebel-Quant/mean_variance_solvers benchmarks; see #72).

This PR keeps the cvx-linalg operator design but restores the pre-#71 hoisting: `_system_operator(active)` builds the `SumOperator` on pre-sliced contiguous factors once per outer step, and the matvec is a plain `sigma.matvec(v)`.

S&P 500, LW alpha=0.5 solve_cg iters
main (post-#71) 0.0475 s 82
this PR 0.0063 s 83
pre-#71 (d1a15b6) 0.0058 s 82

Notes

  • The same anti-pattern exists wherever `apply_free` is called per-iteration on a full-universe `GramOperator`; a `restricted(free)` API on cvx-linalg's operators (returning a pre-sliced operator) would fix the whole class, including nncg's inner solves. Happy to file that upstream.
  • Full suite: 379 passed.

🤖 Generated with Claude Code

The operator adoption (#71) built full-universe operators and called
apply_free(active_idx, v) inside the CG matvec, which re-gathers the
columns of X (and the target blocks) on every iteration: ~10 MB of
memcpy per iteration against ~50 us of useful GEMV, an ~8x wall-clock
regression at identical iteration counts (16x without shrinkage).

_system_operator now takes the active mask and builds the SumOperator
on pre-sliced contiguous factors (X[:, active], U_k[active, :], the
dense target block), restoring the pre-#71 hoisting; the per-iteration
matvec is a plain operator product. Fixes #72.

Benchmark (S&P 500, n=494, T=1213, LW alpha=0.5, best of 3):
  before  0.0475 s / 82 iters
  after   0.0063 s / 83 iters
  parent of #71: 0.0058 s / 82 iters

Full suite: 379 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 3, 2026 15:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restores pre-#71 performance for CG/PCG inner solves by hoisting active-set slicing out of the per-iteration matvec and into a once-per-outer-step _system_operator(active) construction, so CG iterations run on pre-sliced contiguous factors.

Changes:

  • Change _system_operator to accept the active mask and build operators over pre-sliced X[:, active] / target blocks once per outer step.
  • Update _cg_step and _pcg_step to use sigma.matvec(v) directly (removing apply_free(active_idx, v) from the CG loop).
  • Expand _system_operator docstring to document the performance motivation for hoisting slicing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +210 to +211
Nothing is formed at ``n_a x n_a``; without a target the data term
carries the full weight.
@tschm tschm merged commit a3a8c49 into main Jul 3, 2026
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants