
New inverse stream map to accelerate convergence#1919

Open
unalmis wants to merge 22 commits into master from ku/bounce

Conversation

@unalmis
Collaborator

@unalmis unalmis commented Sep 18, 2025

Inverse stream maps

Other performance improvements

  • Checkpointing to increase speed and reduce memory consumption of reverse-mode differentiation (#1347).
  • Adds a low_ram mode: same speed and less memory for objective.compute, but slower objective.grad, since JAX handles iterative algorithms poorly.
  • Fully resolves the memory regression in bounce integrals (#1864) by avoiding materialization of a large tensor in memory. Previously, we had closed that issue by adding nuffts as a workaround; the improvement here actually solves the JAX regression.
  • Reuses some computations when identifying bounce points to improve efficiency.
  • Various small improvements to increase cache hits, improve fusion, and reduce floating-point error.
  • Transforms an improper field line integral into one on a compact domain where the integrand is periodic, achieving faster convergence.
  • Improves the convergence of interp_to_argmin for Bounce2D from O(h^4) to spectral.
  • Resolves the non-cosmetic part of #1389 (use OOP for surface integrals with faster methods for tensor product grids).
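The periodic-domain transform in the list above can be illustrated on a toy problem. In this NumPy sketch (the integrand and substitution are illustrative, not the field line integral in this PR), an improper integral is mapped to a compact domain where the transformed integrand extends to a smooth periodic function, so the trapezoidal rule converges faster than any power of the step size:

```python
import numpy as np

# Toy improper integral with a known value: ∫_{-∞}^{∞} sech(x) dx = π.
# Substituting x = tan(t), dx = dt / cos(t)^2 maps it to the compact
# domain t ∈ (-π/2, π/2). The transformed integrand vanishes to all
# orders at the endpoints, so its periodic extension is smooth and the
# trapezoidal rule converges superalgebraically.
def g(t):
    return 1.0 / (np.cosh(np.tan(t)) * np.cos(t) ** 2)

n = 201
t = np.linspace(-np.pi / 2, np.pi / 2, n + 2)[1:-1]  # skip tan(±π/2)
h = t[1] - t[0]
approx = h * (np.sum(g(t)) - 0.5 * (g(t[0]) + g(t[-1])))  # trapezoid rule
print(abs(approx - np.pi))  # far below the O(h^2) error of a generic integrand
```

With only a couple hundred points the error is negligible, whereas a generic (non-periodic) integrand would see only second-order convergence from the trapezoidal rule.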

Usability

Bugs

Benchmarks

Just go to #2026 and run effective_ripple_profile.py. You will see the large performance improvement over master. The CI benchmarks do not reveal this because those benchmarks are essentially just noise. Note that, using the same parameter inputs, the resolution on this branch is also higher than on master due to the faster convergence.

  • If you set use_bounce1d=True on that script, you will run out of memory, as expected for an inferior approach (the OOM occurs in the jacobian before a single bounce integral is computed).
  • If you set nufft_eps=0, the script needs 175 GB on master (JAX reports the 175 GB requirement in the OOM message) but only 35 GB on this branch.
  • Using nuffts, the script requires only 6.7 GB on this branch.
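Much of the memory difference above traces back to checkpointing of reverse-mode AD (#1347). A minimal sketch of the JAX primitive involved, jax.checkpoint, on a toy function (nothing here is DESC code):

```python
import jax
import jax.numpy as jnp

# jax.checkpoint (a.k.a. jax.remat) marks a subcomputation whose
# intermediates are recomputed during the backward pass instead of being
# stored, trading extra FLOPs for reverse-mode AD memory.
@jax.checkpoint
def block(x):
    for _ in range(8):
        x = jnp.sin(x) * jnp.cos(x)  # stand-in for a deep chain of ops
    return x

def loss(x):
    return jnp.sum(block(x) ** 2)

grad = jax.grad(loss)(jnp.linspace(0.0, 1.0, 16))
print(grad.shape)  # (16,)
```

Without the decorator, reverse-mode AD stores every intermediate of the loop; with it, only the block's inputs are saved and the intermediates are rematerialized when the gradient is computed.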

Examples

HELIOTRON

Master branch

test_theta_chebyshev_HELIOTRON

This branch

test_delta_fourier_chebyshev_HELIOTRON

W7-X

Master branch

test_theta_chebyshev_W7-X

This branch

test_delta_fourier_chebyshev_W7-X

NCSX

Master branch

test_theta_chebyshev_NCSX

This branch

test_delta_fourier_chebyshev_NCSX

Removal of spectral aliasing

Figure_1

@unalmis unalmis self-assigned this Sep 18, 2025
@unalmis unalmis added performance New feature or request to make the code faster robustness Make the code more robust labels Sep 18, 2025

@unalmis unalmis changed the base branch from master to ku/nufft September 18, 2025 07:45
@unalmis unalmis marked this pull request as draft September 18, 2025 07:47
@unalmis unalmis added the theory Requires theory work before coding label Sep 18, 2025
@unalmis unalmis changed the title New inverse stream maps to accelerate convergence New inverse stream map to accelerate convergence Sep 18, 2025
@unalmis unalmis added the bug fix Something was fixed label Sep 18, 2025
@github-actions
Contributor

github-actions bot commented Sep 18, 2025

Memory benchmark result

| Test Name                            |   %Δ    | Master (MB) |  PR (MB)  | Δ (MB)  | Time PR (s) | Time Master (s) |
| ------------------------------------ | ------- | ----------- | --------- | ------- | ----------- | --------------- |
| test_objective_jac_w7x               |  5.94 % |  3.826e+03  | 4.053e+03 |  227.34 |    39.46    |      36.44      |
| test_proximal_jac_w7x_with_eq_update | -0.22 % |  6.493e+03  | 6.479e+03 |  -14.16 |   162.81    |     160.60      |
| test_proximal_freeb_jac              | -0.09 % |  1.321e+04  | 1.320e+04 |  -11.76 |    84.51    |      82.10      |
| test_proximal_freeb_jac_blocked      | -0.67 % |  7.528e+03  | 7.478e+03 |  -50.43 |    73.78    |      73.06      |
| test_proximal_freeb_jac_batched      |  0.52 % |  7.451e+03  | 7.490e+03 |   38.51 |    73.08    |      72.86      |
| test_proximal_jac_ripple             | -4.61 % |  3.531e+03  | 3.368e+03 | -162.74 |    62.58    |      65.67      |
| test_proximal_jac_ripple_bounce1d    | -0.35 % |  3.583e+03  | 3.570e+03 |  -12.63 |    74.62    |      76.51      |
| test_eq_solve                        |  1.29 % |  1.992e+03  | 2.017e+03 |   25.64 |    94.31    |      94.11      |

For the memory plots, go to the summary of Memory Benchmarks workflow and download the artifact.

@unalmis unalmis removed bug fix Something was fixed theory Requires theory work before coding labels Sep 18, 2025
@unalmis unalmis changed the title New inverse stream map to accelerate convergence better inverse stream map to accelerate convergence Sep 20, 2025
@unalmis unalmis changed the title better inverse stream map to accelerate convergence new inverse stream map to accelerate convergence Sep 20, 2025
@unalmis unalmis added the theory Requires theory work before coding label Sep 20, 2025
@unalmis unalmis force-pushed the ku/bounce branch 2 times, most recently from a6d949b to d685405 Compare September 22, 2025 04:33
@unalmis unalmis removed the theory Requires theory work before coding label Sep 22, 2025
@unalmis unalmis added the P3 Highest Priority, someone is/should be actively working on this label Sep 23, 2025
@unalmis unalmis dismissed f0uriest’s stale review September 23, 2025 08:44

addressed request

@unalmis unalmis marked this pull request as ready for review September 23, 2025 08:44
@unalmis unalmis requested review from a team, f0uriest and rahulgaur104 and removed request for a team September 23, 2025 08:44
@unalmis
Collaborator Author

unalmis commented Jan 21, 2026

> I assume there is no math error...

Yes, and I have resolved your other comments concerning cosmetics in the code comments.

@unalmis unalmis requested review from a team, YigitElma, ddudt, dpanici, f0uriest and rahulgaur104 and removed request for a team, YigitElma, ddudt, dpanici, f0uriest and rahulgaur104 January 27, 2026 04:29
When drafting a reply to a reviewer comment, I realized that the atomic
derivative computed by the autodiff tool for the `spline=True` option is
incorrect when the bounce point lies near a local maximum. The
`spline=False` option is fine. It is unlikely this would have affected
optimization.

See section 3 of 

[autodiff.pdf](https://github.com/user-attachments/files/24988182/autodiff.pdf)
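The sensitivity near a local maximum follows from the implicit function theorem. In this toy sketch (the field model f here is illustrative, not DESC's spline code), a bounce point z*(λ) satisfying f(z*(λ), λ) = 0 has dz*/dλ = -(∂f/∂λ)/(∂f/∂z), which diverges as ∂f/∂z → 0, i.e., as the bounce point approaches a local extremum, so an AD rule for the root that is inexact there can return a badly wrong derivative:

```python
import numpy as np

# Toy bounce-point condition: f(z, lam) = lam - cos(z) = 0, so the root
# is z*(lam) = arccos(lam). The implicit function theorem gives
#   dz*/dlam = -(df/dlam) / (df/dz) = -1 / sin(z*),
# which diverges as z* -> 0, where cos(z) has its local maximum.
def bounce_point(lam):
    return np.arccos(lam)

def implicit_derivative(lam):
    return -1.0 / np.sin(bounce_point(lam))

lam = 0.5
fd = (bounce_point(lam + 1e-6) - bounce_point(lam - 1e-6)) / 2e-6
print(implicit_derivative(lam), fd)     # agree away from the maximum
print(implicit_derivative(1.0 - 1e-6))  # diverges near the maximum
```

Away from the maximum the implicit derivative matches a finite difference; as λ approaches the maximum of cos, the denominator ∂f/∂z vanishes and the derivative grows without bound, which is exactly the regime where an inexact atomic rule does damage.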
@unalmis unalmis requested review from YigitElma and dpanici and removed request for YigitElma, ddudt, dpanici, f0uriest and rahulgaur104 February 13, 2026 02:13
@unalmis unalmis requested a review from f0uriest February 16, 2026 19:01

Labels

bug fix Something was fixed easy Short and simple to code or review P3 Highest Priority, someone is/should be actively working on this performance New feature or request to make the code faster robustness Make the code more robust run_benchmarks Run timing benchmarks on this PR against current master branch stable Awaiting merge to master. Only updates will be merging from master.
