Training techniques: natural gradients, hybrid optimizers, and more #13

Ideas for improving the training pipeline beyond what ptgp currently ships (`compile_training_step` with Adam, `compile_scipy_objective` with L-BFGS-B, `minimize_staged_vfe`).

- **Natural gradients for SVGP** — update `q_mu`/`q_sqrt` in natural parameter space. Typically 10-100x fewer iterations to the same ELBO. The `param_groups` infrastructure already supports separate learning rates; this needs a custom optimizer that computes the Fisher-metric inverse for the variational parameters.
- **Adam warmup + L-BFGS-B polish** — Adam for the first ~200 steps to get into the right basin, then L-BFGS-B for precise convergence. Handles mixed-scale optimization (hyperparams + inducing points) more robustly than L-BFGS-B alone.
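The natural-gradient idea can be sketched on a toy problem. This is a minimal illustration, not ptgp code: it assumes a full (non-sparse) Gaussian posterior `q(f) = N(m, S)` with a Gaussian likelihood, and every function name here (`rbf_kernel`, `natural_gradient_step`, `exact_posterior`) is hypothetical. In ptgp terms, `m` would play the role of `q_mu` and a Cholesky factor of `S` the role of `q_sqrt`. The step is taken in natural parameters `(S^-1 m, -1/2 S^-1)` using the ELBO gradient with respect to the expectation parameters `(m, S + m m^T)`, which is equivalent to preconditioning with the inverse Fisher matrix without forming it explicitly.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0, variance=1.0, jitter=1e-5):
    """Squared-exponential kernel matrix on 1-D inputs, with diagonal jitter."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2) + jitter * np.eye(len(x))


def elbo_grads_mean_cov(m, S, y, K_inv, noise_var):
    """ELBO gradients w.r.t. the mean m and covariance S of q(f) = N(m, S).

    Assumes prior f ~ N(0, K) and likelihood y ~ N(f, noise_var * I).
    """
    n = len(y)
    grad_m = (y - m) / noise_var - K_inv @ m
    grad_S = 0.5 * (np.linalg.inv(S) - K_inv - np.eye(n) / noise_var)
    return grad_m, grad_S


def natural_gradient_step(m, S, y, K_inv, noise_var, step_size=1.0):
    """One natural-gradient ascent step on the ELBO."""
    grad_m, grad_S = elbo_grads_mean_cov(m, S, y, K_inv, noise_var)

    # Chain rule from (m, S) to expectation parameters (eta1, eta2) = (m, S + m m^T).
    grad_eta1 = grad_m - 2.0 * grad_S @ m
    grad_eta2 = grad_S

    # Current natural parameters of the Gaussian.
    S_inv = np.linalg.inv(S)
    theta1 = S_inv @ m
    theta2 = -0.5 * S_inv

    # The natural gradient w.r.t. theta equals the plain gradient w.r.t. eta.
    theta1 = theta1 + step_size * grad_eta1
    theta2 = theta2 + step_size * grad_eta2

    # Map back to mean and covariance.
    S_new = np.linalg.inv(-2.0 * theta2)
    m_new = S_new @ theta1
    return m_new, S_new


def exact_posterior(K, y, noise_var):
    """Closed-form GP regression posterior at the training inputs, for comparison."""
    A = K + noise_var * np.eye(len(y))
    mean = K @ np.linalg.solve(A, y)
    cov = K - K @ np.linalg.solve(A, K)
    return mean, cov
```

In this conjugate setting a single step with `step_size=1.0` from any valid starting `(m, S)` should land on the exact posterior, which is where the large iteration savings over Adam come from. With a non-Gaussian likelihood the step is no longer exact and a smaller step size is usually needed.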
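The Adam-then-L-BFGS-B schedule could look roughly like the following. This is a sketch under stated assumptions: it uses a hand-written NumPy Adam and `scipy.optimize.minimize` on the Rosenbrock function as a stand-in for a negative ELBO, and does not call `compile_training_step` or `compile_scipy_objective`, whose signatures are not shown here. The names `loss_and_grad`, `adam_warmup`, and `hybrid_minimize` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def loss_and_grad(x):
    """Toy objective (2-D Rosenbrock) returning (value, gradient)."""
    a, b = x
    value = (1.0 - a) ** 2 + 100.0 * (b - a**2) ** 2
    grad = np.array([
        -2.0 * (1.0 - a) - 400.0 * a * (b - a**2),
        200.0 * (b - a**2),
    ])
    return value, grad


def adam_warmup(x0, n_steps=200, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a fixed number of Adam steps and return the final iterate."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, n_steps + 1):
        _, g = loss_and_grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g**2
        m_hat = m / (1.0 - beta1**t)  # bias correction
        v_hat = v / (1.0 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


def hybrid_minimize(x0, warmup_steps=200):
    """Adam warmup followed by an L-BFGS-B polish from the warmed-up point."""
    x_warm = adam_warmup(x0, n_steps=warmup_steps)
    # jac=True tells SciPy that the objective returns (value, gradient).
    result = minimize(loss_and_grad, x_warm, jac=True, method="L-BFGS-B")
    return x_warm, result
```

A call such as `x_warm, result = hybrid_minimize([-1.5, 2.0])` returns both the warmup iterate and the SciPy result, so the objective before and after the polish can be compared. The 200-step default mirrors the figure in the bullet above; in practice the switch point might instead be triggered when the Adam loss curve flattens.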

## What else?

What other training techniques or optimizer improvements would be useful?
