Training techniques: natural gradients, hybrid optimizers, and more #13

@bwengals

Ideas for improving the training pipeline beyond what ptgp currently ships (compile_training_step with Adam, compile_scipy_objective with L-BFGS-B, minimize_staged_vfe).

  • Natural gradients for SVGP: update q_mu/q_sqrt in natural parameter space. This typically reaches the same ELBO in 10-100x fewer iterations than ordinary gradient steps on the variational parameters. The param_groups infrastructure already supports separate learning rates; what is missing is a custom optimizer that preconditions the gradient of the variational parameters with the inverse Fisher information.
  • Adam warmup + L-BFGS-B polish: run Adam for the first ~200 steps to get into the right basin, then switch to L-BFGS-B for precise convergence. This handles mixed-scale optimization (hyperparams + inducing points) more robustly than L-BFGS-B alone.
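
To make the natural-gradient idea concrete, here is a minimal sketch on a scalar toy model rather than a real SVGP. It assumes a prior f ~ N(0, prior_var), Gaussian noise with variance noise_var, and a variational q(f) = N(m, v). None of the function names below are part of ptgp; they are hypothetical and only illustrate the update rule. For a Gaussian q, the natural gradient with respect to the natural parameters equals the ordinary ELBO gradient with respect to the expectation parameters (m, m^2 + v), so no explicit Fisher matrix inverse is formed here.

```python
def to_natural(m, v):
    """Mean/variance -> natural parameters (m / v, -1 / (2 v))."""
    return m / v, -0.5 / v


def from_natural(theta1, theta2):
    """Natural parameters -> mean/variance."""
    v = -0.5 / theta2
    return theta1 * v, v


def elbo_grad_expectation(m, v, y, noise_var, prior_var):
    """ELBO gradient w.r.t. the expectation parameters (m, m^2 + v).

    Toy model (an assumption of this sketch): f ~ N(0, prior_var),
    y_i = f + noise with noise ~ N(0, noise_var).
    """
    n = len(y)
    g1 = sum(y) / noise_var - m / v
    g2 = -0.5 * n / noise_var - 0.5 / prior_var + 0.5 / v
    return g1, g2


def natural_gradient_step(m, v, y, noise_var, prior_var, lr=1.0):
    """One natural-gradient ascent step, taken in natural-parameter space."""
    theta1, theta2 = to_natural(m, v)
    g1, g2 = elbo_grad_expectation(m, v, y, noise_var, prior_var)
    return from_natural(theta1 + lr * g1, theta2 + lr * g2)


def exact_posterior(y, noise_var, prior_var):
    """Closed-form posterior of the toy model, for comparison."""
    v = 1.0 / (len(y) / noise_var + 1.0 / prior_var)
    return v * sum(y) / noise_var, v


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    print(natural_gradient_step(0.0, 1.0, y, noise_var=0.5, prior_var=2.0))
    print(exact_posterior(y, noise_var=0.5, prior_var=2.0))
```

In this conjugate toy case a single step with lr=1.0 lands on the exact posterior, which is the intuition behind the large iteration savings. With a non-Gaussian likelihood or minibatching, smaller step sizes would be needed, and a real implementation would work with the full q_mu/q_sqrt matrices.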
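
The warmup-then-polish schedule could look roughly like the sketch below. It uses NumPy and SciPy directly rather than ptgp's compile_training_step or compile_scipy_objective, whose signatures are not shown in this issue. The objective is a deliberately badly scaled quadratic standing in for a negative ELBO, and adam_warmup and adam_then_lbfgsb are hypothetical helper names.

```python
import numpy as np
from scipy.optimize import minimize


def objective(x):
    """Toy stand-in for a negative ELBO; returns (value, gradient).

    The two coordinates have curvatures of 1 and 100 to mimic parameters
    that live on very different scales.
    """
    scales = np.array([1.0, 100.0])
    target = np.array([3.0, -2.0])
    diff = x - target
    return 0.5 * np.sum(scales * diff**2), scales * diff


def adam_warmup(value_and_grad, x0, n_steps=200, lr=0.05,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam with bias correction; returns the warmed-up parameters."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, n_steps + 1):
        _, g = value_and_grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


def adam_then_lbfgsb(value_and_grad, x0, n_warmup=200):
    """Adam for n_warmup steps, then L-BFGS-B started from Adam's result."""
    x_warm = adam_warmup(value_and_grad, x0, n_steps=n_warmup)
    # jac=True tells SciPy the callable returns (value, gradient).
    result = minimize(value_and_grad, x_warm, jac=True, method="L-BFGS-B")
    return x_warm, result


if __name__ == "__main__":
    x_warm, result = adam_then_lbfgsb(objective, np.zeros(2))
    print("after Adam:    ", x_warm, objective(x_warm)[0])
    print("after L-BFGS-B:", result.x, result.fun)
```

The hand-off is just passing Adam's final parameters as the L-BFGS-B starting point. In a real model the open questions are how to choose the switch point (a fixed step count, as here, or a plateau in the ELBO) and whether constrained parameters should be handed to L-BFGS-B in their unconstrained form.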

What else?

What other training techniques or optimizer improvements would be useful?
