Ideas for improving the training pipeline beyond what ptgp currently ships (compile_training_step with Adam, compile_scipy_objective with L-BFGS-B, minimize_staged_vfe).
- Natural gradients for SVGP — update q_mu/q_sqrt in natural parameter space. Typically 10-100x fewer iterations to the same ELBO. The param_groups infrastructure already supports separate learning rates; this needs a custom optimizer that computes the Fisher-metric inverse for the variational parameters.
- Adam warmup + L-BFGS-B polish — Adam for the first ~200 steps to get into the right basin, then L-BFGS-B for precise convergence. Handles mixed-scale optimization (hyperparams + inducing points) more robustly than L-BFGS-B alone.
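To make the natural-gradient bullet concrete, here is a minimal sketch on a scalar toy problem. It does not use ptgp: the model (prior `u ~ N(0, prior_var)`, observations `y_i ~ N(u, noise_var)`) and the helper names `natural_gradient_step` and `to_mean_var` are assumptions made purely for illustration. It relies on the standard exponential-family identity that the natural gradient with respect to the natural parameters equals the ordinary ELBO gradient with respect to the expectation parameters, so no Fisher matrix is inverted explicitly.

```python
def natural_gradient_step(theta1, theta2, y, noise_var, prior_var, lr=1.0):
    """One natural-gradient ascent step on the ELBO of a scalar Gaussian q(u).

    q(u) = N(m, S) has natural parameters theta1 = m / S, theta2 = -1 / (2 S).
    Toy conjugate model (an assumption for this sketch):
        u ~ N(0, prior_var),  y_i ~ N(u, noise_var).
    """
    n = len(y)
    # Gradients of the ELBO w.r.t. the expectation parameters
    # eta1 = m and eta2 = m^2 + S. These are exactly the natural
    # gradients w.r.t. (theta1, theta2).
    grad_eta1 = sum(y) / noise_var - theta1
    grad_eta2 = -0.5 * (n / noise_var + 1.0 / prior_var) - theta2
    return theta1 + lr * grad_eta1, theta2 + lr * grad_eta2


def to_mean_var(theta1, theta2):
    """Convert natural parameters back to the mean and variance of q."""
    var = -1.0 / (2.0 * theta2)
    return theta1 * var, var


# Start from q(u) = N(0, 1), i.e. theta1 = 0, theta2 = -0.5.
y = [1.2, 0.8, 1.0]
theta1, theta2 = natural_gradient_step(0.0, -0.5, y, noise_var=0.5, prior_var=2.0)
mean, var = to_mean_var(theta1, theta2)
print(mean, var)
```

In this conjugate setting a single step with `lr=1.0` lands on the exact posterior, which is the intuition behind the large iteration savings. With a non-Gaussian likelihood or jointly optimized hyperparameters the step is no longer exact, and a smaller `lr` is usually needed.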
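The Adam-then-L-BFGS-B schedule from the second bullet could look roughly like the sketch below. It assumes NumPy and SciPy are installed and uses a Rosenbrock function as a stand-in for the negative ELBO. `loss_and_grad` and `adam_warmup` are hypothetical helpers, not ptgp functions, and nothing here reflects the actual signatures of compile_training_step or compile_scipy_objective.

```python
import numpy as np
from scipy.optimize import minimize


def loss_and_grad(x):
    """Toy stand-in for a negative ELBO: the badly scaled Rosenbrock function."""
    a, b = x
    value = (1.0 - a) ** 2 + 100.0 * (b - a ** 2) ** 2
    grad = np.array([
        -2.0 * (1.0 - a) - 400.0 * a * (b - a ** 2),
        200.0 * (b - a ** 2),
    ])
    return value, grad


def adam_warmup(x0, steps=200, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam for a fixed number of steps; returns the final iterate."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        _, g = loss_and_grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Stage 1: coarse warmup with Adam.
x_warm = adam_warmup([-1.5, 2.0])
warm_loss, _ = loss_and_grad(x_warm)

# Stage 2: polish with L-BFGS-B, starting from the warm point.
# jac=True tells SciPy that the objective returns (value, gradient).
result = minimize(loss_and_grad, x_warm, jac=True, method="L-BFGS-B")
print(warm_loss, result.fun, result.x)
```

The 200-step warmup length mirrors the figure in the bullet and would need tuning per model. In a real pipeline both stages would have to share one parameter vector, so any transforms or constraints must be applied identically in each.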
What else?
What other training techniques or optimizer improvements would be useful?