feat: adds uncertainty sampling for training data #60
Open
ShubhamSRath wants to merge 5 commits into eckelsjd:main from
Conversation
eckelsjd (Owner) requested changes on Nov 25, 2025:
This looks like a good start -- the changes are localized to `training.py`, which is good.
Here are a few high-level remarks/issues:
- It feels like uncertainty sampling should be independent of the specific `Interpolator` method. For example, what if I want to use a neural network or linear regression for prediction, but I want the experimental design to be based on GPR-like uncertainty metrics? This first attempt at passing the GPR state to `TrainingData` is intuitive, but it feels like a special case rather than a general rule. Since GPR training really isn't that expensive, maybe it makes sense when doing `TrainingData.refine` to just train a quick GPR purely for the sake of experimental design via uncertainty sampling -- it already has access to all the training data.
- Uncertainty sampling via GPR usually samples points over the full N-dimensional domain, but here it is being used in `collocation_1d`. I do not quite understand how this is working. I understand wanting to reuse `SparseGrid` and just replace the `leja` option, but the `SparseGrid` relies on the ability to refine a single dimension at a time -- it's not clear how GPR is being used to accomplish this. In addition, if we really are only refining 1 dimension, then the first bullet point about running GPR on-the-fly will be even easier.
- There need to be test cases for the new functionality. At a bare minimum, we need a test case for the new sampling method by itself, then within an `Interpolator`, then a `Component`, and possibly one integration test within a full `System`.
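The on-the-fly GPR idea from the first bullet could be sketched roughly as below. This is a minimal numpy-only illustration, not this repo's API: the function and variable names (`uncertainty_sample`, `xtrain`, `candidates`) are hypothetical, and a real implementation would presumably use the sklearn GPR object already present in the package. It exploits the fact that the GP posterior variance depends only on input locations, so a throwaway model for experimental design is cheap:

```python
import numpy as np

def rbf_kernel(a, b, length=0.5):
    # Squared-exponential kernel between two point sets, shapes (n, d) and (m, d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def uncertainty_sample(xtrain, candidates, noise=1e-8, length=0.5):
    """Pick the candidate point where a throwaway GP is most uncertain.

    The GP posterior *variance* depends only on the input locations,
    so no training outputs are needed for this design step.
    """
    K = rbf_kernel(xtrain, xtrain, length) + noise * np.eye(len(xtrain))
    Ks = rbf_kernel(xtrain, candidates, length)   # (n_train, n_cand)
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)            # diag(K** - Ks^T K^-1 Ks)
    return candidates[np.argmax(var)]

rng = np.random.default_rng(0)
xtrain = rng.uniform(-1, 1, size=(8, 2))          # existing training inputs
candidates = rng.uniform(-1, 1, size=(200, 2))    # candidate pool over the full domain
next_point = uncertainty_sample(xtrain, candidates)
```

The same scoring would restrict naturally to a 1d slice if the `SparseGrid` per-dimension refinement is kept, by drawing `candidates` along a single axis only.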
A few smaller comments:
- We should really try to avoid changing function signatures as much as possible. For example, I don't see a need to change `collocation_1d`. And given the first bullet point above, it might not be necessary to change `TrainingData.refine()` either.
- The params that control the uncertainty sampling method should be grouped into something like `collocation_opts`, very similar to how it was done for the GPR interpolator object. It seems some of these opts may be custom rather than passed directly to sklearn, in which case we might need a TypedDict for better documentation of what the opts do. I wouldn't worry about this until all the other points are addressed first, though.
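The TypedDict pattern suggested above might look something like this. Every field name here is purely illustrative (none of these are existing parameters in the repo); the point is only that a `total=False` TypedDict documents each opt and lets custom keys coexist with ones forwarded to sklearn:

```python
from typing import TypedDict

class CollocationOpts(TypedDict, total=False):
    """Hypothetical options bag for uncertainty sampling.

    All field names are illustrative placeholders, not real parameters.
    """
    num_candidates: int   # size of the candidate pool scored per refinement
    kernel: str           # kernel spec forwarded to the sklearn GPR
    noise_level: float    # nugget / observation noise for the throwaway GPR

# total=False means every key is optional, so partial opts dicts type-check:
opts: CollocationOpts = {"num_candidates": 100, "noise_level": 1e-8}
```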