feat: adds uncertainty sampling for training data #60
Open
ShubhamSRath wants to merge 5 commits into eckelsjd:main from
Conversation
eckelsjd (Owner) requested changes on Nov 25, 2025:
This looks like a good start -- the changes are localized to `training.py`, which is good.
Here are a few high-level remarks/issues:
- It feels like uncertainty sampling should be independent of the specific `Interpolator` method. For example, what if I want to use a neural network or linear regression for prediction, but I want the experimental design to be based on GPR-like uncertainty metrics? This first attempt at passing the GPR state to `TrainingData` is intuitive, but it feels like a special case rather than a general rule. Since GPR training really isn't that expensive, maybe it makes sense when doing `TrainingData.refine` to just train a quick GPR purely for the sake of experimental design via uncertainty sampling -- it already has access to all the training data.
- Uncertainty sampling via GPR usually samples points over the full N-dimensional domain, but here it is being used in `collocation_1d`. I do not quite understand how this is working. I understand wanting to reuse `SparseGrid` and just replace the `leja` option, but the `SparseGrid` relies on the ability to refine a single dimension at a time -- it's not clear how GPR is being used to accomplish this. In addition, if we really are only refining 1 dimension, then the first bullet point about running GPR on-the-fly will be even easier.
- There need to be test cases for the new functionality. At a bare minimum, we need a test case for the new sampling method by itself, then within an `Interpolator`, then a `Component`, and possibly one integration test within a full `System`.
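The on-the-fly GPR idea from the first bullet could be sketched roughly as below. This is a minimal numpy-only illustration, not this repo's API: the function and variable names (`uncertainty_sample`, `xtrain`, `candidates`) are hypothetical, and a real implementation would presumably use the sklearn GPR object already present in the package. It exploits the fact that the GP posterior variance depends only on input locations, so a throwaway model for experimental design is cheap:

```python
import numpy as np

def rbf_kernel(a, b, length=0.5):
    # Squared-exponential kernel between two point sets, shapes (n, d) and (m, d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def uncertainty_sample(xtrain, candidates, noise=1e-8, length=0.5):
    """Pick the candidate point where a throwaway GP is most uncertain.

    The GP posterior *variance* depends only on the input locations,
    so no training outputs are needed for this design step.
    """
    K = rbf_kernel(xtrain, xtrain, length) + noise * np.eye(len(xtrain))
    Ks = rbf_kernel(xtrain, candidates, length)   # (n_train, n_cand)
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)            # diag(K** - Ks^T K^-1 Ks)
    return candidates[np.argmax(var)]

rng = np.random.default_rng(0)
xtrain = rng.uniform(-1, 1, size=(8, 2))          # existing training inputs
candidates = rng.uniform(-1, 1, size=(200, 2))    # candidate pool over the full domain
next_point = uncertainty_sample(xtrain, candidates)
```

The same scoring would restrict naturally to a 1d slice if the `SparseGrid` per-dimension refinement is kept, by drawing `candidates` along a single axis only.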
A few smaller comments:
- We should really try to avoid changing function signatures as much as possible. For example, I don't see a need to change `collocation_1d`. And given the first bullet point above, it might not be necessary to change `TrainingData.refine()` either.
- The params that control the uncertainty sampling method should be grouped into something like `collocation_opts`, very similar to how it was done for the GPR interpolator object. It seems some of these opts may be custom rather than passed directly to sklearn, in which case we might need a TypedDict for better documentation of what the opts do. I wouldn't worry about this until all the other points are addressed first, though.
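The TypedDict pattern suggested above might look something like this. Every field name here is purely illustrative (none of these are existing parameters in the repo); the point is only that a `total=False` TypedDict documents each opt and lets custom keys coexist with ones forwarded to sklearn:

```python
from typing import TypedDict

class CollocationOpts(TypedDict, total=False):
    """Hypothetical options bag for uncertainty sampling.

    All field names are illustrative placeholders, not real parameters.
    """
    num_candidates: int   # size of the candidate pool scored per refinement
    kernel: str           # kernel spec forwarded to the sklearn GPR
    noise_level: float    # nugget / observation noise for the throwaway GPR

# total=False means every key is optional, so partial opts dicts type-check:
opts: CollocationOpts = {"num_candidates": 100, "noise_level": 1e-8}
```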