I noticed a few small efficiency improvements that can be made in the debias model training file under pecan_debias.
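- In `_fit_extratrees(X, y)`, `ExtraTreesRegressor` is fit again after `GridSearchCV`, even though `GridSearchCV` already refits the best estimator on the full data because `refit=True` is the default. The additional `fit()` call is redundant; `best_estimator_` can be used directly.
- The blend-weight RMSE search currently uses a `for` loop over candidate weights. This can be vectorized with NumPy broadcasting so that the RMSE for all candidate weights is computed in one pass.
- The current cross-validation fold count, `max(2, min(5, len(y)))`, is poorly suited to small datasets: with five or fewer samples it sets the number of folds equal to the number of samples, leaving a single sample in each validation fold. Using `cv = min(5, max(2, len(y) // 2))` instead keeps the fold count between 2 and 5 while giving each fold roughly two or more samples.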
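As a rough sketch of the first and third points, the snippet below relies on `GridSearchCV`'s default `refit=True` and returns `best_estimator_` without a second `fit()`. The names `choose_cv` and `fit_extratrees`, the parameter grid, the `random_state`, and the `scoring` choice are all assumptions for illustration, not the actual contents of the training file.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV


def choose_cv(n_samples: int) -> int:
    """Proposed fold count: always between 2 and 5, about n_samples // 2 when small."""
    return min(5, max(2, n_samples // 2))


def fit_extratrees(X, y, param_grid=None):
    """Hypothetical stand-in for _fit_extratrees; the grid below is illustrative only."""
    if param_grid is None:
        param_grid = {"n_estimators": [10, 20], "max_depth": [None, 5]}
    search = GridSearchCV(
        ExtraTreesRegressor(random_state=0),  # fixed seed is an assumption
        param_grid,
        cv=choose_cv(len(y)),
        scoring="neg_mean_squared_error",  # assumed metric
        refit=True,  # the default: best parameters are refit on all of X, y
    )
    search.fit(X, y)
    # best_estimator_ is already fitted on the full data, so no extra fit() call.
    return search.best_estimator_
```

For example, `choose_cv(4)` gives 2 folds of two samples each, where the current expression would give 4 folds of one sample each.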
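These changes are not intended to alter model behavior. The first two should leave results unchanged while removing redundant computation, and the third only changes the fold count on small datasets, which should make cross-validation more stable.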
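To illustrate that a vectorized blend-weight search can select the same weight as the loop, here is a minimal sketch. It assumes the blend has the form `w * pred_a + (1 - w) * pred_b` with candidate weights on a fixed grid; the function names and the blend form are hypothetical and may not match the current implementation.

```python
import numpy as np


def best_blend_weight_loop(y, pred_a, pred_b, weights):
    """Reference version: one RMSE evaluation per candidate weight."""
    best_w, best_rmse = None, np.inf
    for w in weights:
        blend = w * pred_a + (1.0 - w) * pred_b
        rmse = np.sqrt(np.mean((y - blend) ** 2))
        if rmse < best_rmse:
            best_w, best_rmse = w, rmse
    return float(best_w), float(best_rmse)


def best_blend_weight_vectorized(y, pred_a, pred_b, weights):
    """Same search via broadcasting: rows are weights, columns are samples."""
    weights = np.asarray(weights, dtype=float)
    w = weights[:, None]  # shape (n_weights, 1)
    blends = w * pred_a[None, :] + (1.0 - w) * pred_b[None, :]
    rmse = np.sqrt(np.mean((y[None, :] - blends) ** 2, axis=1))
    i = int(np.argmin(rmse))  # first minimum, like the strict "<" in the loop
    return float(weights[i]), float(rmse[i])
```

The vectorized version allocates an array of shape `(n_weights, n_samples)`, so the gain is mainly worthwhile when the weight grid is reasonably fine and the data fits comfortably in memory.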
If this approach looks reasonable, I would be happy to open a PR implementing these updates.