Optimize debias model training by removing extra GridSearchCV refit and vectorizing RMSE blend-weight search #3907

I noticed a few small efficiency improvements that can be made in the debias model training file under pecan_debias.

1. In `_fit_extratrees(X, y)`, the `ExtraTreesRegressor` is fit again after `GridSearchCV` finishes, even though `GridSearchCV` already refits the best estimator on the full data because `refit=True` is the default. The additional `fit()` call is redundant; `best_estimator_` can be used directly.

2. The blend-weight RMSE search currently uses a `for` loop over candidate weights. This can be vectorized with NumPy broadcasting so that the RMSE for every candidate weight is computed in one pass.

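3. The current cross-validation fold count, `max(2, min(5, len(y)))`, is poorly suited to small datasets: with fewer than 10 samples, plain K-fold splitting leaves at least one validation fold holding a single sample. Using `cv = min(5, max(2, len(y) // 2))` instead keeps at least two samples in every fold whenever `len(y) >= 4`.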
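Items 1 and 3 could look roughly like the sketch below. This is not the code in pecan_debias: `fit_extratrees`, `choose_cv`, the parameter grid, the scoring choice, and the synthetic data are all assumptions made for illustration. Only `GridSearchCV`, its default `refit=True`, and `best_estimator_` are taken from scikit-learn as is.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV


def choose_cv(n_samples: int) -> int:
    """Proposed fold count: never more than 5 folds, never fewer than 2."""
    return min(5, max(2, n_samples // 2))


def fit_extratrees(X, y):
    # Hypothetical stand-in for _fit_extratrees; the grid is illustrative only.
    param_grid = {"n_estimators": [20, 50], "max_depth": [None, 4]}
    search = GridSearchCV(
        ExtraTreesRegressor(random_state=0),
        param_grid,
        cv=choose_cv(len(y)),
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    # refit=True (the default) has already refit the best parameters on all
    # of X and y, so best_estimator_ is returned without another fit() call.
    return search.best_estimator_


# Small synthetic dataset, only to show the function running end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = X[:, 0] + 0.1 * rng.normal(size=12)

model = fit_extratrees(X, y)
preds = model.predict(X)
```

For 6 samples the current expression gives 5 folds (four of them with a single validation sample), while `choose_cv(6)` gives 3 folds of two samples each. From 10 samples upward both expressions give 5 folds.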

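Changes 1 and 2 are not expected to alter model behavior; they only remove redundant computation. Change 3 affects only the number of folds on small datasets, which should make the cross-validation scores more stable.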
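For the blend-weight search in item 2, a minimal sketch of a loop and its vectorized counterpart follows. The function names, the two-model blend `w * pred_a + (1 - w) * pred_b`, the weight grid, and the synthetic data are assumptions; the actual loop in pecan_debias may be shaped differently.

```python
import numpy as np


def best_weight_loop(y_true, pred_a, pred_b, weights):
    """Reference loop: try each weight and keep the lowest RMSE."""
    best_w, best_rmse = None, np.inf
    for w in weights:
        blend = w * pred_a + (1.0 - w) * pred_b
        rmse = np.sqrt(np.mean((y_true - blend) ** 2))
        if rmse < best_rmse:
            best_w, best_rmse = w, rmse
    return float(best_w), float(best_rmse)


def best_weight_vectorized(y_true, pred_a, pred_b, weights):
    """Same search with broadcasting: one row of blended predictions per weight."""
    w = np.asarray(weights, dtype=float)[:, None]  # shape (n_weights, 1)
    blends = w * pred_a[None, :] + (1.0 - w) * pred_b[None, :]
    rmses = np.sqrt(np.mean((y_true[None, :] - blends) ** 2, axis=1))
    i = int(np.argmin(rmses))  # first minimum, like the strict '<' in the loop
    return float(weights[i]), float(rmses[i])


# Synthetic predictions from two hypothetical models.
rng = np.random.default_rng(1)
y_true = rng.normal(size=50)
pred_a = y_true + 0.3 * rng.normal(size=50)
pred_b = y_true + 0.5 * rng.normal(size=50)
weights = np.linspace(0.0, 1.0, 21)

w_loop, rmse_loop = best_weight_loop(y_true, pred_a, pred_b, weights)
w_vec, rmse_vec = best_weight_vectorized(y_true, pred_a, pred_b, weights)
```

The vectorized version builds an array of shape `(n_weights, n_samples)`, so it trades some memory for speed. With a short weight grid that cost is small.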

If this approach looks reasonable, I would be happy to open a PR implementing these updates.
