Optimize debias model training by removing extra GridSearchCV refit and vectorizing RMSE blend-weight search #3907

@Vibhanshu230

Description

I noticed a few small efficiency improvements that can be made in the debias model training file under pecan_debias.

  1. In _fit_extratrees(X, y), ExtraTreesRegressor is fit again after GridSearchCV finishes, even though GridSearchCV already refits the best estimator on the full data (refit=True is the default). The additional fit() call is redundant; best_estimator_ can be used directly.

  2. The blend-weight RMSE search currently uses a for loop over candidate weights. This can be vectorized using NumPy to improve performance.

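  3. The current cross-validation fold count, max(2, min(5, len(y))), is poorly suited to small datasets: for 2 <= len(y) <= 5 it equals len(y), so every fold holds a single sample. With cv = min(5, max(2, len(y) // 2)) the fold count stays between 2 and 5, and each fold holds at least two samples whenever len(y) >= 4.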
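
For reference, items 1 and 3 could be sketched roughly as follows. This is only an illustration, not the actual pecan_debias code: `fit_extratrees`, `choose_cv`, the parameter grid, and the scoring choice are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV


def choose_cv(n_samples: int) -> int:
    """Proposed fold count: at least 2, at most 5, roughly n_samples // 2."""
    return min(5, max(2, n_samples // 2))


def fit_extratrees(X, y, param_grid=None, random_state=0):
    """Hypothetical stand-in for _fit_extratrees; the grid is illustrative."""
    if param_grid is None:
        param_grid = {"n_estimators": [10, 20], "max_depth": [None, 3]}
    search = GridSearchCV(
        ExtraTreesRegressor(random_state=random_state),
        param_grid,
        cv=choose_cv(len(y)),
        scoring="neg_root_mean_squared_error",  # assumed metric
        refit=True,  # the default: best parameters are refit on all of X, y
    )
    search.fit(X, y)
    # best_estimator_ is already fitted on the full data,
    # so no further fit() call is needed here.
    return search.best_estimator_


# Small synthetic dataset, only to show the call pattern.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = X[:, 0] + 0.1 * rng.normal(size=8)
model = fit_extratrees(X, y)
predictions = model.predict(X)
```

With 8 samples, `choose_cv` gives 4 folds of two samples each, whereas the current expression would give 5 folds, some of them holding a single sample.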

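The first two changes do not alter model behavior and only remove redundant computation. The third changes the fold count only for datasets with fewer than 10 samples, where it should make hyperparameter selection more stable.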
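
To illustrate item 2, here is a minimal sketch of a loop-based and a vectorized blend-weight search side by side. It assumes the blend has the form `w * pred_a + (1 - w) * pred_b` and that candidate weights come from a fixed grid; the function names, the grid, and the synthetic data are hypothetical and may not match the existing code.

```python
import numpy as np


def best_weight_loop(y_true, pred_a, pred_b, weights):
    """Reference version: one RMSE evaluation per candidate weight."""
    best_w, best_rmse = None, np.inf
    for w in weights:
        blend = w * pred_a + (1.0 - w) * pred_b
        rmse = np.sqrt(np.mean((y_true - blend) ** 2))
        if rmse < best_rmse:
            best_w, best_rmse = float(w), float(rmse)
    return best_w, best_rmse


def best_weight_vectorized(y_true, pred_a, pred_b, weights):
    """Vectorized version: broadcast all weights against the predictions."""
    weights = np.asarray(weights, dtype=float)
    w = weights[:, None]                                  # (n_weights, 1)
    blends = w * pred_a[None, :] + (1.0 - w) * pred_b[None, :]
    # blends has shape (n_weights, n_samples); reduce over samples.
    rmses = np.sqrt(np.mean((y_true[None, :] - blends) ** 2, axis=1))
    i = int(np.argmin(rmses))  # first minimum, like the loop's strict "<"
    return float(weights[i]), float(rmses[i])


# Synthetic data, only to compare the two versions.
rng = np.random.default_rng(0)
pred_a = rng.normal(size=200)
pred_b = rng.normal(size=200)
y_true = 0.3 * pred_a + 0.7 * pred_b + 0.05 * rng.normal(size=200)
weights = np.linspace(0.0, 1.0, 101)

loop_result = best_weight_loop(y_true, pred_a, pred_b, weights)
vec_result = best_weight_vectorized(y_true, pred_a, pred_b, weights)
```

Both versions should select the same weight. The vectorized one builds an array of shape (n_weights, n_samples), so memory use grows with the size of the weight grid.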

If this approach looks reasonable, I would be happy to open a PR implementing these updates.
