After the most recent Array API conversion (#9), several modules still contain NumPy-only boundaries or are entirely unconverted. This issue tracks those gaps, their root causes, and which ones could be resolved.
## Modules with intentional NumPy boundaries
These modules have been partially converted — Array API is used for data manipulation, but a NumPy boundary exists before calling into a library that requires it.
### model/refit_kalman.py

| Boundary | Reason | Removable? |
|---|---|---|
| `_compute_gain()` — converts all matrices to numpy before `scipy.linalg.solve_discrete_are` | No Array API (or even generic) DARE solver exists outside scipy | No — would require a pure Array API DARE implementation |
| `refit()` — per-sample mutation loop uses `np.linalg.norm` on 2-element vectors and scalar element assignment | Element-wise mutation with Python-level indexing; impractical to vectorise as written | Possibly — the loop could be vectorised with masked operations, which would also be a performance win |
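For reference, the `_compute_gain()` boundary follows the standard pattern: convert to numpy at the edge, call the scipy solver, and convert the result back into the caller's namespace. A minimal sketch (function and variable names here are illustrative, not the actual ezmsg-learn code):

```python
import numpy as np
import scipy.linalg


def compute_gain(xp, A, C, Q, R):
    """Steady-state Kalman gain via a DARE solve.

    scipy.linalg.solve_discrete_are has no Array API equivalent, so
    inputs are converted to numpy at the boundary and the result is
    converted back into the caller's namespace `xp`.
    """
    A_np = np.asarray(A)
    C_np = np.asarray(C)
    Q_np = np.asarray(Q)
    R_np = np.asarray(R)
    # scipy's solve_discrete_are(a, b, q, r) solves the control-form
    # DARE; for the filtering problem we pass the dual system (A and C
    # transposed) to get the steady-state predicted covariance P.
    P = scipy.linalg.solve_discrete_are(A_np.T, C_np.T, Q_np, R_np)
    # Innovation covariance and steady-state gain K = P C' S^-1.
    S = C_np @ P @ C_np.T + R_np
    K = P @ C_np.T @ np.linalg.inv(S)
    return xp.asarray(K)


# Usage with numpy as the "source namespace":
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2) * 0.01
R = np.eye(1) * 0.1
K = compute_gain(np, A, C, Q, R)
```

Because `solve_discrete_are` only accepts numpy arrays, the `np.asarray(...)` / `xp.asarray(...)` round trip is unavoidable until a pure Array API DARE solver exists.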
### process/slda.py

| Boundary | Reason | Removable? |
|---|---|---|
| `_process()` — converts to numpy before `sklearn.LDA.predict_proba` | sklearn `LinearDiscriminantAnalysis` requires numpy | Yes — sklearn 1.8 supports `LinearDiscriminantAnalysis` with `solver="svd"` under `array_api_dispatch=True` (see below) |
| `_reset_state()` — model init and `.mat` loading use numpy throughout | Template creation and weight loading are one-time setup | Low priority |
### process/adaptive_linear_regressor.py

| Boundary | Reason | Removable? |
|---|---|---|
| `partial_fit()` / `_process()` — sklearn path converts to numpy before `model.partial_fit` / `model.predict` | sklearn `SGDRegressor` / `PassiveAggressiveRegressor` have no Array API support | No |
| `partial_fit()` / `_process()` — river path converts to pandas before `learn_many` / `predict_many` | river has no Array API support | No |
### dim_reduce/adaptive_decomp.py

| Boundary | Reason | Removable? |
|---|---|---|
| `partial_fit()` — converts to numpy before `sklearn.IncrementalPCA.partial_fit` | `IncrementalPCA` has no Array API support; sklearn supports batch `PCA` with `array_api_dispatch=True` but not `IncrementalPCA` | No (not yet in sklearn) |
| `_process()` — converts to numpy before `estimator.transform`, then converts back | Same as above | No |
| `MiniBatchNMF` — same pattern | `MiniBatchNMF` has no Array API support in sklearn | No |
## Fully unconverted modules
These modules were not touched by the Array API conversion.
### process/linear_regressor.py

- Uses an `np.any(np.isnan(...))` guard plus `sklearn.LinearModel.predict`
- `sklearn.linear_model.Ridge` does support the Array API with `solver="svd"` and `array_api_dispatch=True`
- The `predict` call would also return arrays in the source namespace under dispatch mode
- This module could be made fully Array API compliant if:
  - the NaN check is converted to `xp.any(xp.isnan(...))`
  - `sklearn.set_config(array_api_dispatch=True)` is enabled
  - Ridge is configured with `solver="svd"`
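The NaN-guard conversion is mechanical. A sketch, assuming the module already resolves the source array namespace as `xp` (the helper name here is illustrative):

```python
import numpy as np


def has_nan(xp, data):
    """Array-API version of the np.any(np.isnan(...)) guard.

    Works unchanged for numpy, torch, cupy, etc., as long as `xp`
    is the array namespace that produced `data`.
    """
    return bool(xp.any(xp.isnan(data)))


# With numpy as the namespace:
clean = np.array([[0.0, 1.0], [2.0, 3.0]])
dirty = np.array([[0.0, np.nan]])
```

The `bool(...)` cast keeps the guard usable in an `if` statement regardless of whether the backend returns a 0-d array or a scalar.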
### process/sgd.py

- Uses an `np.any(np.isnan(...))` guard, `.reshape`, and `sklearn.SGDClassifier._predict_proba_lr`
- `SGDClassifier` has no Array API support in sklearn
- Could trivially convert the NaN check and reshape to `xp`, but the sklearn boundary would remain
- Low value — the estimator is the bottleneck
### process/sklearn.py

- Generic wrapper — accepts arbitrary sklearn/river/hmmlearn models by class-name string
- Uses numpy for reshaping, prediction-output conversion, and axis construction
- Cannot be generically converted because the wrapped model is unknown until runtime
- Some specific model classes used through this wrapper (e.g. `Ridge`, `LinearDiscriminantAnalysis`) do support Array API dispatch, but the wrapper cannot assume this
### dim_reduce/incremental_decomp.py

- Uses `np.prod(train_msg.data.shape)` and `np.asarray(range(...))` in `_partial_fit_windowed`
- These are trivial to convert (the shape is a plain Python tuple and `range` is a Python object), but the module delegates to `adaptive_decomp.py`, which has the sklearn boundary anyway
- Low value — the underlying estimator is the bottleneck
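A sketch of those two trivial conversions, which operate on plain Python objects and need no numpy-specific calls at all (variable names are illustrative, not the actual module code):

```python
import math

import numpy as np

shape = (4, 8, 2)  # train_msg.data.shape is a plain Python tuple

# np.prod(train_msg.data.shape) -> math.prod(shape): no array needed
n_elements = math.prod(shape)

# np.asarray(range(n)) -> xp.arange(n): stays in the source namespace
xp = np  # whatever namespace the incoming message's data uses
idx = xp.arange(n_elements)
```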
### process/rnn.py

- PyTorch-native — uses `torch.Tensor` throughout, not the Array API
- No conversion needed or appropriate; PyTorch is the target backend
## Opportunities with sklearn `array_api_dispatch=True`
sklearn 1.8.0 (currently installed) supports Array API dispatch for a specific set of estimators. The relevant ones for ezmsg-learn are:
| sklearn estimator | Used in | Array API support | Constraint |
|---|---|---|---|
| `LinearDiscriminantAnalysis` | `process/slda.py` | Yes | Requires `solver="svd"` |
| `Ridge` | `process/linear_regressor.py` (via `StaticLinearRegressor`) | Yes | Requires `solver="svd"` |
| `PCA` | Not directly used (uses `IncrementalPCA` instead) | Yes | `IncrementalPCA` is not supported |
| `SGDClassifier` | `process/sgd.py` | No | — |
| `SGDRegressor` | `process/adaptive_linear_regressor.py` | No | — |
| `IncrementalPCA` | `dim_reduce/adaptive_decomp.py` | No | — |
| `MiniBatchNMF` | `dim_reduce/adaptive_decomp.py` | No | — |
## What enabling dispatch would require

- Set `sklearn.set_config(array_api_dispatch=True)` at import time or in a config context
- For `slda.py`: change the LDA solver to `"svd"` (currently `"lsqr"` for the `.mat` path; the pickle path is user-controlled)
- For `linear_regressor.py`: ensure Ridge uses `solver="svd"`
- Remove the `np.asarray` numpy boundary in those modules — sklearn would accept and return arrays in the source namespace directly
## Risks

- `array_api_dispatch` is marked experimental in sklearn 1.8
- Solver constraints (`solver="svd"`) may change numerical results slightly
- The `.mat`-loading path in `slda.py` manually constructs LDA weights and uses `solver="lsqr"` with `shrinkage="auto"` — the `"svd"` solver does not support shrinkage, so this path cannot be converted
- Enabling dispatch globally could have unintended effects on other sklearn usage in the same process