Parallel Matrix Operations

Currently, only the sparse-by-sparse products in the `smmp` module are parallel. Converting the existing sparse-by-dense products to use `ndarray::parallel` should be straightforward. Here is an implementation of `par_csr_mulacc_dense_colmaj` that gives a significant speedup on my machine:

```rust
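// Import paths are assumed (the function is meant to live inside the crate);
// adjust them to match the target module.
use crate::{CsMatViewI, SpIndex};
use ndarray::{ArrayView, ArrayViewMut, Axis, Ix2};

/// Parallel CSR * dense (column-major) multiply-accumulate: `out += lhs * rhs`.
/// Requires the `rayon` feature of `ndarray` for `par_for_each`.
pub fn par_csr_mulacc_dense_colmaj<'a, N, A, B, I, Iptr>(
    lhs: CsMatViewI<A, I, Iptr>,
    rhs: ArrayView<B, Ix2>,
    mut out: ArrayViewMut<'a, N, Ix2>,
) where
    A: Send + Sync,
    B: Send + Sync,
    N: 'a + crate::MulAcc<A, B> + Send + Sync,
    I: 'a + SpIndex,
    Iptr: 'a + SpIndex,
{
    assert_eq!(lhs.cols(), rhs.shape()[0], "Dimension mismatch");
    assert_eq!(lhs.rows(), out.shape()[0], "Dimension mismatch");
    assert_eq!(rhs.shape()[1], out.shape()[1], "Dimension mismatch");
    assert!(lhs.is_csr(), "Storage mismatch");

    // Each output column depends only on the matching rhs column,
    // so the columns can be processed independently in parallel.
    let axis1 = Axis(1);
    ndarray::Zip::from(out.axis_iter_mut(axis1))
        .and(rhs.axis_iter(axis1))
        .par_for_each(|mut ocol, rcol| {
            // Row `orow` of lhs contributes to entry `orow` of the output column.
            for (orow, lrow) in lhs.outer_iterator().enumerate() {
                let oval = &mut ocol[[orow]];
                for (rrow, lval) in lrow.iter() {
                    let rval = &rcol[[rrow]];
                    oval.mul_acc(lval, rval);
                }
            }
        });
}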
```

The only changes here are the parallel iterator, enabling the `rayon` feature of `ndarray`, and adding the `Send` and `Sync` trait bounds to the data types stored in the matrices. My concern is that adding `Send + Sync` will force these bounds onto many places that do not otherwise need them.

Looking at the `impl Mul` for `CsMatBase` and `CsMatBase`, I see that `Sync + Send` is required regardless of whether `multi_thread` is enabled. Is it okay to propagate these bounds all the way up to the many trait `impl`s for `CsMatBase`, and to use conditional compilation only in the lowest-level functions found in the `prod` module? Conditionally compiling every higher-level implementation sounds like it would get nasty very quickly, especially as more parallel features are added.
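
To make the question concrete, here is a minimal standalone sketch of that layout. It is only an illustration, not code from the crate: `Csr`, `mulacc_col`, and `for_each_col` are hypothetical names, plain slices stand in for the `ndarray` views, and `std::thread::scope` stands in for the `rayon`-backed iterator so the example has no dependencies. The point is that the public function always carries `Send + Sync`, while the `multi_thread` gate appears only on the lowest-level helper.

```rust
use std::ops::{AddAssign, Mul};

/// Hypothetical, minimal CSR view standing in for `CsMatViewI`.
pub struct Csr<'a, N> {
    pub indptr: &'a [usize],
    pub indices: &'a [usize],
    pub data: &'a [N],
}

/// Sequential kernel for a single column: `ocol += lhs * rcol`.
fn mulacc_col<N>(lhs: &Csr<'_, N>, rcol: &[N], ocol: &mut [N])
where
    N: Copy + Mul<Output = N> + AddAssign,
{
    for (row, oval) in ocol.iter_mut().enumerate() {
        for k in lhs.indptr[row]..lhs.indptr[row + 1] {
            *oval += lhs.data[k] * rcol[lhs.indices[k]];
        }
    }
}

/// Lowest-level driver, parallel variant. One thread per column is crude;
/// a real implementation would use a thread pool such as rayon.
#[cfg(feature = "multi_thread")]
fn for_each_col<N>(lhs: &Csr<'_, N>, inner: usize, rhs: &[N], out: &mut [N])
where
    N: Copy + Mul<Output = N> + AddAssign + Send + Sync,
{
    let rows = lhs.indptr.len() - 1;
    std::thread::scope(|s| {
        for (ocol, rcol) in out.chunks_mut(rows).zip(rhs.chunks(inner)) {
            s.spawn(move || mulacc_col(lhs, rcol, ocol));
        }
    });
}

/// Lowest-level driver, sequential variant with the same signature and bounds.
#[cfg(not(feature = "multi_thread"))]
fn for_each_col<N>(lhs: &Csr<'_, N>, inner: usize, rhs: &[N], out: &mut [N])
where
    N: Copy + Mul<Output = N> + AddAssign + Send + Sync,
{
    let rows = lhs.indptr.len() - 1;
    for (ocol, rcol) in out.chunks_mut(rows).zip(rhs.chunks(inner)) {
        mulacc_col(lhs, rcol, ocol);
    }
}

/// Public entry point: no `cfg` here, and the bounds never change.
/// `rhs` is `inner x ncols` and `out` is `rows x ncols`, both column-major.
pub fn csr_mulacc_dense_colmaj<N>(lhs: &Csr<'_, N>, inner: usize, rhs: &[N], out: &mut [N])
where
    N: Copy + Mul<Output = N> + AddAssign + Send + Sync,
{
    assert!(lhs.indptr.len() > 1 && inner > 0, "Empty dimension");
    let rows = lhs.indptr.len() - 1;
    assert_eq!(rhs.len() % inner, 0, "Dimension mismatch");
    assert_eq!(out.len(), rows * (rhs.len() / inner), "Dimension mismatch");
    for_each_col(lhs, inner, rhs, out);
}
```

With this shape, callers and higher-level `impl`s see a single signature whether or not the feature is on. The cost is the one raised above: element types that are not `Send + Sync` are rejected even in single-threaded builds.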

Parallel Matrix Operations #298

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Parallel Matrix Operations #298

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions