Skip to content

Add distributed Ginkgo solver#522

Merged
greole merged 15 commits into
developfrom
enh/distributedGko
May 30, 2026
Merged

Add distributed Ginkgo solver#522
greole merged 15 commits into
developfrom
enh/distributedGko

Conversation

@greole
Copy link
Copy Markdown
Contributor

@greole greole commented May 23, 2026

Adds distributed (MPI) Ginkgo solver support to NeoN. The LinearSystem template is extended so the RHS value type can differ from the matrix value type, and it now carries a CommunicationPattern under MPI; Solver::solve dispatches to new solveDist overloads when a non-empty pattern is present. A new ginkgoDistributed.cpp builds a Ginkgo distributed matrix/vector from the local CSR plus the off-diagonal COO and runs scalar (and per-component Vec3) distributed solves. A SetReference post-assembly DSL functor is added to pin a reference cell so pure-Neumann systems are nonsingular, and the distributed operator test now validates the distributed solve against a global one.

Changes:

  • Add distributed Ginkgo matrix assembly + scalar/Vec3 solveDist paths, gated on NF_WITH_GINKGO and NF_WITH_MPI_SUPPORT.
  • Generalize LinearSystem with a separate RHSValueType and store/copy a CommunicationPattern; rebuild off-diagonal sparsity in createEmptyLinearSystem from the communication pattern.
  • Add SetReference post-assembly hook (pointer-based ps API across Expression/solver) and rework Ginkgo helpers (gkoVecView, retrieve) into the header; also add a (currently unused) StoppingCriterion/DistStoppingCriterion source file.

@greole greole changed the base branch from develop to enh/procOperatorImpl May 23, 2026 19:41
@github-actions
Copy link
Copy Markdown

Thank you for your PR, here are some useful tips:

@greole greole force-pushed the enh/procOperatorImpl branch from 0d5dd9e to a3a0153 Compare May 26, 2026 17:09
@greole greole changed the title Enh/distributed gko Add distributed Ginkgo solvers May 26, 2026
@greole greole changed the base branch from enh/procOperatorImpl to develop May 26, 2026 20:50
@greole greole force-pushed the enh/distributedGko branch from 9555607 to 59caf1b Compare May 26, 2026 20:51
@greole greole linked an issue May 27, 2026 that may be closed by this pull request
@greole greole requested a review from Copilot May 27, 2026 08:37
@greole greole marked this pull request as ready for review May 27, 2026 08:37
@greole greole force-pushed the enh/distributedGko branch from a072087 to 84d57f7 Compare May 27, 2026 08:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds distributed (MPI) Ginkgo solver support to NeoN. The LinearSystem template is extended so the RHS value type can differ from the matrix value type, and it now carries a CommunicationPattern under MPI; Solver::solve dispatches to new solveDist overloads when a non-empty pattern is present. A new ginkgoDistributed.cpp builds a Ginkgo distributed matrix/vector from the local CSR plus the off-diagonal COO and runs scalar (and per-component Vec3) distributed solves. A SetReference post-assembly DSL functor is added to pin a reference cell so pure-Neumann systems are nonsingular, and the distributed operator test now validates the distributed solve against a global one.

Changes:

  • Add distributed Ginkgo matrix assembly + scalar/Vec3 solveDist paths, gated on NF_WITH_GINKGO and NF_WITH_MPI_SUPPORT.
  • Generalize LinearSystem with a separate RHSValueType and store/copy a CommunicationPattern; rebuild off-diagonal sparsity in createEmptyLinearSystem from the communication pattern.
  • Add SetReference post-assembly hook (pointer-based ps API across Expression/solver) and rework Ginkgo helpers (gkoVecView, retrieve) into the header; also add a (currently unused) StoppingCriterion/DistStoppingCriterion source file.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
include/NeoN/linearAlgebra/ginkgo.hpp Moves gkoArrayView/gkoVecView/retrieve into the header and adds solveDist declarations plus the scalar-matrix/Vec3-rhs overload on GinkgoSolver.
include/NeoN/linearAlgebra/solver.hpp Adds default-throwing SolverFactory virtuals for the new overloads and dispatches scalar/Vec3 Solver::solve to solveDist when commPattern().sendCounts is non-empty.
include/NeoN/linearAlgebra/linearSystem.hpp Adds RHSValueType template parameter, optional CommunicationPattern member, new convenience ctor, and rebuilds off-diagonal sparsity in createEmptyLinearSystem from recvIdx.
include/NeoN/dsl/expression.hpp Makes PostAssemblyBase::operator() const, switches post-assembly list to pointer-vector, and adds the new SetReference functor.
include/NeoN/dsl/solver.hpp Updates iterativeSolveImpl/solve signatures to take a pointer-vector of post-assembly hooks.
src/linearAlgebra/ginkgo/ginkgo.cpp Generalizes solve_impl to be templated on vector type, drops the now-header helpers, and adds the scalar-matrix Vec3-rhs solve via segregated component solves.
src/linearAlgebra/ginkgo/ginkgoDistributed.cpp New: distributed Ginkgo vector/matrix view builders, distributed solve_impl_dist, and GinkgoSolver::solveDist for scalar and Vec3 systems.
src/linearAlgebra/ginkgo/ginkgoL1Stop.cpp New: defines a distributed L1-style stopping criterion class that is not wired into any solver yet.
src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp Reorders boundary-data fetches and uses views(...) (with a typo-fix from bFaceAreas to bFaceArea at the use site).
src/CMakeLists.txt Repaths ginkgo.cpp under the new ginkgo/ subdir and adds the two new distributed Ginkgo sources under NeoN_WITH_MPI.
test/linearAlgebra/ginkgo.cpp Adds tests for gkoVecView scalar/Vec3, mutable/const overloads and a scalar-matrix multi-Vec3-rhs solve.
test/distributed/operator.cpp Renames helpers, uses SetReference, switches to GMRES with 200 iters, and compares distributed final residual + per-rank solution slice against a global solve.
CHANGELOG.md Adds PR #522 to the experimental MPI support entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +12 to +246
class StoppingCriterion
{
using vec = gko::matrix::Dense<scalar>;
using mtx = gko::matrix::Csr<scalar>;
using val_array = gko::array<scalar>;
using idx_array = gko::array<localIdx>;

using dist_vec = gko::experimental::distributed::Vector<scalar>;

class DistStoppingCriterion :
public gko::EnablePolymorphicObject<DistStoppingCriterion, gko::stop::Criterion>
{
friend class gko::EnablePolymorphicObject<DistStoppingCriterion, gko::stop::Criterion>;
using Criterion = gko::stop::Criterion;

public:

GKO_CREATE_FACTORY_PARAMETERS(parameters, Factory)
{
/**
* Boolean set by the user to stop the iteration process
*/
// TODO check why GKO_FACTORY_PARAMETER_SCALAR does not work
scalar GKO_FACTORY_PARAMETER(absolute_tolerance, 1.0e-6);

scalar GKO_FACTORY_PARAMETER(relative_tolerance, 0.0);

localIdx GKO_FACTORY_PARAMETER(minIter, 0);

localIdx GKO_FACTORY_PARAMETER(maxIter, 0);

localIdx GKO_FACTORY_PARAMETER(frequency, 1);

std::add_pointer<localIdx>::type GKO_FACTORY_PARAMETER_SCALAR(iter, NULL);

std::add_pointer<scalar>::type GKO_FACTORY_PARAMETER_SCALAR(time, NULL);

std::add_pointer<scalar>::type GKO_FACTORY_PARAMETER_SCALAR(residual_norm, NULL);

std::shared_ptr<vec> GKO_FACTORY_PARAMETER_SCALAR(residual_norms, {});

std::add_pointer<scalar>::type GKO_FACTORY_PARAMETER_SCALAR(init_residual_norm, NULL);

localIdx GKO_FACTORY_PARAMETER(verbose, 0);

bool GKO_FACTORY_PARAMETER(export_res, false);

std::shared_ptr<const gko::LinOp> GKO_FACTORY_PARAMETER(gkomatrix, {});

std::shared_ptr<dist_vec> GKO_FACTORY_PARAMETER(x, {});

std::shared_ptr<dist_vec> GKO_FACTORY_PARAMETER(b, {});
};

GKO_ENABLE_CRITERION_FACTORY(DistStoppingCriterion, parameters, Factory);

GKO_ENABLE_BUILD_METHOD(Factory);

/* Compute the SpMV of A with x_ref, where x_ref is a vector containing
* the average of x in every row. This is needed to initialise the
* normfactor in the first iteration.
* */
void compute_Axref_dist(
size_t global_size,
size_t local_size,
std::shared_ptr<const gko::Executor> device_exec,
std::shared_ptr<const gko::LinOp> gkomatrix,
std::shared_ptr<const dist_vec> x,
std::shared_ptr<dist_vec> res
) const;

/* Compute the normfactor ie || Ax - x* || + || b - x* ||
* or rewritten as || r - ( b - x* ) || + || (b - x*) ||
* */
scalar compute_normfactor_dist(
std::shared_ptr<const gko::Executor> device_exec,
const dist_vec* r,
std::shared_ptr<const gko::LinOp> gkomatrix,
std::shared_ptr<const dist_vec> x,
std::shared_ptr<const dist_vec> b
) const;

/* Implementation of the residual norm check
* */
bool check_impl(
gko::uint8 stoppingId,
bool setFinalized,
gko::array<gko::stopping_status>* stop_status,
bool* one_changed,
const Criterion::Updater& updater
) override;


explicit DistStoppingCriterion(std::shared_ptr<const gko::Executor> exec)
: EnablePolymorphicObject<DistStoppingCriterion, Criterion>(std::move(exec))
{}

explicit DistStoppingCriterion(const Factory* factory, const gko::stop::CriterionArgs&)

: EnablePolymorphicObject<DistStoppingCriterion, Criterion>(factory->get_executor()),
parameters_ {factory->get_parameters()}
{}

void set_eval_norm_factor(bool eval_norm_factor) { eval_norm_factor_ = eval_norm_factor; }

mutable bool first_iter_ = true;

mutable scalar norm_factor_ = 1;

mutable bool eval_norm_factor_ = true;

mutable std::vector<scalar> res_norms_ {};
};

mutable localIdx maxIter_;

const localIdx minIter_;

const scalar tolerance_;

const scalar relTol_;

const scalar res_norm_eval_;

const localIdx norm_eval_limit_;

const localIdx frequency_;

const scalar relaxationFactor_;

const bool adapt_minIter_;

const std::shared_ptr<vec> normalised_res_norms_;

mutable scalar init_normalised_res_norm_;

mutable scalar normalised_res_norm_;

mutable localIdx iter_;

mutable scalar time_;

public:

StoppingCriterion(const Dictionary& controlDict)
: maxIter_(controlDict.get<localIdx>("maxIter", 1000)),
minIter_(controlDict.get<localIdx>("minIter", 0)),
tolerance_(controlDict.get<scalar>("tolerance", 1e-6)),
relTol_(controlDict.get<scalar>("relTol", 1e-6)),
res_norm_eval_(controlDict.get<scalar>("resNormEval", 0.1)),
norm_eval_limit_(controlDict.get<localIdx>("normEvalLimit", 100)),
frequency_(controlDict.get<localIdx>("evalFrequency", 1)),
relaxationFactor_(controlDict.get<scalar>("relaxationFactor", 0.6)),
adapt_minIter_(controlDict.get<bool>("adaptMinIter", true)),
normalised_res_norms_(gko::share(vec::create(
gko::ReferenceExecutor::create(),
gko::dim<2> {static_cast<gko::dim<2>::dimension_type>(maxIter_), 1}
))),
init_normalised_res_norm_(0), normalised_res_norm_(0), iter_(0), time_(0)
{
normalised_res_norms_->fill(0.0);
if (controlDict.get<std::string>("solver") == "GKOBiCGStab") maxIter_ *= 2;
}

std::shared_ptr<const gko::stop::CriterionFactory> build_dist_stopping_criterion(
std::shared_ptr<gko::Executor> device_exec,
std::shared_ptr<const gko::LinOp> gkomatrix,
std::shared_ptr<dist_vec> x,
std::shared_ptr<dist_vec> b,
label verbose,
bool export_res,
label prev_solve_iters,
scalar prev_rel_cost
) const
{
std::string frequencyMode = "optimizer";
label minIter = minIter_;
label frequency = frequency_;
// in case of export_res all residuals need to be computed
if (!export_res)
{
if (prev_solve_iters > 0 && adapt_minIter_ && prev_rel_cost > 0)
{
minIter = prev_solve_iters * relaxationFactor_;
if (frequencyMode == "optimizer")
{
auto alpha =
sqrt(1.0 / (prev_solve_iters * (1.0 - relaxationFactor_)) * prev_rel_cost);
frequency = std::min(norm_eval_limit_, std::max(1, localIdx(1 / alpha)));
}
if (frequencyMode == "relative")
{
frequency = localIdx(prev_solve_iters * 0.075) + 1;
}
}
}

std::string msg = "\nCreating stopping criterion\n\tminIter: " + std::to_string(minIter)
+ "\n\tfrequency: " + std::to_string(frequency) + "\n\tprev_solve_iters: "
+ std::to_string(prev_solve_iters) + "\n\tadapt_minIter: "
+ std::to_string(adapt_minIter_) + "\n\tprev_rel_cost: ";

NeoN::Logging::info(msg);

return DistStoppingCriterion::build()
.with_absolute_tolerance(tolerance_)
.with_relative_tolerance(relTol_)
.with_minIter(minIter)
.with_maxIter(maxIter_)
.with_frequency(frequency)
.with_verbose(verbose)
.with_export_res(export_res)
.with_init_residual_norm(&init_normalised_res_norm_)
.with_residual_norm(&normalised_res_norm_)
.with_residual_norms(normalised_res_norms_)
.with_iter(&iter_)
.with_time(&time_)
.with_gkomatrix(gkomatrix)
.with_x(x)
.with_b(b)
.on(device_exec);
}

scalar get_init_res_norm() const { return init_normalised_res_norm_; }

scalar get_res_norm() const { return normalised_res_norm_; }

std::shared_ptr<vec> get_res_norms() const { return normalised_res_norms_; }

label get_is_final() const { return relTol_ == 0.0; }

label get_num_iters() const { return iter_; }

scalar get_res_norm_time() const { return time_; }
};

#endif

} // namespace Foam
Comment thread include/NeoN/dsl/expression.hpp
Comment on lines +125 to 132
SolverStats solve(
const LinearSystem<scalar, CSRMatrix<scalar, localIdx>, COOMatrix<scalar, localIdx>, Vec3>&
ls,
Vector<Vec3>& field
) const
{
return solverInstance_->solve(ls, field);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also flagged by my session.

Comment on lines +167 to +177
LinearSystem<MatrixValueType, SystemMatrixType, BoundaryMatrixType, RHSValueType> ls {
matrix_.copyToExecutor(exec),
rhs_.copyToExecutor(exec),
offDiagonalMatrix_.copyToExecutor(exec),
boundaryMatrix_.copyToExecutor(exec),
boundaryRhs_.copyToExecutor(exec)
};
#ifdef NF_WITH_MPI_SUPPORT
ls.commPattern_ = commPattern_;
#endif
return ls;
Comment on lines +228 to +248
template<unsigned int I>
void solveComponentDist(auto& sys, auto& x, auto& exec, auto& factory, auto& stats)
{
auto rhs = getComponent<I>(sys.rhs());
auto xcopy = getComponent<I>(x);
auto values = getComponent<I>(sys.matrix().values());
auto sparsity = sys.matrix().sparsity();
auto mtx = CSRMatrix<scalar, localIdx> {values, sparsity};

auto nonLocalValues = getComponent<I>(sys.offDiagonalMatrix().values());
auto nonLocalSparsity = sys.offDiagonalMatrix().sparsity();
auto nonLocalMtx = COOMatrix<scalar, localIdx> {nonLocalValues, nonLocalSparsity};

const CommunicationPattern& commPattern = sys.commPattern();
bool forceHostBuffer = false;
auto comm = gko::experimental::mpi::communicator(commPattern.env.comm(), forceHostBuffer);
auto gkoMtx = createGkoMtxDist(exec, comm, mtx, nonLocalMtx, commPattern);
auto solver = factory->generate(gkoMtx);
stats.entries.push_back(solve_impl_dist(exec, comm, rhs, xcopy, gkoMtx, std::move(solver)));
setComponent<I>(xcopy, x);
}
@greole greole requested review from HendriceH and lupeterm May 27, 2026 08:54
@greole greole added the ready-for-review Set this label to indicate that the PR is ready for review label May 27, 2026
Copy link
Copy Markdown
Collaborator

@lupeterm lupeterm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Collaborator

@HendriceH HendriceH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the processorBC (not in this PR) we also might need a fence? The proc boundary data is filled and directly communicated afterwards, in processor.hpp:

virtual void correctBoundaryCondition([[maybe_unused]] Field<ValueType>& domainVector) final
    {
        detail::setProcBoundaryValue(domainVector, mesh_, this->range());
#ifdef NF_WITH_MPI_SUPPORT
        const int neighborRank =
            static_cast<int>(mesh_.boundaryMesh().neighbourRankForRange(this->range()));
        domainVector.boundaryData().communicate(this->range(), neighborRank);
#endif
    }

Comment on lines +114 to +118
const auto numNonLocalElements = imap.get_non_local_size();

auto non_loc_vals = gko::array<scalar>::const_view(
exec, static_cast<gko::size_type>(numNonLocalElements), bmtx.values().data()
);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review from my Claude:
The distributed matrix silently drops off-diagonal entries. The non-local COO block is sized from imap.get_non_local_size() (count of unique remote columns), but the values/rows/cols are taken as the first numNonLocalElements entries of bmtx (which has nProcFaces entries). This is only correct when #unique remote neighbour cells == #processor faces. The moment a local cell couples to the same remote cell through more than one face (non-convex partition boundaries, baffles, certain scotch/metis cuts), get_non_local_size() < nProcFaces, and the trailing couplings are dropped and the value slice misaligns with the column map — silent wrong matrix, converging solver, wrong answer. Use bmtx.values().size() (= nProcFaces) as the COO nnz and let the index_map handle column deduplication.

And I think this could be true? During my tests, hierarchical behaved much better than scotch.

Comment on lines +125 to 132
SolverStats solve(
const LinearSystem<scalar, CSRMatrix<scalar, localIdx>, COOMatrix<scalar, localIdx>, Vec3>&
ls,
Vector<Vec3>& field
) const
{
return solverInstance_->solve(ls, field);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also flagged by my session.

@greole
Copy link
Copy Markdown
Contributor Author

greole commented May 27, 2026

In the processorBC (not in this PR) we also might need a fence? The proc boundary data is filled and directly communicated afterwards, in processor.hpp:

virtual void correctBoundaryCondition([[maybe_unused]] Field<ValueType>& domainVector) final
    {
        detail::setProcBoundaryValue(domainVector, mesh_, this->range());
#ifdef NF_WITH_MPI_SUPPORT
        const int neighborRank =
            static_cast<int>(mesh_.boundaryMesh().neighbourRankForRange(this->range()));
        domainVector.boundaryData().communicate(this->range(), neighborRank);
#endif
    }

this currently aims for non blocking communication. the communicate kicks of the exchange in a peer to peer fashion. however when you want to read a value from boundaryData it'll wait until all communication is ready.

Edit: @HendriceH ah i see your point the fence should be before communicate.

@greole greole added build Everything related to building NF and removed build Everything related to building NF labels May 28, 2026
@greole greole force-pushed the enh/distributedGko branch from ec4eb50 to c84ea7f Compare May 28, 2026 19:13
@greole greole added build Everything related to building NF and removed build Everything related to building NF labels May 28, 2026
@greole greole added build Everything related to building NF and removed build Everything related to building NF labels May 29, 2026
@greole greole changed the title Add distributed Ginkgo solvers Add distributed Ginkgo solver May 29, 2026
@greole greole added build Everything related to building NF and removed build Everything related to building NF labels May 29, 2026
@greole greole merged commit b42b40a into develop May 30, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-review Set this label to indicate that the PR is ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check postAssmebly functor implementation

4 participants