Add distributed Ginkgo solver#522
Conversation
|
Thank you for your PR, here are some useful tips:
|
0d5dd9e to
a3a0153
Compare
9555607 to
59caf1b
Compare
a072087 to
84d57f7
Compare
There was a problem hiding this comment.
Pull request overview
Adds distributed (MPI) Ginkgo solver support to NeoN. The LinearSystem template is extended so the RHS value type can differ from the matrix value type, and it now carries a CommunicationPattern under MPI; Solver::solve dispatches to new solveDist overloads when a non-empty pattern is present. A new ginkgoDistributed.cpp builds a Ginkgo distributed matrix/vector from the local CSR plus the off-diagonal COO and runs scalar (and per-component Vec3) distributed solves. A SetReference post-assembly DSL functor is added to pin a reference cell so pure-Neumann systems are nonsingular, and the distributed operator test now validates the distributed solve against a global one.
Changes:
- Add distributed Ginkgo matrix assembly + scalar/Vec3
solveDistpaths, gated onNF_WITH_GINKGOandNF_WITH_MPI_SUPPORT. - Generalize
LinearSystemwith a separateRHSValueTypeand store/copy aCommunicationPattern; rebuild off-diagonal sparsity increateEmptyLinearSystemfrom the communication pattern. - Add
SetReferencepost-assembly hook (pointer-basedpsAPI acrossExpression/solver) and rework Ginkgo helpers (gkoVecView,retrieve) into the header; also add a (currently unused)StoppingCriterion/DistStoppingCriterionsource file.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| include/NeoN/linearAlgebra/ginkgo.hpp | Moves gkoArrayView/gkoVecView/retrieve into the header and adds solveDist declarations plus the scalar-matrix/Vec3-rhs overload on GinkgoSolver. |
| include/NeoN/linearAlgebra/solver.hpp | Adds default-throwing SolverFactory virtuals for the new overloads and dispatches scalar/Vec3 Solver::solve to solveDist when commPattern().sendCounts is non-empty. |
| include/NeoN/linearAlgebra/linearSystem.hpp | Adds RHSValueType template parameter, optional CommunicationPattern member, new convenience ctor, and rebuilds off-diagonal sparsity in createEmptyLinearSystem from recvIdx. |
| include/NeoN/dsl/expression.hpp | Makes PostAssemblyBase::operator() const, switches post-assembly list to pointer-vector, and adds the new SetReference functor. |
| include/NeoN/dsl/solver.hpp | Updates iterativeSolveImpl/solve signatures to take a pointer-vector of post-assembly hooks. |
| src/linearAlgebra/ginkgo/ginkgo.cpp | Generalizes solve_impl to be templated on vector type, drops the now-header helpers, and adds the scalar-matrix Vec3-rhs solve via segregated component solves. |
| src/linearAlgebra/ginkgo/ginkgoDistributed.cpp | New: distributed Ginkgo vector/matrix view builders, distributed solve_impl_dist, and GinkgoSolver::solveDist for scalar and Vec3 systems. |
| src/linearAlgebra/ginkgo/ginkgoL1Stop.cpp | New: defines a distributed L1-style stopping criterion class that is not wired into any solver yet. |
| src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp | Reorders boundary-data fetches and uses views(...) (with a typo-fix from bFaceAreas to bFaceArea at the use site). |
| src/CMakeLists.txt | Repaths ginkgo.cpp under the new ginkgo/ subdir and adds the two new distributed Ginkgo sources under NeoN_WITH_MPI. |
| test/linearAlgebra/ginkgo.cpp | Adds tests for gkoVecView scalar/Vec3, mutable/const overloads and a scalar-matrix multi-Vec3-rhs solve. |
| test/distributed/operator.cpp | Renames helpers, uses SetReference, switches to GMRES with 200 iters, and compares distributed final residual + per-rank solution slice against a global solve. |
| CHANGELOG.md | Adds PR #522 to the experimental MPI support entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class StoppingCriterion | ||
| { | ||
| using vec = gko::matrix::Dense<scalar>; | ||
| using mtx = gko::matrix::Csr<scalar>; | ||
| using val_array = gko::array<scalar>; | ||
| using idx_array = gko::array<localIdx>; | ||
|
|
||
| using dist_vec = gko::experimental::distributed::Vector<scalar>; | ||
|
|
||
| class DistStoppingCriterion : | ||
| public gko::EnablePolymorphicObject<DistStoppingCriterion, gko::stop::Criterion> | ||
| { | ||
| friend class gko::EnablePolymorphicObject<DistStoppingCriterion, gko::stop::Criterion>; | ||
| using Criterion = gko::stop::Criterion; | ||
|
|
||
| public: | ||
|
|
||
| GKO_CREATE_FACTORY_PARAMETERS(parameters, Factory) | ||
| { | ||
| /** | ||
| * Boolean set by the user to stop the iteration process | ||
| */ | ||
| // TODO check why GKO_FACTORY_PARAMETER_SCALAR does not work | ||
| scalar GKO_FACTORY_PARAMETER(absolute_tolerance, 1.0e-6); | ||
|
|
||
| scalar GKO_FACTORY_PARAMETER(relative_tolerance, 0.0); | ||
|
|
||
| localIdx GKO_FACTORY_PARAMETER(minIter, 0); | ||
|
|
||
| localIdx GKO_FACTORY_PARAMETER(maxIter, 0); | ||
|
|
||
| localIdx GKO_FACTORY_PARAMETER(frequency, 1); | ||
|
|
||
| std::add_pointer<localIdx>::type GKO_FACTORY_PARAMETER_SCALAR(iter, NULL); | ||
|
|
||
| std::add_pointer<scalar>::type GKO_FACTORY_PARAMETER_SCALAR(time, NULL); | ||
|
|
||
| std::add_pointer<scalar>::type GKO_FACTORY_PARAMETER_SCALAR(residual_norm, NULL); | ||
|
|
||
| std::shared_ptr<vec> GKO_FACTORY_PARAMETER_SCALAR(residual_norms, {}); | ||
|
|
||
| std::add_pointer<scalar>::type GKO_FACTORY_PARAMETER_SCALAR(init_residual_norm, NULL); | ||
|
|
||
| localIdx GKO_FACTORY_PARAMETER(verbose, 0); | ||
|
|
||
| bool GKO_FACTORY_PARAMETER(export_res, false); | ||
|
|
||
| std::shared_ptr<const gko::LinOp> GKO_FACTORY_PARAMETER(gkomatrix, {}); | ||
|
|
||
| std::shared_ptr<dist_vec> GKO_FACTORY_PARAMETER(x, {}); | ||
|
|
||
| std::shared_ptr<dist_vec> GKO_FACTORY_PARAMETER(b, {}); | ||
| }; | ||
|
|
||
| GKO_ENABLE_CRITERION_FACTORY(DistStoppingCriterion, parameters, Factory); | ||
|
|
||
| GKO_ENABLE_BUILD_METHOD(Factory); | ||
|
|
||
| /* Compute the SpMV of A with x_ref, where x_ref is a vector containing | ||
| * the average of x in every row. This is needed to initialise the | ||
| * normfactor in the first iteration. | ||
| * */ | ||
| void compute_Axref_dist( | ||
| size_t global_size, | ||
| size_t local_size, | ||
| std::shared_ptr<const gko::Executor> device_exec, | ||
| std::shared_ptr<const gko::LinOp> gkomatrix, | ||
| std::shared_ptr<const dist_vec> x, | ||
| std::shared_ptr<dist_vec> res | ||
| ) const; | ||
|
|
||
| /* Compute the normfactor ie || Ax - x* || + || b - x* || | ||
| * or rewritten as || r - ( b - x* ) || + || (b - x*) || | ||
| * */ | ||
| scalar compute_normfactor_dist( | ||
| std::shared_ptr<const gko::Executor> device_exec, | ||
| const dist_vec* r, | ||
| std::shared_ptr<const gko::LinOp> gkomatrix, | ||
| std::shared_ptr<const dist_vec> x, | ||
| std::shared_ptr<const dist_vec> b | ||
| ) const; | ||
|
|
||
| /* Implementation of the residual norm check | ||
| * */ | ||
| bool check_impl( | ||
| gko::uint8 stoppingId, | ||
| bool setFinalized, | ||
| gko::array<gko::stopping_status>* stop_status, | ||
| bool* one_changed, | ||
| const Criterion::Updater& updater | ||
| ) override; | ||
|
|
||
|
|
||
| explicit DistStoppingCriterion(std::shared_ptr<const gko::Executor> exec) | ||
| : EnablePolymorphicObject<DistStoppingCriterion, Criterion>(std::move(exec)) | ||
| {} | ||
|
|
||
| explicit DistStoppingCriterion(const Factory* factory, const gko::stop::CriterionArgs&) | ||
|
|
||
| : EnablePolymorphicObject<DistStoppingCriterion, Criterion>(factory->get_executor()), | ||
| parameters_ {factory->get_parameters()} | ||
| {} | ||
|
|
||
| void set_eval_norm_factor(bool eval_norm_factor) { eval_norm_factor_ = eval_norm_factor; } | ||
|
|
||
| mutable bool first_iter_ = true; | ||
|
|
||
| mutable scalar norm_factor_ = 1; | ||
|
|
||
| mutable bool eval_norm_factor_ = true; | ||
|
|
||
| mutable std::vector<scalar> res_norms_ {}; | ||
| }; | ||
|
|
||
| mutable localIdx maxIter_; | ||
|
|
||
| const localIdx minIter_; | ||
|
|
||
| const scalar tolerance_; | ||
|
|
||
| const scalar relTol_; | ||
|
|
||
| const scalar res_norm_eval_; | ||
|
|
||
| const localIdx norm_eval_limit_; | ||
|
|
||
| const localIdx frequency_; | ||
|
|
||
| const scalar relaxationFactor_; | ||
|
|
||
| const bool adapt_minIter_; | ||
|
|
||
| const std::shared_ptr<vec> normalised_res_norms_; | ||
|
|
||
| mutable scalar init_normalised_res_norm_; | ||
|
|
||
| mutable scalar normalised_res_norm_; | ||
|
|
||
| mutable localIdx iter_; | ||
|
|
||
| mutable scalar time_; | ||
|
|
||
| public: | ||
|
|
||
| StoppingCriterion(const Dictionary& controlDict) | ||
| : maxIter_(controlDict.get<localIdx>("maxIter", 1000)), | ||
| minIter_(controlDict.get<localIdx>("minIter", 0)), | ||
| tolerance_(controlDict.get<scalar>("tolerance", 1e-6)), | ||
| relTol_(controlDict.get<scalar>("relTol", 1e-6)), | ||
| res_norm_eval_(controlDict.get<scalar>("resNormEval", 0.1)), | ||
| norm_eval_limit_(controlDict.get<localIdx>("normEvalLimit", 100)), | ||
| frequency_(controlDict.get<localIdx>("evalFrequency", 1)), | ||
| relaxationFactor_(controlDict.get<scalar>("relaxationFactor", 0.6)), | ||
| adapt_minIter_(controlDict.get<bool>("adaptMinIter", true)), | ||
| normalised_res_norms_(gko::share(vec::create( | ||
| gko::ReferenceExecutor::create(), | ||
| gko::dim<2> {static_cast<gko::dim<2>::dimension_type>(maxIter_), 1} | ||
| ))), | ||
| init_normalised_res_norm_(0), normalised_res_norm_(0), iter_(0), time_(0) | ||
| { | ||
| normalised_res_norms_->fill(0.0); | ||
| if (controlDict.get<std::string>("solver") == "GKOBiCGStab") maxIter_ *= 2; | ||
| } | ||
|
|
||
| std::shared_ptr<const gko::stop::CriterionFactory> build_dist_stopping_criterion( | ||
| std::shared_ptr<gko::Executor> device_exec, | ||
| std::shared_ptr<const gko::LinOp> gkomatrix, | ||
| std::shared_ptr<dist_vec> x, | ||
| std::shared_ptr<dist_vec> b, | ||
| label verbose, | ||
| bool export_res, | ||
| label prev_solve_iters, | ||
| scalar prev_rel_cost | ||
| ) const | ||
| { | ||
| std::string frequencyMode = "optimizer"; | ||
| label minIter = minIter_; | ||
| label frequency = frequency_; | ||
| // in case of export_res all residuals need to be computed | ||
| if (!export_res) | ||
| { | ||
| if (prev_solve_iters > 0 && adapt_minIter_ && prev_rel_cost > 0) | ||
| { | ||
| minIter = prev_solve_iters * relaxationFactor_; | ||
| if (frequencyMode == "optimizer") | ||
| { | ||
| auto alpha = | ||
| sqrt(1.0 / (prev_solve_iters * (1.0 - relaxationFactor_)) * prev_rel_cost); | ||
| frequency = std::min(norm_eval_limit_, std::max(1, localIdx(1 / alpha))); | ||
| } | ||
| if (frequencyMode == "relative") | ||
| { | ||
| frequency = localIdx(prev_solve_iters * 0.075) + 1; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| std::string msg = "\nCreating stopping criterion\n\tminIter: " + std::to_string(minIter) | ||
| + "\n\tfrequency: " + std::to_string(frequency) + "\n\tprev_solve_iters: " | ||
| + std::to_string(prev_solve_iters) + "\n\tadapt_minIter: " | ||
| + std::to_string(adapt_minIter_) + "\n\tprev_rel_cost: "; | ||
|
|
||
| NeoN::Logging::info(msg); | ||
|
|
||
| return DistStoppingCriterion::build() | ||
| .with_absolute_tolerance(tolerance_) | ||
| .with_relative_tolerance(relTol_) | ||
| .with_minIter(minIter) | ||
| .with_maxIter(maxIter_) | ||
| .with_frequency(frequency) | ||
| .with_verbose(verbose) | ||
| .with_export_res(export_res) | ||
| .with_init_residual_norm(&init_normalised_res_norm_) | ||
| .with_residual_norm(&normalised_res_norm_) | ||
| .with_residual_norms(normalised_res_norms_) | ||
| .with_iter(&iter_) | ||
| .with_time(&time_) | ||
| .with_gkomatrix(gkomatrix) | ||
| .with_x(x) | ||
| .with_b(b) | ||
| .on(device_exec); | ||
| } | ||
|
|
||
| scalar get_init_res_norm() const { return init_normalised_res_norm_; } | ||
|
|
||
| scalar get_res_norm() const { return normalised_res_norm_; } | ||
|
|
||
| std::shared_ptr<vec> get_res_norms() const { return normalised_res_norms_; } | ||
|
|
||
| label get_is_final() const { return relTol_ == 0.0; } | ||
|
|
||
| label get_num_iters() const { return iter_; } | ||
|
|
||
| scalar get_res_norm_time() const { return time_; } | ||
| }; |
|
|
||
| #endif | ||
|
|
||
| } // namespace Foam |
| SolverStats solve( | ||
| const LinearSystem<scalar, CSRMatrix<scalar, localIdx>, COOMatrix<scalar, localIdx>, Vec3>& | ||
| ls, | ||
| Vector<Vec3>& field | ||
| ) const | ||
| { | ||
| return solverInstance_->solve(ls, field); | ||
| } |
There was a problem hiding this comment.
Also flagged by my session.
| LinearSystem<MatrixValueType, SystemMatrixType, BoundaryMatrixType, RHSValueType> ls { | ||
| matrix_.copyToExecutor(exec), | ||
| rhs_.copyToExecutor(exec), | ||
| offDiagonalMatrix_.copyToExecutor(exec), | ||
| boundaryMatrix_.copyToExecutor(exec), | ||
| boundaryRhs_.copyToExecutor(exec) | ||
| }; | ||
| #ifdef NF_WITH_MPI_SUPPORT | ||
| ls.commPattern_ = commPattern_; | ||
| #endif | ||
| return ls; |
| template<unsigned int I> | ||
| void solveComponentDist(auto& sys, auto& x, auto& exec, auto& factory, auto& stats) | ||
| { | ||
| auto rhs = getComponent<I>(sys.rhs()); | ||
| auto xcopy = getComponent<I>(x); | ||
| auto values = getComponent<I>(sys.matrix().values()); | ||
| auto sparsity = sys.matrix().sparsity(); | ||
| auto mtx = CSRMatrix<scalar, localIdx> {values, sparsity}; | ||
|
|
||
| auto nonLocalValues = getComponent<I>(sys.offDiagonalMatrix().values()); | ||
| auto nonLocalSparsity = sys.offDiagonalMatrix().sparsity(); | ||
| auto nonLocalMtx = COOMatrix<scalar, localIdx> {nonLocalValues, nonLocalSparsity}; | ||
|
|
||
| const CommunicationPattern& commPattern = sys.commPattern(); | ||
| bool forceHostBuffer = false; | ||
| auto comm = gko::experimental::mpi::communicator(commPattern.env.comm(), forceHostBuffer); | ||
| auto gkoMtx = createGkoMtxDist(exec, comm, mtx, nonLocalMtx, commPattern); | ||
| auto solver = factory->generate(gkoMtx); | ||
| stats.entries.push_back(solve_impl_dist(exec, comm, rhs, xcopy, gkoMtx, std::move(solver))); | ||
| setComponent<I>(xcopy, x); | ||
| } |
HendriceH
left a comment
There was a problem hiding this comment.
In the processorBC (not in this PR) we also might need a fence? The proc boundary data is filled and directly communicated afterwards, in processor.hpp:
virtual void correctBoundaryCondition([[maybe_unused]] Field<ValueType>& domainVector) final
{
detail::setProcBoundaryValue(domainVector, mesh_, this->range());
#ifdef NF_WITH_MPI_SUPPORT
const int neighborRank =
static_cast<int>(mesh_.boundaryMesh().neighbourRankForRange(this->range()));
domainVector.boundaryData().communicate(this->range(), neighborRank);
#endif
}
| const auto numNonLocalElements = imap.get_non_local_size(); | ||
|
|
||
| auto non_loc_vals = gko::array<scalar>::const_view( | ||
| exec, static_cast<gko::size_type>(numNonLocalElements), bmtx.values().data() | ||
| ); |
There was a problem hiding this comment.
Review from my Claude:
The distributed matrix silently drops off-diagonal entries. The non-local COO block is sized from imap.get_non_local_size() (count of unique remote columns), but the values/rows/cols are taken as the first numNonLocalElements entries of bmtx (which has nProcFaces entries). This is only correct when #unique remote neighbour cells == #processor faces. The moment a local cell couples to the same remote cell through more than one face (non-convex partition boundaries, baffles, certain scotch/metis cuts), get_non_local_size() < nProcFaces, and the trailing couplings are dropped and the value slice misaligns with the column map — silent wrong matrix, converging solver, wrong answer. Use bmtx.values().size() (= nProcFaces) as the COO nnz and let the index_map handle column deduplication.
And I think this could be true? During my tests, hierarchical behaved much better than scotch.
| SolverStats solve( | ||
| const LinearSystem<scalar, CSRMatrix<scalar, localIdx>, COOMatrix<scalar, localIdx>, Vec3>& | ||
| ls, | ||
| Vector<Vec3>& field | ||
| ) const | ||
| { | ||
| return solverInstance_->solve(ls, field); | ||
| } |
There was a problem hiding this comment.
Also flagged by my session.
this currently aims for non blocking communication. the communicate kicks of the exchange in a peer to peer fashion. however when you want to read a value from boundaryData it'll wait until all communication is ready. Edit: @HendriceH ah i see your point the fence should be before communicate. |
ec4eb50 to
c84ea7f
Compare
Adds distributed (MPI) Ginkgo solver support to NeoN. The LinearSystem template is extended so the RHS value type can differ from the matrix value type, and it now carries a CommunicationPattern under MPI; Solver::solve dispatches to new solveDist overloads when a non-empty pattern is present. A new ginkgoDistributed.cpp builds a Ginkgo distributed matrix/vector from the local CSR plus the off-diagonal COO and runs scalar (and per-component Vec3) distributed solves. A SetReference post-assembly DSL functor is added to pin a reference cell so pure-Neumann systems are nonsingular, and the distributed operator test now validates the distributed solve against a global one.
Changes: