Conversation

Force-pushed from 73b715e to 32e7d13
MarcelKoch left a comment
The overall structure is good. I have only minor things to add. One thing I definitely would like to see is the removal of the weighting stuff.
 * This is the transpose of the RowGatherer operation:
 * - RowGatherer does: y_local = R * x_distributed (gather remote values)
 * - RowScatterer does: x_distributed += R^T * y_local (scatter and accumulate)
nit: It probably makes sense to have the reverse of this in the RowGatherer doc.
 *
 * @return a unique_ptr to the created distributed::RowScatterer
 */
static std::unique_ptr<RowScatterer> create_from_gatherer(
Probably also want to have the inverse of this in RowGatherer.
[[nodiscard]] mpi::request apply_async(
    ptr_param<const LinOp> weights,
    ptr_param<const LinOp> local_values) const;
Since the weighting is just an extra step before sending the data, I would prefer not to have this overload and instead just require on the user side that they scale their input.
core/distributed/row_scatterer.cpp (Outdated)
auto send_size_in_bytes =
    sizeof(ValueType) * send_size[0] * send_size[1];
if (!send_workspace_.get_executor() ||
    !mpi_exec->memory_accessible(
        send_workspace_.get_executor())) {
    send_workspace_.set_executor(mpi_exec);
}
if (send_size_in_bytes > send_workspace_.get_size()) {
    send_workspace_.resize_and_reset(send_size_in_bytes);
}
auto send_buffer = matrix::Dense<ValueType>::create(
    mpi_exec, send_size,
    make_array_view(mpi_exec, send_size[0] * send_size[1],
                    reinterpret_cast<ValueType*>(
                        send_workspace_.get_data())),
    send_size[1]);
I think this might be replaceable with using GenericDenseCache instead of the array as workspace. Same for RowGatherer.
Why is the send workspace necessary in the first place? Can't we send from local_values directly (assuming GPU aware mpi)?
core/distributed/row_scatterer.cpp (Outdated)
dim<2> recv_size(coll_comm_->get_recv_size(), ncols);
auto recv_size_in_bytes =
    sizeof(ValueType) * recv_size[0] * recv_size[1];
if (!recv_workspace_.get_executor() ||
    !mpi_exec->memory_accessible(
        recv_workspace_.get_executor())) {
    recv_workspace_.set_executor(mpi_exec);
}
if (recv_size_in_bytes > recv_workspace_.get_size()) {
    recv_workspace_.resize_and_reset(recv_size_in_bytes);
}
Same regarding the GenericDenseCache.
core/distributed/row_scatterer.cpp (Outdated)
// Synchronize before MPI (GPU stream safety)
std::shared_ptr<const gko::detail::Event> ev = nullptr;
lv_local->get_executor()->run(event::make_record_event(ev));
ev->synchronize();
This seems unnecessary. I would not add the event until we split this up as we do for the row gatherer (although I think splitting the row scatter up doesn't make sense).
Suggested change:
- // Synchronize before MPI (GPU stream safety)
- std::shared_ptr<const gko::detail::Event> ev = nullptr;
- lv_local->get_executor()->run(event::make_record_event(ev));
- ev->synchronize();
+ exec->synchronize();
 * Must have the same number of columns as this matrix
 * and `scatter_indices->get_size()` rows.
 */
void scatter_add(const array<int32>* scatter_indices, const Dense* source);
should this also be row_scatter_add?
Probably not. In the long term row_gather should probably be changed instead.
 */
static std::unique_ptr<RowScatterer> create_from_gatherer(
    std::shared_ptr<const Executor> exec,
    const RowGatherer<LocalIndexType>& gatherer);
Suggested change:
- const RowGatherer<LocalIndexType>& gatherer);
+ ptr_param<const RowGatherer> gatherer);
Or bare pointer, but it shouldn't be a reference.
}

TYPED_TEST(RowScatterer, CanOverlapWorkWithScatter)
This test and the ones below are also in test/.... I would suggest removing the ones here.
omp/matrix/dense_kernels.cpp (Outdated)
#pragma omp critical
tgt_vals[target_row * tgt_stride + j] += val;
Maybe use the atomic_add instead?
Force-pushed from 51d146e to 368863c
This PR adds a RowScatterer class, similar to the RowGatherer class. It can be created from the RowGatherer, as the communication is essentially the inverse.

It also adds a weighted scattering, which scatters diag(weights) * local_values. To overlap communication and computation, the operation is split into two parts, apply_async and wait_and_accumulate: computation can be done after apply_async returns an mpi::request, and wait_and_accumulate then does a req.wait() internally before doing the accumulation on the receiver side.

It also adds a scatter_add operation + kernel for Dense. I kept it in this PR, but can also extract that into a separate PR. This functionality will be useful when building DD preconditioners (overlapping Schwarz, BDDC, etc.).