Skip to content

Add RowScatterer class#1995

Open
pratikvn wants to merge 8 commits intorow-gatherer-with-cachefrom
feat/row-scatterer
Open

Add RowScatterer class#1995
pratikvn wants to merge 8 commits intorow-gatherer-with-cachefrom
feat/row-scatterer

Conversation

@pratikvn
Copy link
Copy Markdown
Member

This PR adds a RowScatterer class, similar to the RowGatherer class. It can be created from the RowGatherer, as the communication is essentially the inverse.

It also adds a weighted scattering, which scatters diag(weights)* local values. To overlap communication and computation, it is split into two parts, apply_async and a wait_and_accumulate, and computation can be done after apply_async returning a mpi::request. Then wait_and_accumulate does a req.wait() internally before doing the accumulation and the receivers.

It also adds a scatter_add operation + kernel for Dense. I kept it in this PR, but can also extract that into a separate PR.

This functionality will be useful when building DD preconditioners (overlapping Schwarz, BDDC etc)

@pratikvn pratikvn self-assigned this Mar 25, 2026
@pratikvn pratikvn added is:new-feature A request or implementation of a feature that does not exist yet. 1:ST:ready-for-review This PR is ready for review mod:all This touches all Ginkgo modules. type:distributed-functionality labels Mar 25, 2026
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:reference This is related to the reference module. type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. labels Mar 25, 2026
@pratikvn pratikvn force-pushed the feat/row-scatterer branch from 73b715e to 32e7d13 Compare March 25, 2026 02:13
@pratikvn pratikvn requested review from MarcelKoch and yhmtsai March 25, 2026 08:38
Copy link
Copy Markdown
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall structure is good. I have only minor things to add. One thing I definetly would like to see is the removal of the weighting stuff.

Comment on lines +34 to +36
* This is the transpose of the RowGatherer operation:
* - RowGatherer does: y_local = R * x_distributed (gather remote values)
* - RowScatterer does: x_distributed += R^T * y_local (scatter and accumulate)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It probably makes sense to have the reversed of this in the RowGatherer doc.

*
* @return a unique_ptr to the created distributed::RowScatterer
*/
static std::unique_ptr<RowScatterer> create_from_gatherer(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also want to have the inverse of this in RowGatherer.

Comment on lines +99 to +101
[[nodiscard]] mpi::request apply_async(
ptr_param<const LinOp> weights,
ptr_param<const LinOp> local_values) const;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the weighting is just an extra step before sending the data, I would prefer not to have this overload and instead just require on the user side that they scale their input.

Comment on lines +59 to +74
auto send_size_in_bytes =
sizeof(ValueType) * send_size[0] * send_size[1];
if (!send_workspace_.get_executor() ||
!mpi_exec->memory_accessible(
send_workspace_.get_executor())) {
send_workspace_.set_executor(mpi_exec);
}
if (send_size_in_bytes > send_workspace_.get_size()) {
send_workspace_.resize_and_reset(send_size_in_bytes);
}
auto send_buffer = matrix::Dense<ValueType>::create(
mpi_exec, send_size,
make_array_view(mpi_exec, send_size[0] * send_size[1],
reinterpret_cast<ValueType*>(
send_workspace_.get_data())),
send_size[1]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be replaceable with using GenericDenseCache instead of the array as workspace. Same for RowGatherer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the send workspace necessary in the first place? Can't we send from local_values directly (assuming GPU aware mpi)?

Comment on lines +78 to +88
dim<2> recv_size(coll_comm_->get_recv_size(), ncols);
auto recv_size_in_bytes =
sizeof(ValueType) * recv_size[0] * recv_size[1];
if (!recv_workspace_.get_executor() ||
!mpi_exec->memory_accessible(
recv_workspace_.get_executor())) {
recv_workspace_.set_executor(mpi_exec);
}
if (recv_size_in_bytes > recv_workspace_.get_size()) {
recv_workspace_.resize_and_reset(recv_size_in_bytes);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same regarding the GenericDenseCache

Comment on lines +90 to +93
// Synchronize before MPI (GPU stream safety)
std::shared_ptr<const gko::detail::Event> ev = nullptr;
lv_local->get_executor()->run(event::make_record_event(ev));
ev->synchronize();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary. I would not add the event until we split this up as we do for the row gatherer (although I think splitting the row scatter up doesn't make sense).

Suggested change
// Synchronize before MPI (GPU stream safety)
std::shared_ptr<const gko::detail::Event> ev = nullptr;
lv_local->get_executor()->run(event::make_record_event(ev));
ev->synchronize();
exec->synchronize();

* Must have the same number of columns as this matrix
* and `scatter_indices->get_size()` rows.
*/
void scatter_add(const array<int32>* scatter_indices, const Dense* source);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also be row_scatter_add?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. In the long term row_gather should probably be changed instead.

*/
static std::unique_ptr<RowScatterer> create_from_gatherer(
std::shared_ptr<const Executor> exec,
const RowGatherer<LocalIndexType>& gatherer);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const RowGatherer<LocalIndexType>& gatherer);
ptr_param<const RowGatherer> gatherer);

Or bare pointer, but it shouldn't be a reference.

}


TYPED_TEST(RowScatterer, CanOverlapWorkWithScatter)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and the ones below are also in test/.... I would suggest removing the ones here.

Comment on lines +487 to +488
#pragma omp critical
tgt_vals[target_row * tgt_stride + j] += val;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the atomic_add instead?

@pratikvn pratikvn force-pushed the feat/row-scatterer branch from 51d146e to 368863c Compare March 31, 2026 12:27
@pratikvn pratikvn changed the base branch from develop to row-gatherer-with-cache March 31, 2026 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1:ST:ready-for-review This PR is ready for review is:new-feature A request or implementation of a feature that does not exist yet. mod:all This touches all Ginkgo modules. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:reference This is related to the reference module. reg:build This is related to the build system. reg:testing This is related to testing. type:distributed-functionality type:matrix-format This is related to the Matrix formats

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants