Enable threaded data initialization for CPU.#11974
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors gradient index construction on CPU to consistently use Context-based threading and introduces a new block-based parallel helper for initializing reference-counted buffers.
Changes:
- Update
GHistIndexMatrixconstructors and internal methods to takeContextand use newMakeFixedVecWithMalloc(ctx, ...)for multi-threaded, malloc-backed buffers on CPU (and hook up corresponding call sites and tests). - Refactor gradient-index page sources (
SparsePageDMatrix::GetGradientIndexandGradientIndexPageSource) to accept aContextand constructGHistIndexMatrixwith it, including for external-memory workflows. - Add
ParallelForBlockand aContext-awareMakeFixedVecWithMallocinref_resource_view, using block-parallel initialization, and wire GPU-side code to the newResizeIndex(ctx, ...)API.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
tests/cpp/tree/test_quantile_hist.cc |
Adjusts quantile histogram partitioner tests to use the new GHistIndexMatrix constructor taking Context const* instead of a raw thread count. |
tests/cpp/tree/hist/test_histogram.cc |
Updates external-memory histogram test to construct GHistIndexMatrix with Context const*, matching the new API. |
tests/cpp/data/test_sparse_page_dmatrix.cc |
Updates gradient index external-memory tests to use the Context-aware GHistIndexMatrix constructor. |
src/data/sparse_page_dmatrix.cc |
Refactors GetGradientIndex to construct GradientIndexPageSource with a Context const*, so downstream gradient-index construction can use ctx->Threads(). |
src/data/gradient_index_page_source.h |
Changes GradientIndexPageSource to accept Context const* and pass ctx->Threads() into PageSourceIncMixIn, preparing for Context-driven CPU threading. |
src/data/gradient_index_page_source.cc |
In GradientIndexPageSource::Fetch, constructs a local Context with nthreads_ and uses it to build GHistIndexMatrix with the new (Context const*, SparsePage const&, ...) constructor. |
src/data/gradient_index.h |
Refactors GHistIndexMatrix API: PushBatch and PushBatchImpl now take Context const*, the external-memory constructor is updated to take Context const*, and ResizeIndex now also receives Context const*. |
src/data/gradient_index.cu |
Updates the GPU-side constructor to call ResizeIndex(ctx, ...), aligning with the new CPU/GPU-agnostic signature. |
src/data/gradient_index.cc |
Uses MakeFixedVecWithMalloc(ctx, ...) for hit_count and row_ptr, switches PushBatch and PushAdapterBatch to the Context-aware versions, and updates ResizeIndex to allocate using the new MakeFixedVecWithMalloc(ctx, ...). |
src/data/gradient_index_page_source.h |
(Same as above file) Ensures CPU gradient-index page source derives its thread count from Context instead of a bare nthreads parameter. |
src/common/threading_utils.h |
Adds ParallelForBlock intended to use n_threads as block count, but the current implementation can construct invalid Range1d and out-of-bounds ranges when size <= n_threads and ignores the clamped end variable. |
src/common/ref_resource_view.h |
Adds a Context-aware MakeFixedVecWithMalloc that uses ParallelForBlock to parallelize initialization of RefResourceView storage, and wires in the necessary includes for Context and threading utilities. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@razdoburdin Could you please help review the PR when you are available? I have listed you as the author. Please let me know if you prefer otherwise. |
yes, I will take a look |
| max_numeric_bins_per_feat{max_bins_per_feat}, | ||
| base_rowid{batch.base_rowid}, | ||
| isDense_{is_dense} { | ||
| CHECK_GE(n_threads, 1); |
There was a problem hiding this comment.
Is the n_threads >= 1 checked elseware ?
There was a problem hiding this comment.
As long as the number of threads comes from the Context class, it should go through this check:
xgboost/src/common/threading_utils.cc
Line 120 in f5f1e65
src/common/threading_utils.h
Outdated
| std::size_t blk_size = size / n_threads + (size % n_threads > 0); | ||
| ParallelFor(n_threads, n_threads, [&](auto tid) { | ||
| auto blk_beg = tid * blk_size; | ||
| if (blk_beg >= size) { |
There was a problem hiding this comment.
This check looks redundant. If (size == 0), blk_size == blk_beg == end == 0, so the (end == blk_beg) check shall work in this case.
There was a problem hiding this comment.
It's not for size==0. Say size==1, but n_threads==8, then the block_size=1, the second block (thread) would have block_begin = 1 * 1 = 1, end would be end = 2 * 1 = 2, and block_begin != end.
There was a problem hiding this comment.
Ah, I see.
May be just replace if (end == blk_beg) by if (end <= blk_beg)?
There was a problem hiding this comment.
Thank you for the suggestion, done.
4f6f74b to
9b02b0f
Compare
|
Haven't done this very often. I made sure the only commit here has @razdoburdin as the author, I hope squashing and merging the commit with github will retain the information. |
|
No.. the author is switched. |
Extracted from #11390 , modified to use the context instead. In addition, a new
ParallelForBlockis created.