Use log-scaled quantile sketch budgets and rank-based accuracy checks #12129
Open
RAMitchell wants to merge 10 commits into dmlc:master
…uantile-logn-budget # Conflicts: # tests/cpp/common/test_hist_util.cu
…uantile-logn-budget # Conflicts: # tests/cpp/common/test_hist_util.cc # tests/cpp/common/test_hist_util.cu # tests/cpp/common/test_hist_util.h
Contributor
Pull request overview
This PR updates quantile sketch budgeting to follow the same O(log n / eps) summary-size behavior as the single-machine sketch (including distributed CPU merge/prune), and refreshes test coverage to validate the rank-error contract instead of comparing cut values directly.
Changes:
- Track per-feature represented element counts in `WQuantileSketch`, serialize them in the distributed CPU sketch allreduce payload, and recompute merge/prune budgets from those counts.
- Route multiple CPU/GPU sketch sizing paths through shared budget helpers (`SketchSummaryBudget`), including the GPU intermediate prune target.
- Replace/extend C++ and Python tests to use rank-based cut validation (plus exact-cut coverage when the budget can retain all unique values).
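To make the rank-based validation idea concrete, here is a minimal Python sketch. It is not the helper added in this PR: the function name `max_normalized_rank_error` is hypothetical, and it assumes that cut `i` out of `k` is meant to sit at the `i / (k + 1)` quantile. The point is only that a cut is judged by how many samples fall below it, not by how close its value is to a reference cut.

```python
import numpy as np


def max_normalized_rank_error(values: np.ndarray, cuts: np.ndarray) -> float:
    """Largest |achieved rank - target rank| over all cuts, divided by n.

    Hypothetical helper for illustration; assumes cut i targets the
    i / (k + 1) quantile of `values`.
    """
    sorted_vals = np.sort(values)
    n = sorted_vals.size
    k = cuts.size
    # Achieved rank of each cut: number of samples <= the cut value.
    ranks = np.searchsorted(sorted_vals, cuts, side="right")
    # Target rank of each cut under the evenly spaced assumption.
    targets = np.arange(1, k + 1) * n / (k + 1)
    return float(np.max(np.abs(ranks - targets)) / n)


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)

exact_cuts = np.quantile(x, [0.25, 0.5, 0.75])
shifted_cuts = exact_cuts + 0.05  # different values, still close in rank

err_exact = max_normalized_rank_error(x, exact_cuts)
err_shifted = max_normalized_rank_error(x, shifted_cuts)
print(err_exact, err_shifted)
```

A value-based comparison would flag `shifted_cuts` as differing from `exact_cuts` at every position, while the rank-based check reports only a small rank displacement, which is the quantity an approximate sketch actually bounds.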
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/common/quantile.h | Adds element-count tracking to WQuantileSketch and an exact-summary fast path for sorted weighted input. |
| src/common/quantile.cc | Extends distributed sketch payload to include element counts and uses SketchSummaryBudget during merge/prune and sorted ingestion. |
| src/common/quantile.cu | Uses SketchSummaryBudget for GPU intermediate pruning instead of a local helper. |
| src/common/quantile.cuh | Removes IntermediateNumCuts() helper (now replaced by shared budget helper usage). |
| src/common/hist_util.cu | Switches sample-cut sizing to SketchSummaryBudget. |
| tests/cpp/common/test_hist_util.h | Tightens/aligns rank-error thresholds, updates exact-value validation, and adds a weight-aware validation wrapper. |
| tests/cpp/common/test_hist_util.cu | Uses the new weight-aware validation wrapper for GPU sketch tests. |
| tests/cpp/common/test_hist_util.cc | Adjusts rank-error validation for weighted CPU cases and adds a sorted weighted exact-cut regression test. |
| tests/cpp/common/test_quantile.cc | Reworks distributed CPU quantile tests to validate rank error (row/column split + sparse count skew). |
| tests/cpp/common/test_quantile.cu | Aligns distributed GPU weighted tolerance usage with the shared weighted threshold. |
| python-package/xgboost/testing/quantile_dmatrix.py | Adds shared Python rank-error validation helpers and uses them in reference-cut checks. |
| python-package/xgboost/testing/updater.py | Adds rank-error assertions for get_quantile_cut device tests (numerical case). |
| tests/python/test_data_iterator.py | Replaces local rank-error helper with shared Python helper. |
| tests/python/test_quantile_dmatrix.py | Adds rank-error assertions for iterator-vs-array quantile cuts in training test. |
trivialfis
reviewed
Apr 2, 2026
Comment on lines +14 to +15:
`MAX_NORMALIZED_RANK_ERROR = 2.0`
`MAX_WEIGHTED_NORMALIZED_RANK_ERROR = 14.0`
Member
Could you please provide some brief comments on utilities here?
Summary
This PR aligns quantile sketch sizing more closely with the single-machine algorithm and updates the test suite to validate rank-error guarantees instead of cut-value deltas.
The main functional change is on the CPU distributed sketch path: we now track the number of represented elements per feature, serialize those counts through the distributed sketch payload, and recompute `SketchSummaryBudget(...)` after merge/prune using the summed per-feature counts. This changes the distributed CPU merge budget from a fixed `O(1 / eps)` cap to the same `O(log n / eps)` budget shape used by the underlying sketch.

In addition, this PR cleans up related sizing paths and strengthens quantile accuracy coverage across C++ and Python.
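The difference between the two budget shapes can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `fixed_budget` and `log_scaled_budget` are hypothetical names, the constant factors and the cap at `n` are guesses, and none of it reproduces the actual `SketchSummaryBudget(...)` formula. It only shows how summing per-feature counts across workers feeds a budget that grows with `log n`.

```python
import math


def fixed_budget(eps: float) -> int:
    """Illustrative O(1 / eps) cap: independent of how much data was seen."""
    return math.ceil(1.0 / eps)


def log_scaled_budget(n: int, eps: float) -> int:
    """Illustrative O(log n / eps) budget, capped at n.

    Hypothetical shape only; the constants used by the real helper are
    not stated in this PR description.
    """
    if n <= 1:
        return max(n, 0)
    return min(n, math.ceil(math.log2(n) / eps))


# Per-worker element counts for two features (rows: workers, cols: features).
# A sparse feature (column 1) has far fewer represented elements.
worker_counts = [
    [1_000, 10],
    [3_000, 0],
    [4_000, 6],
]

# Sum the per-feature counts across workers, as an allreduce would.
total_counts = [sum(col) for col in zip(*worker_counts)]

eps = 0.01
budgets = [log_scaled_budget(n, eps) for n in total_counts]
print(total_counts, budgets, fixed_budget(eps))
```

Under these assumptions the dense feature gets a budget well above the fixed cap, while the sparse feature's budget is limited by its 16 represented elements, so every one of them can be retained.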
What This Changes
`WQuantileSketch`
`SketchSummaryBudget(...)`

Test Changes
`QuantileDMatrix` / quantile-cut tests

Testing
Ran locally:
- `./build-cpu/testxgboost --gtest_filter='Quantile.*:HistUtil.*'`
- `./build-cuda-local/testxgboost --gtest_filter='HistUtil.*:GPUQuantile.*'`
- `pytest tests/python/test_data_iterator.py tests/python/test_quantile_dmatrix.py tests/python/test_updaters.py -k "test_data_iterator or test_training or test_ref_quantile_cut or test_get_quantile_cut"`

Notes
This PR is no longer limited to CPU distributed merge/prune only. It now includes:
`log n / eps` budget plumbing