[Cleanup] Combine Batched and Regular KMeans Impl #2015
Open
tarang-jain wants to merge 16 commits into rapidsai:main from
…nto combine-batch
tarang-jain commented Apr 14, 2026
      int batch_centroids;

    - /** Check inertia during iterations for early convergence. */
    + /** Deprecated, ignored. Kept for ABI compatibility. */
We probably shouldn't be modifying the wording here. And we probably want to use a different struct that breaks ABI, suffixed by the version (26.06).
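One way to read the suggestion above, as a minimal sketch: leave the existing struct untouched and introduce a version-suffixed struct that is free to change layout. Every name here (`example_kmeans_params`, `example_kmeans_params_v26_06`, `to_v26_06`) and the `n_clusters` and `max_iter` fields are hypothetical and are not taken from the PR; only `batch_centroids`, `inertia_check`, and the 26.06 suffix come from the discussion.

```cpp
#include <cassert>

// Hypothetical legacy layout (assumption): kept byte-for-byte as it is,
// including its original doc comment, so existing binaries keep working.
struct example_kmeans_params {
  int n_clusters;       // assumed field, for illustration only
  int max_iter;         // assumed field, for illustration only
  int batch_centroids;
  /** Check inertia during iterations for early convergence. */
  bool inertia_check;
};

// Hypothetical versioned layout (assumption): a new type, so it may drop
// fields without affecting callers that were compiled against the old one.
struct example_kmeans_params_v26_06 {
  int n_clusters;
  int max_iter;
  int batch_centroids;
};

// Translate a legacy struct into the versioned one. inertia_check has no
// counterpart in the new layout and is dropped during the conversion.
inline example_kmeans_params_v26_06 to_v26_06(const example_kmeans_params& old_params)
{
  assert(old_params.n_clusters > 0);
  return example_kmeans_params_v26_06{
    old_params.n_clusters, old_params.max_iter, old_params.batch_centroids};
}
```

With this shape, old entry points can keep accepting the legacy struct and forward through the conversion helper, while new entry points take the versioned struct directly.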
Combine batched and regular k-means implementations
- Batched and regular `fit` combined into a single `kmeans_fit` template that works with both host and device mdspans via `batch_load_iterator`.
- `init_centroids`: both paths now use `raft::matrix::sample_rows`, which handles host/device transparently.
- `inertia_check` parameter: inertia-based convergence checking now always runs. Zero clustering cost (a perfect fit) logs a warning instead of asserting. This is needed because spectral clustering can cause all points to converge on the cluster centroids themselves.
- `init_size` parameter: controls how many samples are drawn for KMeansPlusPlus initialization. Defaults to `n_samples` for device data and `min(3 * n_clusters, n_samples)` for host data.
- `raft::copy` replaced with `std::swap` of buffer pointers.
- `process_batch` no longer computes norms internally.
- `cudaPointerGetAttributes` call replaced with `raft::memory_type_from_pointer`.
- `compute_weight_scale` now uses the raft handle and mdspan-based `raft::copy`.
- `minClusterAndDistanceCompute` accepts a new optional `precomputed_centroid_norms` parameter, avoiding redundant recomputation across batches.
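Three of the points above (the `init_size` defaults, swapping buffer pointers instead of copying, and warning on zero inertia) can be illustrated with a host-only sketch. This is not the PR's code: it is a plain 1-D Lloyd loop with no RAFT or CUDA. The names `default_init_size` and `lloyd_1d` are hypothetical, and the relative-tolerance stopping rule is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

// Default number of samples for k-means++ seeding, following the description:
// n_samples for device data, min(3 * n_clusters, n_samples) for host data.
inline std::size_t default_init_size(std::size_t n_samples, std::size_t n_clusters, bool data_on_host)
{
  return data_on_host ? std::min(3 * n_clusters, n_samples) : n_samples;
}

// Hypothetical 1-D Lloyd loop. Updates `centroids` in place and returns the
// inertia measured against the centroids used for the last assignment step.
inline double lloyd_1d(const std::vector<double>& x, std::vector<double>& centroids, int max_iter, double tol)
{
  assert(!x.empty() && !centroids.empty());
  const std::size_t k = centroids.size();
  std::vector<double> next(k), sums(k);
  std::vector<std::size_t> counts(k);
  double* cur = centroids.data();  // centroids read in this iteration
  double* nxt = next.data();       // centroids written in this iteration
  double prev_inertia = std::numeric_limits<double>::max();
  double inertia = 0.0;
  for (int it = 0; it < max_iter; ++it) {
    std::fill(sums.begin(), sums.end(), 0.0);
    std::fill(counts.begin(), counts.end(), std::size_t{0});
    inertia = 0.0;
    for (double v : x) {  // assign each point to its nearest centroid
      std::size_t best = 0;
      double best_d = (v - cur[0]) * (v - cur[0]);
      for (std::size_t c = 1; c < k; ++c) {
        const double d = (v - cur[c]) * (v - cur[c]);
        if (d < best_d) { best_d = d; best = c; }
      }
      sums[best] += v;
      counts[best] += 1;
      inertia += best_d;
    }
    for (std::size_t c = 0; c < k; ++c) {  // empty clusters keep their centroid
      nxt[c] = counts[c] > 0 ? sums[c] / static_cast<double>(counts[c]) : cur[c];
    }
    std::swap(cur, nxt);  // exchange buffer roles without copying elements
    if (inertia == 0.0) {  // perfect fit: warn and stop rather than assert
      std::cerr << "warning: zero clustering cost (perfect fit)\n";
      break;
    }
    if (prev_inertia - inertia <= tol * prev_inertia) { break; }  // assumed stopping rule
    prev_inertia = inertia;
  }
  // After an odd number of swaps the newest values live in `next`; copy once.
  if (cur != centroids.data()) { std::copy(cur, cur + k, centroids.begin()); }
  return inertia;
}
```

For example, calling `lloyd_1d` on `{0, 2, 10, 12}` with initial centroids `{0, 12}` settles on `{1, 11}`, and the only element-wise copy is the optional one at the end.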