[Cleanup] Combine Batched and Regular KMeans Impl by tarang-jain · Pull Request #2015 · rapidsai/cuvs

tarang-jain · 2026-04-10T22:59:21Z

Combine batched and regular k-means implementations

Unified the batched (host-data) and regular (device-data) k-means fit into a single kmeans_fit template that works with both host and device mdspans via batch_load_iterator
Unified the device and host initialization paths in init_centroids — both now use raft::matrix::sample_rows which handles host/device transparently
Removed the inertia_check parameter: inertia-based convergence checking now always runs. A zero clustering cost (perfect fit) now logs a warning instead of asserting. This is needed because spectral clustering can produce inputs where every point coincides with its cluster centroid.
Added init_size parameter to control how many samples are drawn for KMeansPlusPlus initialization. Defaults to n_samples for device data, min(3 * n_clusters, n_samples) for host data
Replaced per-iteration centroid raft::copy with std::swap of buffer pointers
For streaming fit, data norms are precomputed once and cached: host norms are written to a host buffer on the first iteration and copied back on subsequent iterations. process_batch no longer computes norms internally
Replaced raw cudaPointerGetAttributes call with raft::memory_type_from_pointer
Updated compute_weight_scale to use raft handle and mdspan-based raft::copy
Precompute centroid norms once per Lloyd iteration and pass to minClusterAndDistanceCompute via a new optional precomputed_centroid_norms parameter, avoiding redundant recomputation across batches
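To illustrate three of the items above (caching data norms once, computing centroid norms once per Lloyd iteration, and swapping centroid buffers instead of copying), here is a minimal CPU-only sketch. It is not the cuVS implementation: `row_norms` and `lloyd` are hypothetical names, `std::vector` stands in for device buffers and mdspans, and batching is omitted. Distances use the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, which is what makes both sets of norms reusable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Squared L2 norm of each row of a row-major (n_rows x dim) matrix.
// Hypothetical helper, not a raft or cuVS function.
std::vector<float> row_norms(const std::vector<float>& m, std::size_t dim) {
  assert(dim > 0 && m.size() % dim == 0);
  std::vector<float> norms(m.size() / dim, 0.0f);
  for (std::size_t i = 0; i < norms.size(); ++i)
    for (std::size_t d = 0; d < dim; ++d) norms[i] += m[i * dim + d] * m[i * dim + d];
  return norms;
}

// Plain Lloyd iterations on row-major data; returns the final centroids.
std::vector<float> lloyd(const std::vector<float>& x, std::vector<float> centroids,
                         std::size_t dim, int max_iter) {
  const std::size_t n = x.size() / dim;
  const std::size_t k = centroids.size() / dim;
  const std::vector<float> x_norms = row_norms(x, dim);  // data norms: computed once, reused
  std::vector<float> next(centroids.size());             // second centroid buffer
  std::vector<std::size_t> counts(k);

  for (int it = 0; it < max_iter; ++it) {
    // Centroid norms: computed once per iteration, shared by every point.
    const std::vector<float> c_norms = row_norms(centroids, dim);
    std::fill(next.begin(), next.end(), 0.0f);
    std::fill(counts.begin(), counts.end(), std::size_t{0});

    for (std::size_t i = 0; i < n; ++i) {
      std::size_t best = 0;
      float best_dist = std::numeric_limits<float>::max();
      for (std::size_t c = 0; c < k; ++c) {
        float dot = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) dot += x[i * dim + d] * centroids[c * dim + d];
        const float dist = x_norms[i] - 2.0f * dot + c_norms[c];
        if (dist < best_dist) { best_dist = dist; best = c; }
      }
      ++counts[best];
      for (std::size_t d = 0; d < dim; ++d) next[best * dim + d] += x[i * dim + d];
    }

    for (std::size_t c = 0; c < k; ++c)
      for (std::size_t d = 0; d < dim; ++d)
        next[c * dim + d] = counts[c] ? next[c * dim + d] / static_cast<float>(counts[c])
                                      : centroids[c * dim + d];  // empty cluster: keep old centroid

    std::swap(centroids, next);  // O(1) buffer swap, no element-wise copy
  }
  return centroids;
}
```

Calling `lloyd` on the four 2-D points (0,0), (0,2), (10,0), (10,2) with initial centroids (0,0) and (10,0) settles on (0,1) and (10,1) after the first iteration. In a batched variant, the same `c_norms` would be passed to every batch within an iteration.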

copy-pr-bot · 2026-04-10T22:59:25Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


tarang-jain · 2026-04-14T16:23:25Z

c/include/cuvs/cluster/kmeans.h

  int batch_centroids;

-  /** Check inertia during iterations for early convergence. */
+  /** Deprecated, ignored. Kept for ABI compatibility. */


We probably shouldn't be modifying the wording here. And we probably want to use a different struct that breaks ABI, suffixed by the version (26.06).
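The trade-off in this comment can be sketched with plain structs. The struct names and the `max_iter` field below are hypothetical and are not the actual cuVS C API layout; only `batch_centroids` and `inertia_check` come from this PR. Keeping a deprecated field in place preserves member offsets for already-compiled callers, while dropping it shifts the members that follow, which is why a separately named, version-suffixed struct would be the place to make that break.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout before the change.
struct params_old {
  int batch_centroids;
  bool inertia_check;
  int max_iter;  // hypothetical trailing field, for illustration only
};

// Option A: keep the deprecated field so the layout is unchanged.
struct params_kept {
  int batch_centroids;
  bool inertia_check;  // deprecated, ignored; kept for ABI compatibility
  int max_iter;
};

// Option B: a new, version-suffixed struct that drops the field (breaks ABI).
struct params_v26_06 {
  int batch_centroids;
  int max_iter;
};

// Option A leaves size and member offsets identical to the old struct.
static_assert(sizeof(params_kept) == sizeof(params_old), "size preserved");
static_assert(offsetof(params_kept, max_iter) == offsetof(params_old, max_iter),
              "offset preserved");

// Option B moves max_iter, so old binaries would read the wrong bytes.
bool layout_changed() {
  return offsetof(params_v26_06, max_iter) != offsetof(params_old, max_iter);
}
```

With the versioned struct under its own name, existing binaries keep using the old layout until they are rebuilt against the new one.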

combine impls

66d7fd3

github-project-automation bot added this to Unstructured Data Processing Apr 10, 2026

tarang-jain self-assigned this Apr 10, 2026

tarang-jain added the improvement, non-breaking, and cpp labels Apr 10, 2026

tarang-jain and others added 11 commits April 13, 2026 11:50

rm inertia_check

0a09e6f

change to warning

99a5730

style

a077406

add init_size param

d659875

Merge branch 'main' into combine-batch

ec2e8b7

docs

03a6473

Merge branch 'combine-batch' of https://github.com/tarang-jain/cuvs i…

42a8d9d

…nto combine-batch

rm direct cuda api calls

86af2fa

std::swap instead of raft::copy

d4e4e2c

cache batch norms

0819af5

centroid norms can also be cached per iteration

e0f079c

tarang-jain marked this pull request as ready for review April 14, 2026 01:10

tarang-jain requested review from a team as code owners April 14, 2026 01:10

tarang-jain and others added 4 commits April 13, 2026 18:11

mg n_iter

c2f7390

pre-commit

b9c3102

do not break c abi

e3956c1

Merge branch 'main' into combine-batch

986d78a

tarang-jain wants to merge 16 commits into rapidsai:main from tarang-jain:combine-batch
