Spectral clustering: support pre-calculated affinity matrix, revamp API#1835
Spectral clustering: support pre-calculated affinity matrix, revamp API#1835ClaudiaComito wants to merge 45 commits intomainfrom
Conversation
…y_matrix_in_ht_cluster_Spectral
…y_matrix_in_ht_cluster_Spectral
|
Thank you for the PR! |
…y_matrix_in_ht_cluster_Spectral
for more information, see https://pre-commit.ci
|
Thank you for the PR! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1835 +/- ##
==========================================
+ Coverage 88.76% 91.61% +2.84%
==========================================
Files 89 89
Lines 14012 14037 +25
==========================================
+ Hits 12438 12860 +422
+ Misses 1574 1177 -397
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
see #1824 for the eigh function prototype |
| raise NotImplementedError("Not implemented for other splitting-axes") | ||
|
|
||
| _, eigenvectors = self._spectral_embedding(x) | ||
| _, eigenvectors = self.spectral_embedding(x, self.eigen_solver) |
There was a problem hiding this comment.
Here, the eigenvalue problem is solved another time for the input data of predict aka the "test-data". However, I think the same spectral embedding of the "training data" (input of fit) should be re-used here.
| raise NotImplementedError("Not implemented for other splitting-axes") | ||
| # 2. Embed Dataset into lower-dimensional Eigenvector space | ||
| eigenvalues, eigenvectors = self._spectral_embedding(x) | ||
| eigenvalues, eigenvectors = self.spectral_embedding(x, self.eigen_solver) |
There was a problem hiding this comment.
Since this is the by far most expensive step of the algorithm, it could make sense to save the result of this as an _-attribute of the clusterer-object for later re-usage, e.g., in predict.
mrfh92
left a comment
There was a problem hiding this comment.
The API-changes look fine to me 👍
I have only the two comments directly attached to the code regarding re-usability of the (expensive) results of the eigenvalue decomposition. I would suggest to save them somewhere in the object in order to avoid recomputation.
|
addition to my review: just saw that scikit-learn's SpectralClustering does not have a predict on its own but only a fit_predict. |
…y_matrix_in_ht_cluster_Spectral
…y_matrix_in_ht_cluster_Spectral
…D_based_on_Zolotarev-polar_decomposition' into features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral
|
Thank you for the PR! |
|
This pull request is stale because it has been open for 60 days with no activity. |
|
Update after #1964 is merged |
…y_matrix_in_ht_cluster_Spectral
…y_matrix_in_ht_cluster_Spectral
ClaudiaComito
left a comment
There was a problem hiding this comment.
various updates after merging #1457
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Change return type from Tuple to DNDarray for the function.
for more information, see https://pre-commit.ci
Removed outdated notes about eigenvalues and added comments regarding the Laplacian matrix properties.
Due Diligence
Description
This PR introduces some changes to the Spectral clustering class (see below), this work in connection with parallelization efforts of the SCIMES package with @dcolombo
Issue/s resolved: #1740
Changes proposed:
SpectralClustering(formerlySpectral) to match API of the current scikit-learn versionmetricparameter toaffinityto match sklearneigen_solver, with optionslanczosandzolotarev(Default:zolotarev) (e.g. Implement polar decomposition #1697 and Features/1723 Symmetric Eigenvalue Decomposition (eigh) and full SVD (svd) based on Zolotarev Polar Decomposition #1824 )n_components,random_stateto match scikit-learn API__spectral_embeddingis now just the embedding, like for scikit-learn (instead of eigenvalues and eigenvectors);DNDarray.diagonal()for convenience and to match numpy APIType of change
Memory requirements
NA
Performance
NA
Does this change modify the behaviour of other functions? If so, which?
no