Skip to content

Spectral clustering: support pre-calculated affinity matrix, revamp API#1835

Open
ClaudiaComito wants to merge 45 commits intomainfrom
features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral
Open

Spectral clustering: support pre-calculated affinity matrix, revamp API#1835
ClaudiaComito wants to merge 45 commits intomainfrom
features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral

Conversation

@ClaudiaComito
Copy link
Copy Markdown
Member

@ClaudiaComito ClaudiaComito commented Mar 21, 2025

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • benchmarks: created for new functionality
    • benchmarks: performance improved or maintained
    • documentation updated where needed

Description

This PR introduces some changes to the Spectral clustering class (see below), this work in connection with parallelization efforts of the SCIMES package with @dcolombo

Issue/s resolved: #1740

Changes proposed:

Type of change

Memory requirements

NA

Performance

NA

Does this change modify the behaviour of other functions? If so, which?

no

@github-actions
Copy link
Copy Markdown
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito added this to the 1.6 milestone Mar 31, 2025
@github-project-automation github-project-automation Bot moved this to Todo in Roadmap Mar 31, 2025
@ClaudiaComito ClaudiaComito requested a review from mrfh92 March 31, 2025 09:46
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2025

Thank you for the PR!

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 4, 2025

Codecov Report

❌ Patch coverage is 86.20690% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.61%. Comparing base (206a523) to head (fb921ea).

Files with missing lines Patch % Lines
heat/cluster/spectral.py 85.71% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1835      +/-   ##
==========================================
+ Coverage   88.76%   91.61%   +2.84%     
==========================================
  Files          89       89              
  Lines       14012    14037      +25     
==========================================
+ Hits        12438    12860     +422     
+ Misses       1574     1177     -397     
Flag Coverage Δ
unit 91.61% <86.20%> (+2.84%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mrfh92
Copy link
Copy Markdown
Collaborator

mrfh92 commented Apr 7, 2025

see #1824 for the eigh function prototype

Comment thread heat/cluster/spectral.py Outdated
raise NotImplementedError("Not implemented for other splitting-axes")

_, eigenvectors = self._spectral_embedding(x)
_, eigenvectors = self.spectral_embedding(x, self.eigen_solver)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the eigenvalue problem is solved another time for the input data of predict aka the "test-data". However, I think the same spectral embedding of the "training data" (input of fit) should be re-used here.

Comment thread heat/cluster/spectral.py Outdated
raise NotImplementedError("Not implemented for other splitting-axes")
# 2. Embed Dataset into lower-dimensional Eigenvector space
eigenvalues, eigenvectors = self._spectral_embedding(x)
eigenvalues, eigenvectors = self.spectral_embedding(x, self.eigen_solver)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the by far most expensive step of the algorithm, it could make sense to save the result of this as an _-attribute of the clusterer-object for later re-usage, e.g., in predict.

Copy link
Copy Markdown
Collaborator

@mrfh92 mrfh92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API-changes look fine to me 👍

I have only the two comments directly attached to the code regarding re-usability of the (expensive) results of the eigenvalue decomposition. I would suggest to save them somewhere in the object in order to avoid recomputation.

@github-project-automation github-project-automation Bot moved this from Todo to In Progress in Roadmap Apr 7, 2025
@mrfh92
Copy link
Copy Markdown
Collaborator

mrfh92 commented Apr 7, 2025

addition to my review: just saw that scikit-learn's SpectralClustering does not have a predict on its own but only a fit_predict.

…D_based_on_Zolotarev-polar_decomposition' into features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral
@github-actions
Copy link
Copy Markdown
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito modified the milestones: 1.6, 1.7.0 Aug 25, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 3, 2025

This pull request is stale because it has been open for 60 days with no activity.

@ClaudiaComito
Copy link
Copy Markdown
Member Author

Update after #1964 is merged

@ClaudiaComito ClaudiaComito modified the milestones: 1.7.0, 1.8.0 Dec 2, 2025
@ClaudiaComito ClaudiaComito modified the milestones: 1.8.0, 1.9.0 Mar 3, 2026
Copy link
Copy Markdown
Member Author

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

various updates after merging #1457

Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Support pre-calculated affinity matrix in ht.cluster.Spectral

2 participants