Implementation of local outlier factor #1825

Open
Hakdag97 wants to merge 282 commits into main from features/1758-Implementation_of_local_outlier_factor

@Hakdag97
Collaborator

Description

Added an implementation of the local outlier factor (LOF), used for outlier classification. The memory bottleneck is the pairwise distance matrix. Memory consumption can be reduced significantly by taking only the n smallest elements of the distance matrix into account (only these are needed to compute the LOF). To this end, a new distance matrix function, cdist_small, was added. It combines the functionality of cdist and topk, with the advantage that the n smallest distances are chosen dynamically during computation, without evaluating the whole distance matrix cdist at once.
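The idea of combining cdist and topk without materializing the full distance matrix can be illustrated with a minimal single-process NumPy sketch. This is not the cdist_small code from this PR: the function name `topk_smallest_distances` and the `chunk_size` parameter are hypothetical, and the distributed (MPI) aspects are left out entirely. It only shows how a running set of the k smallest distances can be merged block by block.

```python
import numpy as np

def topk_smallest_distances(x, y, k, chunk_size=256):
    """Hypothetical helper (not part of Heat): for each row of x, return the
    k smallest Euclidean distances to rows of y and the matching indices into y,
    processing y in blocks instead of building the full (len(x), len(y)) matrix.
    Assumes k <= len(y)."""
    n = x.shape[0]
    best_d = np.full((n, k), np.inf)               # running k smallest distances
    best_i = np.full((n, k), -1, dtype=np.int64)   # running indices into y
    for start in range(0, y.shape[0], chunk_size):
        block = y[start:start + chunk_size]
        # Distances between all rows of x and the current block only
        d = np.sqrt(((x[:, None, :] - block[None, :, :]) ** 2).sum(axis=-1))
        idx = np.arange(start, start + block.shape[0])
        # Merge the running candidates with the new block, then keep the k smallest
        cand_d = np.concatenate([best_d, d], axis=1)
        cand_i = np.concatenate([best_i, np.broadcast_to(idx, d.shape)], axis=1)
        order = np.argsort(cand_d, axis=1)[:, :k]
        best_d = np.take_along_axis(cand_d, order, axis=1)
        best_i = np.take_along_axis(cand_i, order, axis=1)
    return best_d, best_i
```

In this sketch, peak memory for the distances scales with `len(x) * (k + chunk_size)` rather than `len(x) * len(y)`, which is the kind of saving the description refers to.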

Issue/s resolved: #1758

Changes proposed:

  • New implementation of LOF
  • New implementation of a memory-efficient distance matrix, cdist_small
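To illustrate why only the n smallest distances are needed, here is a minimal NumPy sketch of the standard LOF definition (k-distance, reachability distance, local reachability density) computed purely from each point's k nearest neighbors. The function name `lof_from_neighbors` and the toy data are assumptions made for this example. It is not the LocalOutlierFactor class added in this PR, and it builds the full matrix for the toy data only to keep the example short.

```python
import numpy as np

def lof_from_neighbors(knn_dist, knn_idx):
    """Hypothetical helper (not part of Heat).
    knn_dist[p, j]: distance from point p to its j-th nearest neighbor,
                    sorted ascending, with p itself excluded.
    knn_idx[p, j]:  index of that neighbor."""
    # k-distance of every point: distance to its k-th (farthest kept) neighbor
    k_dist = knn_dist[:, -1]
    # Reachability distance of p w.r.t. neighbor o: max(k-distance(o), d(p, o))
    reach = np.maximum(knn_dist, k_dist[knn_idx])
    # Local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: mean density of the neighbors divided by the point's own density
    return lrd[knn_idx].mean(axis=1) / lrd

# Toy data: a 5x5 unit grid plus one far-away point (index 25)
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
points = np.vstack([grid, [[10.0, 10.0]]])
full = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
np.fill_diagonal(full, np.inf)  # exclude each point from its own neighborhood
k = 3
knn_idx = np.argsort(full, axis=1)[:, :k]
knn_dist = np.take_along_axis(full, knn_idx, axis=1)
scores = lof_from_neighbors(knn_dist, knn_idx)
```

Points inside the grid get scores close to 1, while the isolated point gets a clearly larger score. Nothing beyond the k nearest distances and their indices enters the computation.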

Type of change

  • New feature

Does this change modify the behaviour of other functions? If so, which?

No.

ClaudiaComito and others added 30 commits December 12, 2022 12:09
@github-actions
Contributor

Thank you for the PR!

mrfh92
mrfh92 previously approved these changes Dec 19, 2025
Collaborator

@mrfh92 mrfh92 left a comment

The comments so far are mostly on the docs.
I am approving now to let the CI matrix run.

Comment thread heat/decomposition/tests/test_dmd.py Outdated
Comment thread heat/spatial/distance.py Outdated
Comment thread heat/spatial/distance.py Outdated
Defines whether the distances on each process are calculated iteratively. For example, if ``chunks=2``,
each process will first compute one half of the distance matrix and then the second half.

Returns
Collaborator

Usually, we only have Parameters, Attributes, Notes, and References sections, but no Raises or Returns section.

Collaborator Author

Good point. I have deleted them.

Comment thread heat/classification/localoutlierfactor.py Outdated
idx : DNDarray
The indices used for advanced indexing.

Returns
Collaborator

Usually, we don't use Returns as a section
(this also applies to the other functions).

Collaborator Author

Adapted this in all functions.

Comment thread heat/classification/tests/test_lof.py Outdated
self.assertTrue(condition)

# test with memory-efficient implementation
lof = LocalOutlierFactor(n_neighbors=n_neighbors, fully_distributed=True)
Collaborator

Maybe we could move the memory-efficient LOF to a separate test so that, if it fails, it's clear which configuration failed.

Collaborator Author

Done

@github-project-automation github-project-automation Bot moved this from In Progress to Merge queue in Roadmap Dec 19, 2025
Collaborator

@mrfh92 mrfh92 left a comment

  • Lines 267-282 in _chunk_wise_topk (i.e. the whole else block for chunks != 1) do not seem to be tested.
  • Lines 266-277 in _map_idx_to_proc are not covered.

If these parts of the code are somehow core ideas of the changes (are they?), they should be covered to ensure that they work.

@github-actions
Contributor

github-actions Bot commented Jan 5, 2026

Thank you for the PR!

@JuanPedroGHM JuanPedroGHM modified the milestones: 1.7.0, 1.8.0 Jan 19, 2026
@ClaudiaComito ClaudiaComito modified the milestones: 1.8.0, 1.9.0 Mar 3, 2026
@github-actions github-actions Bot removed the stale label Mar 16, 2026
Projects

Status: Merge queue

Development

Successfully merging this pull request may close these issues.

Implementation of local outlier factor

4 participants