Conversation
for more information, see https://pre-commit.ci
|
Thank you for the PR! |
|
Thank you for the PR! |
mrfh92
left a comment
There was a problem hiding this comment.
The comments so far are mostly on the docs
I now approve to let the CI matrix run
| Define if the distances on each process are calculated iteratively. For example, if ``chunks=2``, the | ||
| each processes will first compute one half of the distance matrix and then the second half. | ||
|
|
||
| Returns |
There was a problem hiding this comment.
Usually, we only have Parameters, Attributes, Notes and References, but no Raises or Returns section
There was a problem hiding this comment.
Good point. I have deleted them
| idx : DNDarray | ||
| The indices used for advanced indexing. | ||
|
|
||
| Returns |
There was a problem hiding this comment.
Usually, we dont use Returns as section
(also at the other functions)
There was a problem hiding this comment.
Adapted this in all functions
| self.assertTrue(condition) | ||
|
|
||
| # test with memory-efficient implementation | ||
| lof = LocalOutlierFactor(n_neighbors=n_neighbors, fully_distributed=True) |
There was a problem hiding this comment.
Maybe we could move the memory-efficient lof to a different test such that, in case it fails, its clear which configuration failed
There was a problem hiding this comment.
- lines 267-282 in _chunk_wise_topk (i.e. the whole "else-block" for chunks!=1) seem not to get tested.
- lines 266-277 in _map_idx_to_proc are not covered.
If these parts of the code are somehow core ideas of the changes (are they?), they should be covered to ensure that they work.
…Implementation_of_local_outlier_factor
…processes in cdist_small
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
Description
Added an implementation of the local outlier factor (lof) used for outlier classification. The bottleneck to reduce memory consumption lies in the pairwise distance matrix. Memory consumption can be significantly reduced by taking only the n smallest elements in the distance matrix into account (only these are needed to compute the lof). Thus, a new distance matrix, called cdist_small, was established that combines the functionality of cdist and topk, but has the advantage that the smallest n distances are dynamically choosen during computation without evaluating the whole distance matrix cdist at once.
Issue/s resolved: #1758
Changes proposed:
Type of change
Does this change modify the behaviour of other functions? If so, which?
no