LazyReferenceMapper cache leaking memory #1983

@honzaflash

Preface: There could be reasons things are the way they currently are. I am not very familiar with the library.

That being said, I have observed a memory leak when periodically opening a parquet dataset via xarray+kerchunk,
and I have traced it to the use of lru_cache on instance methods of LazyReferenceMapper,
specifically LazyReferenceMapper.listdir and LazyReferenceMapper._key_to_record.
It turns out that lru_cache on methods is a common pitfall: the cache keeps every instance it has seen alive, which tends to result in memory leaks.
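For readers who want to see the pitfall in isolation, here is a minimal sketch. The Mapper class and its listdir body are hypothetical stand-ins, not the real LazyReferenceMapper; only functools.lru_cache, gc and weakref are real APIs. It checks whether an instance survives after its last ordinary reference is dropped.

```python
import functools
import gc
import weakref


class Mapper:
    """Hypothetical stand-in, NOT the real LazyReferenceMapper."""

    @functools.lru_cache(maxsize=128)
    def listdir(self, path=""):
        # Placeholder for the real (expensive) lookup.
        return (path + "/a", path + "/b")


m = Mapper()
m.listdir("root")      # the cache key now holds a strong reference to m
ref = weakref.ref(m)   # weak reference, lets us observe whether m is freed

del m
gc.collect()
leaked = ref() is not None    # still alive: held by the class-level cache

Mapper.listdir.cache_clear()  # dropping the cache entries releases the instance
gc.collect()
freed_after_clear = ref() is None

print("alive after del:", leaked)
print("freed after cache_clear:", freed_after_clear)
```

With a bounded maxsize the instance is only released once its entries are evicted or the cache is cleared, so a long-running process that keeps creating mappers can hold many of them at once.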

LazyReferenceMapper.open_refs is also cached, but it is defined as a per-instance closure (in .setup), so its cache is released together with the instance and does not leak.

Assuming this is a bug, I would suggest giving listdir and _key_to_record the same treatment as open_refs
or, even better, applying the decorator manually in the constructor, as per this Stack Overflow answer.
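A rough sketch of what applying the decorator in the constructor could look like follows. The class, the _listdir helper name and the cache_size parameter are assumptions made for illustration, not the library's actual code or signature.

```python
import functools
import gc
import weakref


class Mapper:
    """Hypothetical stand-in, NOT the real LazyReferenceMapper."""

    def __init__(self, cache_size=128):
        # Wrap the bound method once per instance, so the cache is stored on
        # the instance rather than on the class.
        self.listdir = functools.lru_cache(maxsize=cache_size)(self._listdir)

    def _listdir(self, path=""):
        # Placeholder for the real (expensive) lookup.
        return (path + "/a", path + "/b")


m = Mapper()
m.listdir("root")
m.listdir("root")                   # second call is served from the cache
hits = m.listdir.cache_info().hits

ref = weakref.ref(m)
del m
gc.collect()                        # instance and cache form a reference cycle
freed = ref() is None               # the cyclic garbage collector reclaims both

print("cache hits:", hits)
print("freed after del:", freed)
```

One trade-off of this approach: the instance and its cached bound method reference each other, so the memory is reclaimed by the cyclic garbage collector rather than immediately when the last reference goes away.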

I am happy to work on a PR if desired.

(In case the reader is not aware of the peril of lru_cache on methods:

If a method is cached, the self instance argument is included in the cache

- https://docs.python.org/3/library/functools.html#functools.lru_cache

The two principal tools for caching methods are functools.cached_property() and functools.lru_cache(). The former stores results at the instance level and the latter at the class level.

- https://docs.python.org/3/faq/programming.html#faq-cache-method-calls

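Therefore, a strong reference to the instance (self) is stored in the class-level cache, and the instance cannot be garbage collected until its entries are evicted or the cache is cleared.)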
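The documented behaviour quoted above can be checked with a small sketch. The Mapper class and its lookup method are hypothetical; the point is only that two instances calling the same cached method with the same argument produce two entries in a single cache owned by the class.

```python
import functools


class Mapper:
    """Hypothetical class, used only to illustrate the documented behaviour."""

    @functools.lru_cache(maxsize=None)
    def lookup(self, key):
        return key.upper()


a, b = Mapper(), Mapper()
a.lookup("x")
b.lookup("x")   # same argument, different self: a separate cache entry

# One cache, reachable from the class, holding entries for both instances.
info = Mapper.lookup.cache_info()
print(info)
```

Because self is part of each key, instances also compete for the same maxsize budget when the cache is bounded.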

I can also provide an example to reproduce, but this seems to be a well-documented pitfall.
