Description
Preface: There could be reasons things are the way they currently are. I am not very familiar with the library.
That being said, I have observed a memory leak when periodically opening a parquet dataset via xarray+kerchunk
and I have traced it to the lru_cache usage on instance methods of LazyReferenceMapper.
Specifically, LazyReferenceMapper.listdir and LazyReferenceMapper._key_to_record.
It turns out that lru_cache on methods is a common pitfall that tends to cause memory leaks.
LazyReferenceMapper.open_refs is also cached, but it is defined as a per-instance closure (in .setup), so it does not leak.
Assuming this is a bug, I would suggest giving listdir and _key_to_record the same treatment as open_refs
or, even better, applying the decorator manually in the constructor, as per this Stack Overflow answer.
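A minimal sketch of the "decorate in the constructor" idea, assuming a hypothetical class PerInstanceMapper with a hypothetical method _lookup (these are stand-ins, not the actual LazyReferenceMapper code). The cache wraps the bound method and is stored on the instance, so it goes away together with the instance:

```python
import functools
import gc
import weakref


class PerInstanceMapper:
    """Hypothetical stand-in; not the real LazyReferenceMapper."""

    def __init__(self):
        # Wrap the bound method so the cache belongs to this instance
        # instead of living on the class.
        self.lookup = functools.lru_cache(maxsize=None)(self._lookup)

    def _lookup(self, key):
        # Placeholder for an expensive computation.
        return key.upper()


obj = PerInstanceMapper()
first = obj.lookup("a")
second = obj.lookup("a")  # served from the per-instance cache
hits = obj.lookup.cache_info().hits

ref = weakref.ref(obj)
del obj
# The instance and its cache reference each other, so the cyclic
# garbage collector (not plain reference counting) reclaims them.
gc.collect()
freed = ref() is None
```

Note that this creates a reference cycle (instance, cache wrapper, bound method), so the instance is released on the next garbage collection pass rather than immediately when the last external reference is dropped.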
I am happy to work on a PR if desired.
(In case the reader is not aware of the peril of lru_cache on methods:
If a method is cached, the self instance argument is included in the cache
- https://docs.python.org/3/library/functools.html#functools.lru_cache
The two principal tools for caching methods are functools.cached_property() and functools.lru_cache(). The former stores results at the instance level and the latter at the class level.
- https://docs.python.org/3/faq/programming.html#faq-cache-method-calls
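Therefore, the cache lives at the class level and holds a reference to each instance (self), which keeps the instance alive for as long as its entries remain in the cache.)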
I can also provide an example to reproduce, but the issue seems to be documented well enough already.
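For readers who want to see the general pitfall in isolation, here is a minimal sketch that does not involve fsspec or kerchunk at all. LeakyMapper and its lookup method are hypothetical names chosen for illustration; the only assumption is that a method is decorated with functools.lru_cache at class definition time:

```python
import functools
import gc
import weakref


class LeakyMapper:
    """Hypothetical stand-in; not the real LazyReferenceMapper."""

    @functools.lru_cache(maxsize=None)
    def lookup(self, key):
        # Placeholder for an expensive computation.
        return key.upper()


obj = LeakyMapper()
obj.lookup("a")  # the cache key now includes a reference to obj

ref = weakref.ref(obj)
del obj
gc.collect()
# The class-level cache still holds the instance, so it is not freed.
leaked = ref() is not None

# Clearing the shared cache drops the last reference to the instance.
LeakyMapper.lookup.cache_clear()
gc.collect()
freed_after_clear = ref() is None
```

With maxsize=None the instance stays alive until the cache is cleared explicitly; with a bounded maxsize it stays alive until all of its entries are evicted by calls on other instances.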