Slow performance #39

@bkuczenski

Hi, I've been using pylzma to handle large(ish) 7z files, ranging from 50 MB to 1.0 GB compressed. I am trying to access individual files from the archive one at a time, and I've noticed that performance is highly variable and very slow compared to ZipFile.

Below I compared performance for two archives containing the same files (I created the ZIP by extracting the 7z file and recompressing it with zip):

http://nbviewer.jupyter.org/github/bkuczenski/lca-tools/blob/master/doc/7z%20profiling.ipynb

On the one hand, the ZIP file is almost 6x as large as the 7z file; on the other hand, 7z access seems to be 10x-100x slower.

My question: is there a way for me to improve the performance of py7zlib? Is there a better way to use the archive to reference single files? Or is there a technical limitation that prevents this?

N.B. the performance is no different if I keep the archive open between successive retrievals. It is also consistent for the same file over multiple trials: some files are fast and others are slow. In this case all the files are about the same size, so that's not the issue.
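For anyone who wants to reproduce this without opening the notebook, here is a minimal sketch of the kind of per-file timing described above. It is not the notebook's code: `time_reads`, `zip_reader`, and `sevenzip_reader` are hypothetical helper names, and the 7z path assumes pylzma is installed so that `py7zlib.Archive7z`, `getnames()`, and `getmember(name).read()` are available. Each reader opens its archive once and keeps it open across retrievals, matching the "archive kept open" case.

```python
import io
import time
import zipfile


def time_reads(read_member, names, trials=3):
    """Return {name: best wall-clock seconds} for reading each member."""
    timings = {}
    for name in names:
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            read_member(name)  # extract one member; the data is discarded
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


def zip_reader(path_or_file):
    """Open a ZIP once and return (member names, read function)."""
    archive = zipfile.ZipFile(path_or_file)
    return archive.namelist(), archive.read


def sevenzip_reader(path):
    """Open a 7z archive once with py7zlib and return (member names, read function)."""
    from py7zlib import Archive7z  # needs pylzma; imported lazily on purpose

    fp = open(path, "rb")  # stays open for as long as the reader is in use
    archive = Archive7z(fp)
    return archive.getnames(), lambda name: archive.getmember(name).read()


if __name__ == "__main__":
    # Self-contained demo on an in-memory ZIP; swap in real paths to compare,
    # e.g. zip_reader("data.zip") vs sevenzip_reader("data.7z") (hypothetical names).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(3):
            zf.writestr("file%d.txt" % i, b"x" * 100000)
    buf.seek(0)

    names, read = zip_reader(buf)
    for name, seconds in sorted(time_reads(read, names).items()):
        print("%-12s %.6f s" % (name, seconds))
```

Taking the best of several trials per file separates a file that is consistently slow from one that was slow once because of caching or disk noise.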

Thanks for any feedback.
