@inequation

I'd have filled out the PR template, but the link in CONTRIBUTING.md points at a non-existent file.

This is a side effect of some load-time optimisation work I've done for OpenMoHAA.

Compressed streams cannot really be sought because each symbol depends on the symbols decoded before it. The existing implementation does a bunch of acrobatics to emulate seeking by reopening the stream, then reading and discarding all the data until the desired offset is reached.

The new implementation does the same thing, but in a less convoluted way, and it avoids some unnecessary work:

  1. Regardless of seek origin, just compute the target offset. Clamp to file bounds for simplicity - this departs slightly from stdio semantics (where fseek will appear to succeed when moving out of bounds, and only subsequent ops like fread will return EOF), but makes it easier to implement the decompress-discard loop.
  2. Only reopen the stream if the target offset is earlier than the current offset (you can't decompress backwards). This potentially avoids the disk I/O needed to re-read the ZIP central directory and metadata.
  3. Decompress and discard as much data as necessary to reach the target offset.
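
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `zstream_t`, `zstream_reopen`, `zstream_read` and `zstream_seek` are hypothetical stand-ins, and the "decoder" only advances a counter instead of calling into a real unzip routine.

```c
#include <stddef.h>
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for a compressed file handle (not fileHandleData_t). */
typedef struct {
    long pos;     /* current offset in the uncompressed data */
    long size;    /* total uncompressed size */
    int  reopens; /* number of rewinds, tracked only for illustration */
} zstream_t;

/* Stand-in for reopening the ZIP entry: decoding restarts at offset 0. */
static void zstream_reopen(zstream_t *s)
{
    s->pos = 0;
    s->reopens++;
}

/* Stand-in for decompressing up to n bytes; returns the bytes produced. */
static long zstream_read(zstream_t *s, void *buf, long n)
{
    long left = s->size - s->pos;
    (void)buf; /* a real decoder would write into buf */
    if (n > left)
        n = left;
    s->pos += n;
    return n;
}

static int zstream_seek(zstream_t *s, long offset, int origin)
{
    char scratch[4096]; /* discarded output */
    long target;

    /* Step 1: compute the absolute target, then clamp to file bounds. */
    switch (origin) {
    case SEEK_SET: target = offset;           break;
    case SEEK_CUR: target = s->pos + offset;  break;
    case SEEK_END: target = s->size + offset; break;
    default:       return -1;
    }
    if (target < 0)
        target = 0;
    if (target > s->size)
        target = s->size;

    /* Step 2: rewind only when the target lies behind the current offset. */
    if (target < s->pos)
        zstream_reopen(s);

    /* Step 3: decompress and discard until the target is reached. */
    while (s->pos < target) {
        long want = target - s->pos;
        if (want > (long)sizeof scratch)
            want = (long)sizeof scratch;
        if (zstream_read(s, scratch, want) <= 0)
            return -1; /* decoder failed or stalled before the target */
    }
    return 0;
}
```

With a 10000-byte stream, seeking forward from 5000 to 6000 leaves `reopens` untouched, while seeking back to 100 triggers exactly one reopen. Because the target is clamped first, the loop in step 3 always has a reachable end point.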

The offset clamping is the only real departure from how the original works. Implementing the stdio fseek semantics (i.e. accepting out-of-bounds offsets and deferring error reporting until a read is actually attempted) would be possible, but it leaves open the question of what to do with the decompress-discard loop: just skip it? And how do we keep track of the fact that the last seek was invalid, so that we can return the error upon read: with another field in fileHandleData_t? And so on. It feels better to sidestep this issue and either silently clamp, or perhaps return an error.
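
For comparison, here is a rough sketch of what the deferred, stdio-like alternative might involve. Everything in it is an assumption for illustration: `lazy_stream_t`, its `logical` field, `lazy_seek` and `lazy_read` are hypothetical names, and the decoder is again reduced to a counter. The point is only that the handle needs an extra piece of state, and that the read path has to check it.

```c
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical handle; the extra 'logical' field is the cost of deferral. */
typedef struct {
    long decoded; /* bytes the decoder has actually produced so far */
    long size;    /* total uncompressed size */
    long logical; /* offset requested by the last seek; may exceed size */
} lazy_stream_t;

/* Seek only records the request; no decoding happens here. */
static int lazy_seek(lazy_stream_t *s, long offset, int origin)
{
    long target;

    switch (origin) {
    case SEEK_SET: target = offset;               break;
    case SEEK_CUR: target = s->logical + offset;  break;
    case SEEK_END: target = s->size + offset;     break;
    default:       return -1;
    }
    if (target < 0)
        return -1; /* POSIX fseek also fails for a negative resulting offset */
    s->logical = target; /* offsets past the end are accepted for now */
    return 0;
}

/* Returns how many bytes a read of n bytes would deliver. */
static long lazy_read(lazy_stream_t *s, long n)
{
    if (s->logical >= s->size)
        return 0; /* the out-of-bounds seek is only reported here, as EOF */

    if (s->logical < s->decoded)
        s->decoded = 0;         /* stand-in for reopening the stream */
    s->decoded = s->logical;    /* stand-in for the decompress-discard loop */

    if (n > s->size - s->decoded)
        n = s->size - s->decoded;
    s->decoded += n;
    s->logical = s->decoded;
    return n;
}
```

In this variant a seek to offset 500 in a 100-byte stream reports success and the following read returns 0, which mirrors stdio but spreads the seek logic across two functions.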