Skip to content

Comments

Correctly parse mode indicators in hashsum output#673

Open
native-api wants to merge 10 commits intokellyjonbrazil:masterfrom
native-api:haslib_mode
Open

Correctly parse mode indicators in hashsum output#673
native-api wants to merge 10 commits intokellyjonbrazil:masterfrom
native-api:haslib_mode

Conversation

@native-api
Copy link

E.g. currently

In [2]: s=requests.get("https://github.com/openssl/openssl/releases/download/openssl-3.6.0/openssl-3.6.0.tar.gz.sha256").text

In [10]: s
Out[10]: 'b6a5f44b7eb69e3fa35dbf15524405b44837a481d43d81daddde3ff21fcbb8e9 *openssl-3.6.0.tar.gz\n'

In [5]: jc.parse('hashsum',s)
Out[5]:
[{'filename': '*openssl-3.6.0.tar.gz',
  'hash': 'b6a5f44b7eb69e3fa35dbf15524405b44837a481d43d81daddde3ff21fcbb8e9'}]

@native-api
Copy link
Author

For md5 output which doesn't use mode indicators, I set it as blank.
Maybe we should omit this field altogether in that case instead.

@native-api
Copy link
Author

Reference: https://linux.die.net/man/1/shasum

The default mode
is to print a line with checksum, a character indicating type (*' for binary, ?' for portable, ` ' for text), and name for each FILE.

@native-api native-api closed this Dec 11, 2025
@native-api native-api deleted the haslib_mode branch December 11, 2025 05:06
@native-api native-api restored the haslib_mode branch December 11, 2025 05:06
@native-api
Copy link
Author

Dammit. I fixed a typo in the branch name and the pull request was automatically closed.

@kellyjonbrazil
Copy link
Owner

Can you add some tests with the shasum output with the indicators and the appropriate JSON output?

@native-api native-api restored the haslib_mode branch December 15, 2025 20:10
@native-api native-api reopened this Dec 15, 2025
@native-api
Copy link
Author

Can you add some tests with the shasum output with the indicators and the appropriate JSON output?

Done.

@native-api
Copy link
Author

(Since you seem to prefer to talk here, I've restored the erroneously-named branch, reopened this PR and closed the duplicate one. The branch name doesn't matter much, after all.)

@native-api
Copy link
Author

native-api commented Dec 15, 2025

Didn't add a test for shasum ? symbol ("portable mode") because that mode has been deprecated and then removed in Perl 5.28. I was only able to find it in EOL Ubuntu-18.04 (Perl 5.26).

The code will still accept it (it now accepts any symbol for future-proofing), I just don't think we need to specifically test for it.

@kellyjonbrazil
Copy link
Owner

kellyjonbrazil commented Dec 15, 2025

Looking good!

Ok, I see this works because the older output has two spaces between the hash and the filename. I think this looks good, but we should change "mode" to null if it is blank or space. Also, it would be better to translate the codes into words ("binary", "portable", "text", etc.) in the _process() function so the user can choose to get the translated output by default and original output with --raw. Could do this with a dictionary map.

@native-api
Copy link
Author

native-api commented Dec 15, 2025

we should change "mode" to null if it is blank or space

Agree in the case of BSD-style ("legacy md5") output. But in the standard case, space is just another indicator symbol, same as others. Why should it get special treatment? What would we gain from introducing such inconsistency that would compensate for its drawbacks (which is, giving the user false information about the file's contents)?

Also, it would be better to translate the codes into words ("binary", "portable", "text", etc.) in the _process() function

Done. Left "portable" as unsupported and used it as a test case for an unsupported mode indicator.

@kellyjonbrazil
Copy link
Owner

kellyjonbrazil commented Dec 16, 2025

Agree in the case of BSD-style ("legacy md5") output. But in the standard case, space is just another indicator symbol, same as others. Why should it get special treatment?

Ah, didn't realize space was another indicator. What is it an indicator for? I didn't see space defined in the documentation you referenced.

Edit: Oh, I reread the docs... must have missed it with the funky quoting:

` ' for text

Yeah, makes sense to keep the space in the raw JSON output, then. My only final concern is whether this will erroneously report every file as "text" when parsing older versions of shasum.

@native-api
Copy link
Author

native-api commented Dec 16, 2025

My only final concern is whether this will erroneously report every file as "text" when parsing older versions of shasum.

What "older versions of shasum" are you referring to?
The test data only has versions from osx-10.14.16, ubuntu-18.04 and ubuntu-24.04. They all use the standard format.

$ find -name shasum\*.out -type f
./tests/fixtures/osx-10.14.6/shasum.out
./tests/fixtures/ubuntu-18.04/shasum-portable.out
./tests/fixtures/ubuntu-24.04/shasum-universal-bits.out

@native-api
Copy link
Author

native-api commented Dec 28, 2025

My only final concern is whether this will erroneously report every file as "text" when parsing older versions of shasum.

I've checked the history of shasum at https://github.com/gitpan/Digest-SHA/blob/master/shasum . The current code does correctly parse anything it ever produced.

It only ever produced the standard format: <hash> <mode symbol><filename>. Only the supported mode symbols changed with time:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants