Skip to content

Conversation

@rdesgroppes
Copy link
Contributor

Prior to Bazel 8, manifest files containing non-ASCII characters were written with UTF-16LE encoding instead of UTF-8 on Windows:

This led to disable failing tests in CI:

//tests/zip:unicode_test:

File "pkg\private\manifest.py", line 59, in read_entries_from
  raw_entries = json.loads(fh.read())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 338: invalid start byte

//tests/mappings:utf8_manifest_test:

File "tests\mappings\manifest_test_lib.py", line 39, in assertManifestsMatch
  got = json.loads(g_fp.read())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 354: invalid start byte

Since the manifest is plain JSON, the fix simply consists in detecting whether the second byte is 0, where the default UTF-8 decoding would fail, in which case we assume the file is UTF-16LE-encoded.

The code is slightly reorganized to factor out the encoding selection.

This allows to enable //tests/mappings:utf8_manifest_test and //tests/zip:unicode_test tests in Windows CI.

@rdesgroppes rdesgroppes marked this pull request as ready for review February 8, 2026 16:44
@rdesgroppes rdesgroppes force-pushed the fix-handling-of-utf16-manifests-on-windows branch from 24b2ac0 to a79b1ee Compare February 10, 2026 07:30
Prior to Bazel 8, manifest files containing non-ASCII characters were
written with UTF-16LE encoding instead of UTF-8 on Windows:
- bazelbuild/bazel#24231
- bazelbuild/bazel#24350
- bazelbuild/bazel#24403

This led to disable failing tests in CI:

`//tests/zip:unicode_test`:
```
File "pkg\private\manifest.py", line 59, in read_entries_from
  raw_entries = json.loads(fh.read())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 338: invalid start byte
```

`//tests/mappings:utf8_manifest_test`:
```
File "tests\mappings\manifest_test_lib.py", line 39, in assertManifestsMatch
  got = json.loads(g_fp.read())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 354: invalid start byte
```

Since the manifest is plain JSON, the fix simply consists in detecting
whether the second byte is `0`, where the default UTF-8 decoding would
fail, in which case we assume the file is UTF-16LE-encoded.

The code is slightly reorganized to factor out the encoding selection.

This allows to enable `//tests/mappings:utf8_manifest_test` and
`//tests/zip:unicode_test` tests in Windows CI.
@rdesgroppes rdesgroppes force-pushed the fix-handling-of-utf16-manifests-on-windows branch from a79b1ee to 35bf0df Compare February 10, 2026 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant