This repository was archived by the owner on Mar 24, 2026. It is now read-only.
Unicode XML character entities fix#82
Conversation
Author
|
Note that the following change is a potentially breaking change which is why the major release version has been updated. |
ziluvatar
suggested changes
Apr 25, 2018
Contributor
ziluvatar
left a comment
There was a problem hiding this comment.
Code is fine. I think we might need to integrate the change first and if we consider it stable enough we can merge and create the major version.
Summary: I'll leave the PR open for now.
…de character entity pair smashing.
…rsion includes an update to the xmldom dependency that will break an incorrect behavior where pairs of unicode XML character entities were combined into a single value.
Author
|
@robinbijlani @machuga Do one of you know who owns this work now? |
Contributor
|
Hi @mikeops . Yes, Apollo will be handling continuing this work. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Updates
xmldomto fix an issue where pairs of unicode values represented as XML-encoded character entities are combined into a single value.This comes from an issue filed on an internal project where the signature digest was not being properly calculated for assertion values that use two or more XML-encoded Chinese characters.
The meat of the investigation of that internal ticket is below.
---- Reproduced Text ----
Update to this. I checked out a couple of what I-believe-to-be-authoritative documents about multiple-byte character entities.
According to the W3C QA doc "Using character escapes in markup and CSS":
https://www.w3.org/International/questions/qa-escapes
Note that the case for which the two-byte character entity squashing code was added appears to be the German "LATIN SMALL LETTER U DIAERESIS" also commonly referred to as an "umlaut".
http://www.fileformat.info/info/unicode/char/fc/index.htm
The test case in
auth0/passport-wsfed-saml2that attempts to validate multi character entity encoding is using the UTF-8 two-byte hexidecimal representationü.passport-wsfed-saml2/test/interop.tests.js
Lines 192 to 205 in 6d50336
Note that the Unicode Binary and UTF-8 w/o formatting bits generates the same representation 00FC after padding the value with leading zeroes.
From the XML 1.0 specification "4.1 Character and Entity References" https://www.w3.org/TR/xml/#sec-references:
This would appear to indicate that the "code point" is the correct value representation and not the UTF-8 encoded value. This appears to be confirmed by testing using the Chrome browser and HTML.
Based on the spec it seems then that the correct approach is to drop support for two-byte UTF-8 encoded character entity representations entirely.