Reference: Discussion in #1494
Affects: 2.21.0/2.21.1 at least, possibly 3.0.x (but not 3.1.0+)
Seems like something is broken/missing in field name decoding with JSON escapes.
In the minimal repro unit tests below, `acceptJsonEscapedSurrogatePairInFieldName` fails but `acceptJsonEscapedSurrogatePairInStringValue` passes. I don't know nearly enough about Unicode, but it seems like either both should fail or both should pass (as evidenced by similar tests in `UTF8SurrogateValidation363Test`). The field name codepath seems to be doing something different.
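For reference, the two escapes used in both tests are the UTF-16 high and low halves of a single code point, U+1F44D, so both tests expect the same decoded string. A minimal JDK-only sketch of that combination (the `SurrogatePairDemo` class and its `combine` method are hypothetical, written only for illustration):

```java
public class SurrogatePairDemo {
    // UTF-16 decoding of a surrogate pair into one code point:
    // 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    static int combine(char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // The code units named by the two JSON escapes in the repro (0xD83D, 0xDC4D)
        char high = (char) 0xD83D;
        char low = (char) 0xDC4D;

        int cp = combine(high, low);
        // Cross-check against the JDK's own implementation
        System.out.printf("U+%X (JDK: U+%X)%n", cp, Character.toCodePoint(high, low));

        // The String both tests compare against: two chars, one code point
        String expected = "\uD83D\uDC4D";
        System.out.println(expected.length() + " chars, "
                + expected.codePointCount(0, expected.length()) + " code point");
    }
}
```

Running `main` prints `U+1F44D (JDK: U+1F44D)` and `2 chars, 1 code point`.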
```java
// Test methods only: FACTORY (a JsonFactory) and assertToken(...) are assumed
// to be provided by the enclosing test class, which is not shown here.

@Test
void acceptJsonEscapedSurrogatePairInFieldName() throws Exception
{
    // JSON: {"\ud83d\udc4d":"value"}
    byte[] doc = new byte[] {
        '{', '"',
        '\\', 'u', 'd', '8', '3', 'd', // JSON escape: \ud83d (high surrogate)
        '\\', 'u', 'd', 'c', '4', 'd', // JSON escape: \udc4d (low surrogate)
        '"', ':', '"', 'v', 'a', 'l', 'u', 'e', '"',
        '}'
    };
    try (JsonParser p = FACTORY.createParser(doc)) {
        assertToken(JsonToken.START_OBJECT, p.nextToken());
        assertToken(JsonToken.FIELD_NAME, p.nextToken());
        // The escaped surrogate pair should decode to U+1F44D (thumbs up emoji)
        assertEquals("\uD83D\uDC4D", p.currentName());
        assertToken(JsonToken.VALUE_STRING, p.nextToken());
        assertEquals("value", p.getText());
        assertToken(JsonToken.END_OBJECT, p.nextToken());
    }
}

/**
 * Test that JSON escape sequence \ud83d\udc4d in string value is accepted.
 *
 * JSON: {"key":"\ud83d\udc4d"}
 */
@Test
void acceptJsonEscapedSurrogatePairInStringValue() throws Exception
{
    // JSON: {"key":"\ud83d\udc4d"}
    byte[] doc = new byte[] {
        '{', '"', 'k', 'e', 'y', '"', ':', '"',
        '\\', 'u', 'd', '8', '3', 'd', // JSON escape: \ud83d (high surrogate)
        '\\', 'u', 'd', 'c', '4', 'd', // JSON escape: \udc4d (low surrogate)
        '"',
        '}'
    };
    try (JsonParser p = FACTORY.createParser(doc)) {
        assertToken(JsonToken.START_OBJECT, p.nextToken());
        assertToken(JsonToken.FIELD_NAME, p.nextToken());
        assertEquals("key", p.currentName());
        assertToken(JsonToken.VALUE_STRING, p.nextToken());
        // The escaped surrogate pair should decode to U+1F44D (thumbs up emoji)
        assertEquals("\uD83D\uDC4D", p.getText());
        assertToken(JsonToken.END_OBJECT, p.nextToken());
    }
}
```
Here's what Claude is saying about this fwiw:
When parsing field names, the code at lines 2025-2045 re-encodes the decoded escape sequence value (e.g., 0xD83D from \ud83d) as a 3-byte UTF-8 sequence (0xED 0xA0 0xBD) into the quads buffer, which later gets rejected by addName() as an illegal surrogate—whereas string value parsing avoids this entirely by storing the decoded value directly into a char[] buffer where Java natively handles surrogate pairs.
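To make the byte-level difference in that explanation concrete, here is a small JDK-only sketch. It does not reuse any jackson-core code: `perUnit` is a hypothetical stand-in for the field name behavior described above (each decoded escape re-encoded on its own as 3 bytes), and `combined` encodes the pair as one code point. A strict `java.nio` UTF-8 decoder is used only as a proxy for the illegal-surrogate check; it is not the check `addName()` actually performs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SurrogateUtf8Demo {
    // Encode one UTF-16 code unit in the 0x0800..0xFFFF range as 3 bytes.
    // Applied to a surrogate, this yields a sequence that valid UTF-8 forbids.
    static byte[] encodeUnitAs3Bytes(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // Hypothetical per-unit encoding, mirroring the field name behavior described above
    static byte[] perUnit(char high, char low) {
        byte[] out = new byte[6];
        System.arraycopy(encodeUnitAs3Bytes(high), 0, out, 0, 3);
        System.arraycopy(encodeUnitAs3Bytes(low), 0, out, 3, 3);
        return out;
    }

    // Combined encoding: one 4-byte UTF-8 sequence for the whole code point
    static byte[] combined(char high, char low) {
        return new String(new char[] {high, low}).getBytes(StandardCharsets.UTF_8);
    }

    // True if the bytes decode as well-formed UTF-8 (surrogate encodings are rejected)
    static boolean isStrictUtf8(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        char high = (char) 0xD83D;
        char low = (char) 0xDC4D;
        byte[] bad = perUnit(high, low);
        byte[] good = combined(high, low);
        System.out.println(hex(bad) + " valid=" + isStrictUtf8(bad));
        System.out.println(hex(good) + " valid=" + isStrictUtf8(good));
    }
}
```

Running `main` prints `ED A0 BD ED B1 8D valid=false` followed by `F0 9F 91 8D valid=true`. The first three bytes match the 0xED 0xA0 0xBD sequence cited above for the high surrogate; the 4-byte form is presumably what the field name path would need to produce if it keeps working on UTF-8 quads.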