Unexpected Illegal surrogate character when parsing field names #1541

Reference: Discussion in https://github.com/FasterXML/jackson-core/pull/1494

Affects: 2.21.0/2.21.1 at least, possibly 3.0.x (but not 3.1.0+)

Seems like something is broken or missing in field name decoding with JSON escapes.

In the minimal repro unit tests below, `acceptJsonEscapedSurrogatePairInFieldName` fails but `acceptJsonEscapedSurrogatePairInStringValue` passes. I don't know nearly enough about Unicode, but it seems like either both should fail or both should pass (as evidenced by similar tests in `UTF8SurrogateValidation363Test`). It looks like the field name code path is doing something different.

```java
    @Test
    void acceptJsonEscapedSurrogatePairInFieldName() throws Exception
    {
        // JSON: {"\ud83d\udc4d":"value"}
        byte[] doc = new byte[] {
            '{', '"',
            '\\', 'u', 'd', '8', '3', 'd',  // JSON escape: \ud83d (high surrogate)
            '\\', 'u', 'd', 'c', '4', 'd',  // JSON escape: \udc4d (low surrogate)
            '"', ':', '"', 'v', 'a', 'l', 'u', 'e', '"',
            '}'
        };

        try (JsonParser p = FACTORY.createParser(doc)) {
            assertToken(JsonToken.START_OBJECT, p.nextToken());
            assertToken(JsonToken.FIELD_NAME, p.nextToken());
            // The escaped surrogate pair should decode to U+1F44D (thumbs up emoji)
            assertEquals("\uD83D\uDC4D", p.currentName());
            assertToken(JsonToken.VALUE_STRING, p.nextToken());
            assertEquals("value", p.getText());
            assertToken(JsonToken.END_OBJECT, p.nextToken());
        }
    }

    /**
     * Test that JSON escape sequence \ud83d\udc4d in string value is accepted.
     *
     * JSON: {"key":"\ud83d\udc4d"}
     */
    @Test
    void acceptJsonEscapedSurrogatePairInStringValue() throws Exception
    {
        // JSON: {"key":"\ud83d\udc4d"}
        byte[] doc = new byte[] {
            '{', '"', 'k', 'e', 'y', '"', ':', '"',
            '\\', 'u', 'd', '8', '3', 'd',  // JSON escape: \ud83d (high surrogate)
            '\\', 'u', 'd', 'c', '4', 'd',  // JSON escape: \udc4d (low surrogate)
            '"',
            '}'
        };

        try (JsonParser p = FACTORY.createParser(doc)) {
            assertToken(JsonToken.START_OBJECT, p.nextToken());
            assertToken(JsonToken.FIELD_NAME, p.nextToken());
            assertEquals("key", p.currentName());
            assertToken(JsonToken.VALUE_STRING, p.nextToken());
            // The escaped surrogate pair should decode to U+1F44D (thumbs up emoji)
            assertEquals("\uD83D\uDC4D", p.getText());
            assertToken(JsonToken.END_OBJECT, p.nextToken());
        }
    }
```

Here's what Claude is saying about this fwiw:

> When parsing field names, the code at lines 2025-2045 re-encodes the decoded escape sequence value (e.g., `0xD83D` from `\ud83d`) as a 3-byte UTF-8 sequence (`0xED 0xA0 0xBD`) into the quads buffer, which later gets rejected by `addName()` as an illegal surrogate, whereas string value parsing avoids this entirely by storing the decoded value directly into a `char[]` buffer where Java natively handles surrogate pairs.
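
To make the byte values in that explanation concrete, here is a minimal standalone sketch of the arithmetic only. It is not jackson-core code, and the class and method names (`SurrogateEncodingSketch`, `encodeAsThreeByteUtf8`, `encodePairAsUtf8`) are made up for illustration. It shows that pushing the lone high surrogate `0xD83D` through the generic 3-byte UTF-8 pattern yields `ED A0 BD`, while combining the pair into the code point U+1F44D first yields the valid 4-byte sequence `F0 9F 91 8D`. Whether the parser actually takes the first path for field names is the claim quoted above, which I have not verified.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of jackson-core.
final class SurrogateEncodingSketch {

    // Applies the generic 3-byte UTF-8 bit pattern (1110xxxx 10xxxxxx 10xxxxxx)
    // to a single 16-bit value without checking whether it is a surrogate.
    static byte[] encodeAsThreeByteUtf8(int c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // Combines a high/low surrogate pair into one code point first,
    // then lets the JDK encode that code point as UTF-8.
    static byte[] encodePairAsUtf8(char high, char low) {
        int codePoint = Character.toCodePoint(high, low);
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char high = (char) 0xD83D; // from the JSON escape \ud83d
        char low = (char) 0xDC4D;  // from the JSON escape \udc4d

        // Lone surrogate encoded on its own: prints "ED A0 BD"
        System.out.println(hex(encodeAsThreeByteUtf8(high)));

        // Pair combined into U+1F44D first: prints "F0 9F 91 8D"
        System.out.println(hex(encodePairAsUtf8(high, low)));
    }
}
```

The first output is not well-formed UTF-8, since surrogate code points are excluded from the encoding, so a validation step rejecting it would be consistent with the error in the title.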
