
Add fuzz_chunked_reader target for chunk-boundary parser bugs#959

Merged
Mingun merged 1 commit into tafia:master from dobermai:fuzz/chunked-bufread-target on May 7, 2026

Conversation

@dobermai
Contributor

@dobermai dobermai commented May 6, 2026

Follow-up to #958 / #957, and a response to your request in #958 (comment) about improving the fuzzing setup.

Why

fuzz_target_1 drives Reader::from_reader via Cursor::new(data). <Cursor as BufRead>::fill_buf always returns the entire remaining input in a single window, so the harness never exercises parser states that span fill_buf calls.

That gap is exactly what hid #950 and #957. Both regression tests in tests/issues.rs reproduce by wrapping the input in BufReader::with_capacity(N, ..) for a small N — the parser has to resume across chunk boundaries, and state that's only valid within a single fill_buf window can grow past its bounds. Neither bug is reachable from a Cursor-backed harness, regardless of fuzz budget.
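The asymmetry described above is easy to demonstrate with the standard library alone. This is an illustrative sketch, not harness code:

```rust
use std::io::{BufRead, BufReader, Cursor};

fn main() {
    let data = b"<root>chunk boundary demo</root>";

    // Cursor's BufRead impl returns the entire remaining input in one
    // window, so a parser driven by it never suspends mid-token.
    let mut cursor = Cursor::new(&data[..]);
    assert_eq!(cursor.fill_buf().unwrap().len(), data.len());

    // BufReader::with_capacity caps each window at the buffer size (4 here),
    // forcing a consumer to resume parsing across fill_buf/consume calls.
    let mut chunked = BufReader::with_capacity(4, &data[..]);
    assert_eq!(chunked.fill_buf().unwrap().len(), 4);
}
```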

What

Adds fuzz_chunked_reader. The first input byte is the BufReader capacity (clamped to 1..=255); the rest is fed as XML. Same input shape as fuzz_target_1, so corpora can be shared between the two.

The 1..=255 range is deliberate — capacities >= data.len() degenerate to the Cursor case already covered by fuzz_target_1, so this is the most efficient use of the fuzz budget for the chunk-boundary regime.
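The input-shaping step can be sketched as follows. The helper name `split_input` is hypothetical, chosen for illustration; the actual target code may be structured differently:

```rust
// Hypothetical helper; illustrates the input shape only, not the real target.
fn split_input(data: &[u8]) -> Option<(usize, &[u8])> {
    let (&cap_byte, xml) = data.split_first()?;
    // First byte picks the BufReader capacity; 0 is bumped to 1 because a
    // zero-capacity buffer is useless, and a u8 already caps it at 255.
    let capacity = usize::from(cap_byte).max(1);
    Some((capacity, xml))
}

fn main() {
    // 0x04 selects capacity 4; the rest would be fed to the XML reader.
    assert_eq!(split_input(&[0x04, b'<', b'a']), Some((4, &b"<a"[..])));
    // Capacity byte 0 is clamped up to 1.
    assert_eq!(split_input(&[0x00, b'x']), Some((1, &b"x"[..])));
    // Empty input has no capacity byte at all.
    assert_eq!(split_input(&[]), None);
}
```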

Verification

I reverted 9fd1d58 (the #957 fix) on a local branch, seeded fuzz/corpus/fuzz_chunked_reader/ with tests/issues.rs::issue957's input prefixed with 0x04 (the capacity byte), and ran cargo fuzz run fuzz_chunked_reader corpus/fuzz_chunked_reader. libFuzzer reproduced the slice_index_fail in DtdParser::feed on the seed within one run — same panic site as the original bug.

With the fix back in place, 5000 runs over the seeded corpus are clean.

Notes

  • No corpus is committed — fuzz/.gitignore already excludes corpus/.
  • No new dependencies; reuses existing arbitrary / libfuzzer-sys.
  • fuzz/README.md updated to describe the three targets and how they differ.
  • Doesn't touch fuzz_target_1 or structured_roundtrip — keeping each target single-purpose.

The existing fuzz_target_1 harness drives `Reader::from_reader` over
`Cursor<&[u8]>`, whose `BufRead::fill_buf` always returns the entire
remaining input in a single window. A class of parser bugs only
manifests when the underlying reader hands back a partial window and
the parser has to resume on the next `fill_buf` call. Issues tafia#950 and
tafia#957 are both in that class — both regression tests reproduce by
wrapping the input in `BufReader::with_capacity(N, ..)` for small N.

Add a third target, `fuzz_chunked_reader`, that uses the first input
byte as a `BufReader` capacity in 1..=255 and feeds the rest as XML.
Capacities >= input length degenerate to the existing Cursor case, so
the small range is the right place to spend fuzz budget.

Verified the harness catches tafia#957 by reverting commit 9fd1d58 locally,
seeding the corpus with the malformed DTD from `tests/issues.rs::issue957`
prefixed with capacity byte 0x04, and observing libFuzzer reproduce the
`slice_index_fail` panic in `DtdParser::feed` on the seed within one
run. With the fix in place, 5000 runs over the seeded corpus stay clean.
@dobermai
Contributor Author

dobermai commented May 6, 2026

Unfortunately (or fortunately?) the new fuzz target seems to have surfaced another issue. I'll take a shot at fixing it. Opened #960 to track.

@codecov-commenter

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.48%. Comparing base (a759d65) to head (697a03e).
⚠️ Report is 27 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #959      +/-   ##
==========================================
+ Coverage   55.08%   56.48%   +1.39%     
==========================================
  Files          44       44              
  Lines       16911    17653     +742     
==========================================
+ Hits         9316     9971     +655     
- Misses       7595     7682      +87     
Flag        Coverage          Δ
unittests   56.48% <ø>        +1.39% ⬆️


@Mingun Mingun merged commit 42cbb86 into tafia:master May 7, 2026
6 of 7 checks passed
@Mingun
Collaborator

Mingun commented May 7, 2026

Thanks!

Mingun pushed a commit that referenced this pull request May 7, 2026
#960)

Sibling of #957. The fix in #958 prevents `skipped` from growing across
multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the
same panic was still reachable via the *initial* assignment one arm
earlier:

    *self = Self::UndecidedMarkup(cur.len() - i - 1);

When a chunk delivered `<` followed by 9+ bytes of unknown markup (so
`switch()` returned `None` on a window long enough to definitively rule
out all keywords), `skipped` was set to that long value in one shot. The
next entry into `UndecidedMarkup` panicked on the very first line:

    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    // range end index 24 out of range for slice of length 9

CIFuzz on PR #959 (a new chunked-`BufRead` fuzz target) reproduced this
within ~3s, 192,123 executions. Local minimal reproducer:

    let reader = BufReader::with_capacity(
        24,
        "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(),
    );

Mirrors the bail-out already present in the `UndecidedMarkup` arm: when
`cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we
transition to `InElementDecl` (skip until `>`) instead of staging more
bytes than the 9-byte work buffer can hold.

Adds `issue960` regression test in the same style as `issue957`. Full
`cargo test` stays green.
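The panic class described in the commit message can be illustrated in isolation. The `stage` function below is a hypothetical stand-in mimicking the 9-byte work buffer, not quick-xml's actual code:

```rust
use std::panic;

// A fixed 9-byte work buffer, as in the UndecidedMarkup handling.
fn stage(buf: &[u8], skipped: usize) -> [u8; 9] {
    let mut bytes = [0u8; 9];
    // Same shape as the failing line: panics unless skipped <= 9.
    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    bytes
}

fn main() {
    // '<' followed by a run of unknown markup, as in the minimal reproducer.
    let chunk: &[u8] = b"<aaaaaaaaaaaaaaaaaaaaaaa";
    // Staging a few trailing bytes fits the 9-byte buffer.
    assert!(stage(chunk, 5).starts_with(b"aaaaa"));
    // skipped = 24 overflows it: "range end index 24 out of range for
    // slice of length 9".
    assert!(panic::catch_unwind(|| stage(chunk, 24)).is_err());
}
```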
Mingun pushed a commit to Mingun/quick-xml that referenced this pull request May 8, 2026
tafia#960)


(cherry picked from commit 0672dfa)
