Add fuzz_chunked_reader target for chunk-boundary parser bugs#959
Merged
Conversation
The existing fuzz_target_1 harness drives `Reader::from_reader` over `Cursor<&[u8]>`, whose `BufRead::fill_buf` always returns the entire remaining input in a single window. A class of parser bugs only manifests when the underlying reader hands back a partial window and the parser has to resume on the next `fill_buf` call. Issues tafia#950 and tafia#957 are both in that class — both regression tests reproduce by wrapping the input in `BufReader::with_capacity(N, ..)` for small N. Add a third target, `fuzz_chunked_reader`, that uses the first input byte as a `BufReader` capacity in 1..=255 and feeds the rest as XML. Capacities >= input length degenerate to the existing Cursor case, so the small range is the right place to spend fuzz budget. Verified the harness catches tafia#957 by reverting commit 9fd1d58 locally, seeding the corpus with the malformed DTD from `tests/issues.rs::issue957` prefixed with capacity byte 0x04, and observing libFuzzer reproduce the `slice_index_fail` panic in `DtdParser::feed` on the seed within one run. With the fix in place, 5000 runs over the seeded corpus stay clean.
Contributor
Author
|
Unfortunately (or fortunately?) the new fuzz runs seems to have surfaced another issue. I'll give it a shot to fix. Opened #960 to track |
This was referenced May 6, 2026
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #959 +/- ##
==========================================
+ Coverage 55.08% 56.48% +1.39%
==========================================
Files 44 44
Lines 16911 17653 +742
==========================================
+ Hits 9316 9971 +655
- Misses 7595 7682 +87
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Collaborator
|
Thanks! |
Mingun
pushed a commit
that referenced
this pull request
May 7, 2026
#960) Sibling of #957. The fix in #958 prevents `skipped` from growing across multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the same panic was still reachable via the *initial* assignment one arm earlier: *self = Self::UndecidedMarkup(cur.len() - i - 1); When a chunk delivered `<` followed by 9+ bytes of unknown markup (so `switch()` returned `None` on a window long enough to definitively rule out all keywords), `skipped` was set to that long value in one shot. The next entry into `UndecidedMarkup` panicked on the very first line: bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]); // range end index 24 out of range for slice of length 9 CIFuzz on PR #959 (a new chunked-`BufRead` fuzz target) reproduced this within ~3s, 192,123 executions. Local minimal reproducer: let reader = BufReader::with_capacity( 24, "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(), ); Mirrors the bail-out already present in the `UndecidedMarkup` arm: when `cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`, `<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we transition to `InElementDecl` (skip until `>`) instead of staging more bytes than the 9-byte work buffer can hold. Adds `issue960` regression test in the same style as `issue957`. Full `cargo test` stays green.
Mingun
pushed a commit
to Mingun/quick-xml
that referenced
this pull request
May 8, 2026
tafia#960) Sibling of tafia#957. The fix in tafia#958 prevents `skipped` from growing across multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the same panic was still reachable via the *initial* assignment one arm earlier: *self = Self::UndecidedMarkup(cur.len() - i - 1); When a chunk delivered `<` followed by 9+ bytes of unknown markup (so `switch()` returned `None` on a window long enough to definitively rule out all keywords), `skipped` was set to that long value in one shot. The next entry into `UndecidedMarkup` panicked on the very first line: bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]); // range end index 24 out of range for slice of length 9 CIFuzz on PR tafia#959 (a new chunked-`BufRead` fuzz target) reproduced this within ~3s, 192,123 executions. Local minimal reproducer: let reader = BufReader::with_capacity( 24, "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(), ); Mirrors the bail-out already present in the `UndecidedMarkup` arm: when `cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`, `<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we transition to `InElementDecl` (skip until `>`) instead of staging more bytes than the 9-byte work buffer can hold. Adds `issue960` regression test in the same style as `issue957`. Full `cargo test` stays green. (cherry picked from commit 0672dfa)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #958 / #957, and a response to your request in #958 (comment) about improving the fuzzing setup.
Why
fuzz_target_1drivesReader::from_readerviaCursor::new(data).<Cursor as BufRead>::fill_bufalways returns the entire remaining input in a single window, so the harness never exercises parser states that spanfill_bufcalls.That gap is exactly what hid #950 and #957. Both regression tests in
tests/issues.rsreproduce by wrapping the input inBufReader::with_capacity(N, ..)for a small N — the parser has to resume across chunk boundaries, and state that's only valid within a singlefill_bufwindow can grow past its bounds. Neither bug is reachable from aCursor-backed harness, regardless of fuzz budget.What
Adds
fuzz_chunked_reader. The first input byte is theBufReadercapacity (clamped to1..=255); the rest is fed as XML. Same input shape asfuzz_target_1, so corpora can be shared between the two.The
1..=255range is deliberate — capacities>= data.len()degenerate to theCursorcase already covered byfuzz_target_1, so this is the most efficient use of the fuzz budget for the chunk-boundary regime.Verification
I reverted 9fd1d58 (the #957 fix) on a local branch, seeded
fuzz/corpus/fuzz_chunked_reader/withtests/issues.rs::issue957's input prefixed with0x04(the capacity byte), and rancargo fuzz run fuzz_chunked_reader corpus/fuzz_chunked_reader. libFuzzer reproduced theslice_index_failinDtdParser::feedon the seed within one run — same panic site as the original bug.With the fix back in place, 5000 runs over the seeded corpus are clean.
Notes
fuzz/.gitignorealready excludescorpus/.arbitrary/libfuzzer-sys.fuzz/README.mdupdated to describe the three targets and how they differ.fuzz_target_1orstructured_roundtrip— keeping each target single-purpose.