
Add fuzz_chunked_reader target for chunk-boundary parser bugs#959

Merged
Mingun merged 1 commit into tafia:master from dobermai:fuzz/chunked-bufread-target on May 7, 2026

Conversation

@dobermai
Contributor

@dobermai dobermai commented May 6, 2026

Follow-up to #958 / #957, and a response to your request in #958 (comment) about improving the fuzzing setup.

Why

fuzz_target_1 drives Reader::from_reader via Cursor::new(data). <Cursor as BufRead>::fill_buf always returns the entire remaining input in a single window, so the harness never exercises parser states that span fill_buf calls.

That gap is exactly what hid #950 and #957. Both regression tests in tests/issues.rs reproduce by wrapping the input in BufReader::with_capacity(N, ..) for a small N — the parser has to resume across chunk boundaries, and state that's only valid within a single fill_buf window can grow past its bounds. Neither bug is reachable from a Cursor-backed harness, regardless of fuzz budget.
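The asymmetry described above is easy to demonstrate with the standard library alone. This is an illustrative sketch, not harness code:

```rust
use std::io::{BufRead, BufReader, Cursor};

fn main() {
    let data = b"<root>chunk boundary demo</root>";

    // Cursor's BufRead impl returns the entire remaining input in one
    // window, so a parser driven by it never suspends mid-token.
    let mut cursor = Cursor::new(&data[..]);
    assert_eq!(cursor.fill_buf().unwrap().len(), data.len());

    // BufReader::with_capacity caps each window at the buffer size (4 here),
    // forcing a consumer to resume parsing across fill_buf/consume calls.
    let mut chunked = BufReader::with_capacity(4, &data[..]);
    assert_eq!(chunked.fill_buf().unwrap().len(), 4);
}
```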

What

Adds fuzz_chunked_reader. The first input byte is the BufReader capacity (clamped to 1..=255); the rest is fed as XML. Same input shape as fuzz_target_1, so corpora can be shared between the two.

The 1..=255 range is deliberate — capacities >= data.len() degenerate to the Cursor case already covered by fuzz_target_1, so this is the most efficient use of the fuzz budget for the chunk-boundary regime.
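The input-shaping step can be sketched as follows. The helper name `split_input` is hypothetical, chosen for illustration; the actual target code may be structured differently:

```rust
// Hypothetical helper; illustrates the input shape only, not the real target.
fn split_input(data: &[u8]) -> Option<(usize, &[u8])> {
    let (&cap_byte, xml) = data.split_first()?;
    // First byte picks the BufReader capacity; 0 is bumped to 1 because a
    // zero-capacity buffer is useless, and a u8 already caps it at 255.
    let capacity = usize::from(cap_byte).max(1);
    Some((capacity, xml))
}

fn main() {
    // 0x04 selects capacity 4; the rest would be fed to the XML reader.
    assert_eq!(split_input(&[0x04, b'<', b'a']), Some((4, &b"<a"[..])));
    // Capacity byte 0 is clamped up to 1.
    assert_eq!(split_input(&[0x00, b'x']), Some((1, &b"x"[..])));
    // Empty input has no capacity byte at all.
    assert_eq!(split_input(&[]), None);
}
```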

Verification

I reverted 9fd1d58 (the #957 fix) on a local branch, seeded fuzz/corpus/fuzz_chunked_reader/ with tests/issues.rs::issue957's input prefixed with 0x04 (the capacity byte), and ran cargo fuzz run fuzz_chunked_reader corpus/fuzz_chunked_reader. libFuzzer reproduced the slice_index_fail in DtdParser::feed on the seed within one run — same panic site as the original bug.

With the fix back in place, 5000 runs over the seeded corpus are clean.

Notes

  • No corpus is committed — fuzz/.gitignore already excludes corpus/.
  • No new dependencies; reuses existing arbitrary / libfuzzer-sys.
  • fuzz/README.md updated to describe the three targets and how they differ.
  • Doesn't touch fuzz_target_1 or structured_roundtrip — keeping each target single-purpose.

The existing fuzz_target_1 harness drives `Reader::from_reader` over
`Cursor<&[u8]>`, whose `BufRead::fill_buf` always returns the entire
remaining input in a single window. A class of parser bugs only
manifests when the underlying reader hands back a partial window and
the parser has to resume on the next `fill_buf` call. Issues tafia#950 and
tafia#957 are both in that class — both regression tests reproduce by
wrapping the input in `BufReader::with_capacity(N, ..)` for small N.

Add a third target, `fuzz_chunked_reader`, that uses the first input
byte as a `BufReader` capacity in 1..=255 and feeds the rest as XML.
Capacities >= input length degenerate to the existing Cursor case, so
the small range is the right place to spend fuzz budget.

Verified the harness catches tafia#957 by reverting commit 9fd1d58 locally,
seeding the corpus with the malformed DTD from `tests/issues.rs::issue957`
prefixed with capacity byte 0x04, and observing libFuzzer reproduce the
`slice_index_fail` panic in `DtdParser::feed` on the seed within one
run. With the fix in place, 5000 runs over the seeded corpus stay clean.
@dobermai
Contributor Author

dobermai commented May 6, 2026

Unfortunately (or fortunately?) the new fuzz target seems to have surfaced another issue. I'll take a shot at fixing it. Opened #960 to track.

@codecov-commenter

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.48%. Comparing base (a759d65) to head (697a03e).
⚠️ Report is 27 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #959      +/-   ##
==========================================
+ Coverage   55.08%   56.48%   +1.39%     
==========================================
  Files          44       44              
  Lines       16911    17653     +742     
==========================================
+ Hits         9316     9971     +655     
- Misses       7595     7682      +87     
Flag        Coverage          Δ
unittests   56.48% <ø>        +1.39% ⬆️


@Mingun Mingun merged commit 42cbb86 into tafia:master May 7, 2026
6 of 7 checks passed
@Mingun
Collaborator

Mingun commented May 7, 2026

Thanks!

Mingun pushed a commit that referenced this pull request May 7, 2026
#960)

Sibling of #957. The fix in #958 prevents `skipped` from growing across
multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the
same panic was still reachable via the *initial* assignment one arm
earlier:

    *self = Self::UndecidedMarkup(cur.len() - i - 1);

When a chunk delivered `<` followed by 9+ bytes of unknown markup (so
`switch()` returned `None` on a window long enough to definitively rule
out all keywords), `skipped` was set to that long value in one shot. The
next entry into `UndecidedMarkup` panicked on the very first line:

    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    // range end index 24 out of range for slice of length 9

CIFuzz on PR #959 (a new chunked-`BufRead` fuzz target) reproduced this
within ~3s, 192,123 executions. Local minimal reproducer:

    let reader = BufReader::with_capacity(
        24,
        "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(),
    );

Mirrors the bail-out already present in the `UndecidedMarkup` arm: when
`cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we
transition to `InElementDecl` (skip until `>`) instead of staging more
bytes than the 9-byte work buffer can hold.

Adds `issue960` regression test in the same style as `issue957`. Full
`cargo test` stays green.
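The panic class described in the commit message can be illustrated in isolation. The `stage` function below is a hypothetical stand-in mimicking the 9-byte work buffer, not quick-xml's actual code:

```rust
use std::panic;

// A fixed 9-byte work buffer, as in the UndecidedMarkup handling.
fn stage(buf: &[u8], skipped: usize) -> [u8; 9] {
    let mut bytes = [0u8; 9];
    // Same shape as the failing line: panics unless skipped <= 9.
    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    bytes
}

fn main() {
    // '<' followed by a run of unknown markup, as in the minimal reproducer.
    let chunk: &[u8] = b"<aaaaaaaaaaaaaaaaaaaaaaa";
    // Staging a few trailing bytes fits the 9-byte buffer.
    assert!(stage(chunk, 5).starts_with(b"aaaaa"));
    // skipped = 24 overflows it: "range end index 24 out of range for
    // slice of length 9".
    assert!(panic::catch_unwind(|| stage(chunk, 24)).is_err());
}
```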
Mingun pushed a commit to Mingun/quick-xml that referenced this pull request May 8, 2026
tafia#960)


(cherry picked from commit 0672dfa)
