Commit fbe4993
[fix](job) fix streaming job fails with "No new files found" on second scheduling (#61249)
### What problem does this PR solve?
When a streaming job processes S3 files, the second scheduling fails with:
`No new files found in path: ...`
Root cause: In S3ObjStorage.globListInternal, currentMaxFile was
unconditionally set to the last raw S3
object key returned in the response page, without checking whether it
matched the glob pattern.
This affects two scenarios:
**Scenario 1** — reachLimit=false (all matched files consumed in one
listing):
The S3 page still contains non-matching keys after the last matched file
(e.g.
test_csv_comma_header.csv.lz4 sitting after test_csv_comma_header.csv).
currentMaxFile gets set to the
.lz4 key, so hasMoreDataToConsume() returns true. The next scheduling
calls startAfter("...csv"), S3
returns only .lz4 which doesn't match the glob → rfiles empty →
exception.
**Scenario 2** — reachLimit=true (batch limit hit mid-page):
After the limit is hit, the remaining page objects are not inspected.
The original code set currentMaxFile
to the last raw key in the entire page (which may be a non-matching
sibling), causing the same failure on
the next scheduling attempt.
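
The failure mode in both scenarios can be reproduced with a small in-memory model. The sketch below is not the Doris code: the class `BuggyGlobListingSketch`, its `list` helper, and the comparison inside `hasMoreDataToConsume` are assumptions made for illustration, and a sorted Java list stands in for one S3 response page.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the pre-fix behavior; not the Doris source.
class BuggyGlobListingSketch {
    /** Result of one simulated listing call. */
    static final class Listing {
        final List<String> matched = new ArrayList<>();
        String currentMaxFile = "";
    }

    /** Scans one sorted "page" of keys, skipping keys at or before startAfter. */
    static Listing list(List<String> sortedKeys, String glob, String startAfter) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        Listing result = new Listing();
        for (String key : sortedKeys) {
            if (key.compareTo(startAfter) <= 0) {
                continue; // emulates the startAfter parameter of the listing request
            }
            if (matcher.matches(Paths.get(key))) {
                result.matched.add(key);
            }
            // The bug: the cursor follows every raw key, matching or not.
            result.currentMaxFile = key;
        }
        return result;
    }

    /** Assumed check: more data exists if the cursor is past the last consumed file. */
    static boolean hasMoreDataToConsume(String lastConsumed, String currentMaxFile) {
        return currentMaxFile.compareTo(lastConsumed) > 0;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "test_csv_comma_header.csv", "test_csv_comma_header.csv.lz4");
        Listing first = list(keys, "*.csv", "");
        String lastConsumed = first.matched.get(first.matched.size() - 1);
        System.out.println("currentMaxFile = " + first.currentMaxFile);
        if (hasMoreDataToConsume(lastConsumed, first.currentMaxFile)) {
            // The second scheduling lists after the last consumed file.
            Listing second = list(keys, "*.csv", lastConsumed);
            if (second.matched.isEmpty()) {
                System.out.println("No new files found");
            }
        }
    }
}
```

In this model the `.lz4` sibling becomes the cursor even though it never matched `*.csv`, so the second listing is attempted and comes back empty.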
#### Fix
Track lastMatchedKey (the last S3 key that actually matched the glob)
during the listing loop.
When reachLimit=true, instead of breaking out of the for loop
immediately, continue scanning the remaining
objects already fetched in the current page to find the first next
glob-matching key as currentMaxFile.
No extra S3 API call is needed.
When no next matching key is found in the remaining page objects, fall
back to lastMatchedKey instead of
the raw last S3 page key.
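
The cursor rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patched `globListInternal`: the class `FixedGlobListingSketch`, the `fileLimit` parameter, and the local `nextMatchedKey` are hypothetical names, and pagination and size limits are left out.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the fixed cursor rule; names are illustrative only.
class FixedGlobListingSketch {
    /** Result of one simulated listing call. */
    static final class Listing {
        final List<String> matched = new ArrayList<>();
        String currentMaxFile = "";
    }

    /** Scans one already-fetched page and takes at most fileLimit matching keys. */
    static Listing list(List<String> pageKeys, String glob, int fileLimit) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        Listing result = new Listing();
        String lastMatchedKey = "";   // last key that actually matched the glob
        String nextMatchedKey = null; // first matching key beyond the batch, if any
        boolean reachLimit = false;
        for (String key : pageKeys) {
            if (!matcher.matches(Paths.get(key))) {
                continue; // non-matching siblings never move the cursor
            }
            if (reachLimit) {
                // Keep scanning the same page only to find the next matching key.
                nextMatchedKey = key;
                break;
            }
            result.matched.add(key);
            lastMatchedKey = key;
            if (result.matched.size() >= fileLimit) {
                reachLimit = true;
            }
        }
        // Fall back to the last matched key when the page holds no further match.
        result.currentMaxFile = nextMatchedKey != null ? nextMatchedKey : lastMatchedKey;
        return result;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("a_0.csv", "a_0.csv.lz4", "a_1.csv", "a_1.csv.lz4");
        System.out.println(list(page, "*.csv", 1).currentMaxFile);
        System.out.println(list(page, "*.csv", 2).currentMaxFile);
    }
}
```

With a limit of 1 the cursor lands on the next matching key (`a_1.csv`), so a later scheduling has real work to do. With a limit of 2 nothing matches after the batch, and the cursor falls back to the last matched key instead of the trailing `.lz4` sibling.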
#### Regression Test
Added test_streaming_job_no_new_files_with_sibling. The pattern
example_[0-0].csv only matches
example_0.csv; since getLongestPrefix strips at [, the S3 listing prefix
becomes
regression/load/data/example_ and returns both example_0.csv and
example_1.csv — example_1.csv acts as the
non-matching sibling. The test verifies that after the first successful
task no failed tasks appear.

1 parent aef4694 commit fbe4993
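
Why the regression test's pattern produces a non-matching sibling can be sketched as below. The helper `longestLiteralPrefix` is a hypothetical stand-in for `getLongestPrefix`; the set of glob metacharacters it stops at is an assumption, and an in-memory list replaces the S3 prefix listing.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; helper names and the metacharacter set are assumptions.
class SiblingPrefixSketch {
    /** Returns the part of the glob before its first metacharacter. */
    static String longestLiteralPrefix(String globPath) {
        for (int i = 0; i < globPath.length(); i++) {
            if ("*?[{\\".indexOf(globPath.charAt(i)) >= 0) {
                return globPath.substring(0, i);
            }
        }
        return globPath;
    }

    /** Emulates a prefix listing: every key that starts with the prefix is returned. */
    static List<String> listByPrefix(List<String> bucketKeys, String prefix) {
        List<String> listed = new ArrayList<>();
        for (String key : bucketKeys) {
            if (key.startsWith(prefix)) {
                listed.add(key);
            }
        }
        return listed;
    }

    /** Applies the full glob to a single key. */
    static boolean matches(String globPath, String key) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + globPath);
        return matcher.matches(Paths.get(key));
    }

    public static void main(String[] args) {
        String glob = "regression/load/data/example_[0-0].csv";
        List<String> bucket = Arrays.asList(
                "regression/load/data/example_0.csv",
                "regression/load/data/example_1.csv");
        String prefix = longestLiteralPrefix(glob);
        for (String key : listByPrefix(bucket, prefix)) {
            System.out.println(key + " matches glob: " + matches(glob, key));
        }
    }
}
```

Both keys share the listing prefix, but only `example_0.csv` passes the glob, so `example_1.csv` plays the role of the trailing non-matching key.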
File tree

4 files changed, +145 -8 lines changed

- fe/fe-core/src/main/java/org/apache/doris
  - fs/obj
  - job/common
- regression-test
  - data/job_p0/streaming_job
  - suites/job_p0/streaming_job
Lines changed: 20 additions & 7 deletions
Lines changed: 3 additions & 1 deletion
Lines changed: 12 additions & 0 deletions
Lines changed: 110 additions & 0 deletions