
branch-4.0: [fix](job) fix streaming job stuck when S3 auth error is silently ignored in fetchRemoteMeta #61284 #61296

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-61284-branch-4.0 on Mar 13, 2026


@github-actions
Contributor

Cherry-picked from #61284

…ored in fetchRemoteMeta (#61284)

### What problem does this PR solve?

#### Problem

When S3 credentials become invalid (e.g., a 403 auth error), the streaming
job neither pauses nor reports an error; it hangs indefinitely without
making progress, even when new files are added.

#### Root cause

S3ObjStorage.globListInternal() catches all exceptions and returns a
GlobListResult with a non-ok Status instead of rethrowing them.
S3SourceOffsetProvider.fetchRemoteMeta() called globListWithLimit() but
never checked the returned status. Because the object list was empty,
maxEndFile was never updated, hasMoreDataToConsume() kept returning
false, and the scheduler retried every 500 ms forever without triggering
a PAUSE.
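The failure mode above can be reproduced in miniature. The sketch below is illustrative only: `ListResult`, `globList`, and `advanceMaxEndFile` are hypothetical, simplified stand-ins for GlobListResult, globListInternal(), and the pre-fix fetchRemoteMeta(), not the real Doris classes or signatures. It shows how a caller that ignores the status cannot tell an auth failure apart from "no new files".

```java
import java.util.Collections;
import java.util.List;

public class SwallowedStatusSketch {

    // Hypothetical stand-in for GlobListResult: a status flag, an error
    // message, and the listed object keys. Not the real Doris class.
    static final class ListResult {
        final boolean ok;
        final String errMsg;
        final List<String> objects;

        ListResult(boolean ok, String errMsg, List<String> objects) {
            this.ok = ok;
            this.errMsg = errMsg;
            this.objects = objects;
        }
    }

    // Mimics the catch-all behavior described for globListInternal():
    // any exception becomes a non-ok result with an empty object list.
    static ListResult globList(boolean credentialsValid) {
        try {
            if (!credentialsValid) {
                throw new IllegalStateException("403 Forbidden");
            }
            return new ListResult(true, "", List.of("s3://bucket/a.csv"));
        } catch (Exception e) {
            return new ListResult(false, e.getMessage(), Collections.emptyList());
        }
    }

    // Mimics the pre-fix caller: it never looks at result.ok, so an error
    // looks exactly like "no new files" and the max end file never advances.
    static String advanceMaxEndFile(String currentMax, boolean credentialsValid) {
        ListResult result = globList(credentialsValid);
        String max = currentMax;
        for (String key : result.objects) {
            if (max == null || key.compareTo(max) > 0) {
                max = key;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Bad credentials: the value is unchanged, so a scheduler polling for
        // new data keeps retrying without ever seeing an error.
        System.out.println(advanceMaxEndFile("s3://bucket/0.csv", false));
        // Good credentials: the listing advances to the newer key.
        System.out.println(advanceMaxEndFile("s3://bucket/0.csv", true));
    }
}
```

Because the error path and the "nothing new" path both produce an empty list, only an explicit status check can distinguish them.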

The same status check was also missing in getNextOffset(), which would
produce a misleading "No new files found" error instead of the actual S3
error message.
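The ordering of checks in getNextOffset() matters for which message the user sees. Here is a minimal sketch, assuming a simplified `ListResult` in place of GlobListResult. The method names `nextOffsetUnchecked`/`nextOffsetChecked`, the exact message prefix, and returning the first key as the "offset" are all hypothetical simplifications, not Doris code.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class NextOffsetSketch {

    // Hypothetical stand-in for GlobListResult (not the real Doris class).
    static final class ListResult {
        final boolean ok;
        final String errMsg;
        final List<String> objects;

        ListResult(boolean ok, String errMsg, List<String> objects) {
            this.ok = ok;
            this.errMsg = errMsg;
            this.objects = objects;
        }
    }

    // Pre-fix ordering: only emptiness is checked, so a failed listing is
    // reported with the misleading "No new files found" message.
    static String nextOffsetUnchecked(ListResult result) {
        if (result.objects.isEmpty()) {
            throw new RuntimeException("No new files found");
        }
        return result.objects.get(0);
    }

    // Post-fix ordering: the status is checked first, so the real listing
    // error surfaces. The message prefix is an assumption for illustration.
    static String nextOffsetChecked(ListResult result) {
        if (!result.ok) {
            throw new RuntimeException("Failed to list S3 files: " + result.errMsg);
        }
        if (result.objects.isEmpty()) {
            throw new RuntimeException("No new files found");
        }
        return result.objects.get(0);
    }

    // Runs the call and returns its exception message, or null on success.
    static String errorOf(Supplier<String> call) {
        try {
            call.get();
            return null;
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ListResult failed = new ListResult(false, "403 Forbidden", Collections.emptyList());
        System.out.println(errorOf(() -> nextOffsetUnchecked(failed)));
        System.out.println(errorOf(() -> nextOffsetChecked(failed)));
    }
}
```

With the same failed listing, the unchecked variant reports "No new files found" while the checked variant reports the underlying 403.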

#### Fix

- In fetchRemoteMeta(): check the globListResult status after
  globListWithLimit(); if it is not ok, throw an Exception with the real
  error message so that the catch block in the upper-level
  StreamingInsertJob.fetchMeta() can catch it, set GET_REMOTE_DATA_ERROR,
  and PAUSE the job for auto-resume.
- In getNextOffset(): add the same status check and throw a
  RuntimeException with an accurate error message.
- Add a debug point, S3SourceOffsetProvider.fetchRemoteMeta.error, that
  simulates a failed GlobListResult for testing.
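The fix's control flow (throw on a non-ok status, catch in the caller, record the failure, pause) can be sketched as below. Everything here is a hypothetical, simplified model: the `injectListError` flag stands in for the S3SourceOffsetProvider.fetchRemoteMeta.error debug point, and the `Job` class, enums, and field names are assumptions rather than Doris's StreamingInsertJob API. Auto-resume is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FetchMetaPauseSketch {

    enum JobStatus { RUNNING, PAUSED }
    enum FailureCode { NONE, GET_REMOTE_DATA_ERROR }

    // Hypothetical stand-in for GlobListResult (not the real Doris class).
    static final class ListResult {
        final boolean ok;
        final String errMsg;
        final List<String> objects;

        ListResult(boolean ok, String errMsg, List<String> objects) {
            this.ok = ok;
            this.errMsg = errMsg;
            this.objects = objects;
        }
    }

    // Hypothetical flag playing the role of the debug point: when set,
    // the listing returns a failed result instead of real objects.
    static boolean injectListError = false;

    static ListResult globListWithLimit() {
        if (injectListError) {
            return new ListResult(false, "injected failure", Collections.emptyList());
        }
        return new ListResult(true, "", List.of("s3://bucket/a.csv"));
    }

    static final class Job {
        JobStatus status = JobStatus.RUNNING;
        FailureCode failureCode = FailureCode.NONE;
        String errorMsg = "";
        final List<String> knownFiles = new ArrayList<>();

        // Post-fix behavior: a non-ok status becomes a thrown Exception.
        void fetchRemoteMeta() throws Exception {
            ListResult result = globListWithLimit();
            if (!result.ok) {
                throw new Exception("Failed to list S3 files: " + result.errMsg);
            }
            knownFiles.addAll(result.objects);
        }

        // Upper level: catch the exception, record why, and pause the job.
        void fetchMeta() {
            try {
                fetchRemoteMeta();
            } catch (Exception e) {
                failureCode = FailureCode.GET_REMOTE_DATA_ERROR;
                errorMsg = e.getMessage();
                status = JobStatus.PAUSED;
            }
        }
    }

    public static void main(String[] args) {
        injectListError = true;
        Job job = new Job();
        job.fetchMeta();
        System.out.println(job.status + " " + job.failureCode + " " + job.errorMsg);
    }
}
```

Throwing instead of returning an empty list moves the decision to the one place that can change job state, which is what turns a silent retry loop into a visible PAUSED job.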

#### Test

Added regression test test_streaming_insert_job_fetch_meta_error: it
enables the debug point to inject a failed GlobListResult, creates a
streaming job, waits for the job to reach PAUSED status, and asserts
that ErrorMsg contains "Failed to list S3 files".
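The test's "wait until PAUSED, then check the error" step is a bounded polling loop. The actual regression test is part of Doris's own test suite and is not reproduced here; the Java sketch below only illustrates the polling pattern. `awaitStatus`, the fake status supplier, and the timeout values are hypothetical.

```java
import java.util.function.Supplier;

public class AwaitStatusSketch {

    // Polls statusSupplier until it returns `expected` or timeoutMillis
    // elapses. Returns true if the expected status was observed.
    static boolean awaitStatus(Supplier<String> statusSupplier, String expected,
                               long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (expected.equals(statusSupplier.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake job: reports RUNNING for the first two polls, then PAUSED.
        int[] polls = {0};
        Supplier<String> fakeStatus = () -> ++polls[0] < 3 ? "RUNNING" : "PAUSED";
        String fakeErrorMsg = "Failed to list S3 files: injected failure";

        boolean paused = awaitStatus(fakeStatus, "PAUSED", 5_000, 10);
        if (!paused || !fakeErrorMsg.contains("Failed to list S3 files")) {
            throw new AssertionError("job did not pause with the expected error");
        }
        System.out.println("paused=" + paused);
    }
}
```

A bounded wait matters here because the original bug was precisely a job that never changed state; an unbounded loop would hang the test the same way the job hung.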
github-actions bot requested a review from yiguolei as a code owner on March 13, 2026 02:59
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

dataroaring reopened this on Mar 13, 2026
@hello-stephen
Contributor

run buildall

yiguolei merged commit 663312c into branch-4.0 on Mar 13, 2026
26 of 29 checks passed
github-actions bot deleted the auto-pick-61284-branch-4.0 branch on March 13, 2026 06:03