[AURON #2229] Silent fallback to empty partitions may cause data loss#2230
Open
guixiaowen wants to merge 2 commits intoapache:masterfrom
Open
[AURON #2229] Silent fallback to empty partitions may cause data loss#2230guixiaowen wants to merge 2 commits intoapache:masterfrom
guixiaowen wants to merge 2 commits intoapache:masterfrom
Conversation
| s"Failed to obtain input partitions via reflection for ${exec.getClass.getName}.", | ||
| t) | ||
| Seq.empty | ||
| throw new IllegalStateException( |
Contributor
There was a problem hiding this comment.
Since we are throwing an exception anyways, do we need to catch it in L189?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
… #2229
Which issue does this PR close?
Closes #2229
Rationale for this change
The current implementation silently falls back to Seq.empty when both DataSource V2 and reflective partition retrieval fail. This may lead to empty input partitions without any failure signal, potentially causing silent data loss or empty query results.
Problem Description
In inputPartitions(exec: BatchScanExec), when partition planning fails:
V2 API failure → caught and ignored
Reflection failure → caught and ignored
Final fallback → Seq.empty
catch { case t: Throwable => logWarning(...) Seq.empty }and
catch { case t: Throwable => logWarning(...) Seq.empty }Risk
Returning an empty partition list is dangerous because:
Silent data loss
Downstream execution may interpret empty partitions as:
no data available
valid empty scan result
→ leads to incorrect empty query results without failure
Hard to detect in production
Only warning logs are emitted
No exception or job failure
Easy to miss in large pipelines
A scan failure (cannot plan partitions) is semantically different from:
dataset is actually empty
But both are currently indistinguishable.
What changes are included in this PR?
Expected Behavior
When both:
V2 planInputPartitions
reflection-based partition retrieval
fail, the system should:
Fail fast with a clear exception:
throw new IllegalStateException( s"Failed to plan input partitions for ${exec.getClass.getName}")
Are there any user-facing changes?
Nothing.
How was this patch tested?
Depends on existing unit tests.