[fix](streaming) Fix NPE in StreamingInsertJob when MetricRepo is not initialized during replay #61253
Code Review Summary

PR: Fix NPE in StreamingInsertJob when MetricRepo is not initialized during replay

Verdict: No issues found. The fix is correct, focused, and follows established patterns.
PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 27769 ms
TPC-DS: Total hot run time: 153672 ms
… initialized during replay (#61253)

### What problem does this PR solve?

#### Problem

`StreamingInsertJob.replayOnCommitted()` throws a `NullPointerException` during FE replay:

```
java.lang.NullPointerException: Cannot invoke "org.apache.doris.metric.LongCounterMetric.increase(java.lang.Long)" because "org.apache.doris.metric.MetricRepo.COUNTER_STREAMING_JOB_TOTAL_ROWS" is null
    at StreamingInsertJob.updateJobStatisticAndOffset(StreamingInsertJob.java:634)
    at StreamingInsertJob.replayOnCommitted(StreamingInsertJob.java:1020)
    at TransactionState.replaySetTransactionStatus(TransactionState.java:589)
    at DatabaseTransactionMgr.replayUpsertTransactionState(DatabaseTransactionMgr.java:2636)
    at GlobalTransactionMgr.replayUpsertTransactionState(GlobalTransactionMgr.java:952)
```

The root cause is that `MetricRepo` may not be initialized when FE replays transaction logs during startup, but `StreamingInsertJob` unconditionally calls metric update methods, leading to NPE.

#### Fix

Two separate changes are applied to `StreamingInsertJob`:

1. **Skip metric updates during replay**: `updateJobStatisticAndOffset` and `updateCloudJobStatisticAndOffset` now accept an `isReplay` boolean parameter. Call sites in `replayOnCommitted` and `replayOnCloudMode` pass `true`, while `afterCommitted` passes `false`.
2. **Guard all metric calls with `MetricRepo.isInit`**: All remaining `MetricRepo.COUNTER_STREAMING_JOB_*` call sites are wrapped with `if (MetricRepo.isInit && !isReplay)` or `if (MetricRepo.isInit)` to prevent NPE if `MetricRepo` has not been fully initialized.
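The two changes described above can be illustrated with a minimal, self-contained sketch. It is not the Doris source: `MetricRepoStub`, `LongCounter`, the `scannedRows` field, and the simplified method signatures are hypothetical stand-ins, and the assumption that job-local statistics are still updated during replay is inferred only from the method name. Only the guard shape `if (MetricRepo.isInit && !isReplay)` and the `true`/`false` call-site mapping come from the PR description.

```java
import java.util.concurrent.atomic.AtomicLong;

public class StreamingMetricGuardSketch {

    /** Hypothetical stand-in for a counter metric. */
    static final class LongCounter {
        private final AtomicLong value = new AtomicLong();

        void increase(long delta) {
            value.addAndGet(delta);
        }

        long get() {
            return value.get();
        }
    }

    /** Hypothetical stand-in for MetricRepo: the counter is null until init() runs. */
    static final class MetricRepoStub {
        static volatile boolean isInit = false;
        static LongCounter COUNTER_STREAMING_JOB_TOTAL_ROWS;

        static synchronized void init() {
            if (isInit) {
                return;
            }
            COUNTER_STREAMING_JOB_TOTAL_ROWS = new LongCounter();
            isInit = true;
        }
    }

    // Job-local statistic (assumed here to be updated on both the live and replay paths).
    private long scannedRows;

    void updateJobStatisticAndOffset(long rows, boolean isReplay) {
        scannedRows += rows;
        // Touch the global counter only when it exists and this is not a replay.
        if (MetricRepoStub.isInit && !isReplay) {
            MetricRepoStub.COUNTER_STREAMING_JOB_TOTAL_ROWS.increase(rows);
        }
    }

    void replayOnCommitted(long rows) {
        updateJobStatisticAndOffset(rows, true);   // replay path passes true
    }

    void afterCommitted(long rows) {
        updateJobStatisticAndOffset(rows, false);  // live commit path passes false
    }

    long getScannedRows() {
        return scannedRows;
    }

    public static void main(String[] args) {
        StreamingMetricGuardSketch job = new StreamingMetricGuardSketch();
        job.replayOnCommitted(100);  // metrics not initialized yet: guarded, no NPE
        MetricRepoStub.init();
        job.replayOnCommitted(50);   // initialized, but replay: counter untouched
        job.afterCommitted(25);      // live commit: counter increases
        System.out.println("job rows = " + job.getScannedRows());
        System.out.println("metric rows = " + MetricRepoStub.COUNTER_STREAMING_JOB_TOTAL_ROWS.get());
    }
}
```

Run as written, the sketch prints `job rows = 175` and `metric rows = 25`. Removing the `isInit` check from the guard would make the first `replayOnCommitted` call fail with a `NullPointerException`, which mirrors the failure in the stack trace.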
…cRepo is not initialized during replay #61253 (#61295)

Cherry-picked from #61253

Co-authored-by: wudi
### What problem does this PR solve?

#### Problem

`StreamingInsertJob.replayOnCommitted()` throws a `NullPointerException` during FE replay.

The root cause is that `MetricRepo` may not be initialized when FE replays transaction logs during startup, but `StreamingInsertJob` unconditionally calls metric update methods, leading to NPE.

#### Fix

Two separate changes are applied to `StreamingInsertJob`:

1. **Skip metric updates during replay**: `updateJobStatisticAndOffset` and `updateCloudJobStatisticAndOffset` now accept an `isReplay` boolean parameter. Call sites in `replayOnCommitted` and `replayOnCloudMode` pass `true`, while `afterCommitted` passes `false`.
2. **Guard all metric calls with `MetricRepo.isInit`**: All remaining `MetricRepo.COUNTER_STREAMING_JOB_*` call sites are wrapped with `if (MetricRepo.isInit && !isReplay)` or `if (MetricRepo.isInit)` to prevent NPE if `MetricRepo` has not been fully initialized.

### Release note
None