[Bug](profile) fix wrong count of scan wait worker timer#61064
[Bug](profile) fix wrong count of scan wait worker timer#61064BiteTheDDDDt merged 3 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where start_wait_worker_timer() was being called unconditionally in the scanner's Defer cleanup block, even when the scanner had failed (!status.ok()) or had reached end-of-stream (eos == true). It also adds explanatory comments describing the correct ordering of timing counter operations. In the normal scan cycle, a scanner finishes a partial scan round and then gets re-queued; calling start_wait_worker_timer() at this point correctly begins measuring the next wait period. For failed or finished scanners, calling it was incorrect and redundant.
Changes:
- Added a block comment (lines 156–160) explaining the required ordering of timing counter operations (
update_wait_worker_timer,start_scan_cpu_timer,update_scan_cpu_timer,start_wait_worker_timer) for correctness. - Moved
statusandbool eosdeclarations before theDeferblock to improve code clarity and make the conditional logic in the lambda readable. - Added a conditional guard in the
Deferlambda sostart_wait_worker_timer()is only called whenstatus.ok() && !eos, preventing incorrect timer updates for failed or finished scanners.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Defer defer_scanner([&] { | ||
| // WorkloadGroup Policy will check cputime realtime, so that should update the counter | ||
| // as soon as possible, could not update it on close. | ||
| update_scanner_profile(); | ||
| if (status.ok() && !eos) { | ||
| // if status is not ok, it means the scanner is failed, and the counter may be not updated correctly, so no need to update counter again. if eos is true, it means the scanner is finished successfully, and the counter is updated correctly, so no need to update counter again. |
There was a problem hiding this comment.
The inline comment on this line states "if eos is true, it means the scanner is finished successfully," but this is inaccurate. The eos flag is also set to true in error/cancellation paths (e.g., when scanner->prepare() or scanner->open() fails, or when ctx->done() is true). It would be more accurate to say "if eos is true, the scanner will not be re-queued (either because it completed scanning, encountered an error, or the context was cancelled), so no need to start the wait timer again."
| // if status is not ok, it means the scanner is failed, and the counter may be not updated correctly, so no need to update counter again. if eos is true, it means the scanner is finished successfully, and the counter is updated correctly, so no need to update counter again. | |
| // If status is not ok, the scanner has failed and its counters may not be updated | |
| // correctly, so there is no need to restart the wait-worker timer. If eos is true, | |
| // the scanner will not be re-queued (either because it completed scanning, | |
| // encountered an error, or the context was cancelled), so there is no need to | |
| // restart the wait-worker timer either. |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
|
run buildall |
1 similar comment
|
run buildall |
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
|
PR approved by at least one committer and no changes requested. |
There was a problem hiding this comment.
Code Review Summary
This PR aims to fix incorrect counting of the scan wait worker timer by reordering timer start/stop calls and adding a _has_prepared guard to OlapScanner::update_realtime_counters(). While the intent is good, there is a correctness regression introduced by the refactoring.
Critical Checkpoints
-
Goal and correctness: The PR goal is to fix wrong scan wait worker timer counting. The
_has_preparedguard move toOlapScanner::update_realtime_counters()is correct. However, the refactoring of theDeferblock introduces a regression:update_scanner_profile()is no longer called on the normal yield path (scanner yields back to scheduler without eos or error), which breaks real-time CPU time and scan counter tracking for workload group policies. -
Modification scope: The change is small and focused, but the Defer refactoring has unintended side effects.
-
Parallel code paths:
FileScanner::update_realtime_counters()does not have the_has_preparedguard that was added toOlapScanner::update_realtime_counters(). WhileFileScannerinitializes_file_reader_statsininit()(beforeprepare()), the inconsistency should be noted for maintainability. -
Performance: The
start_wait_worker_timer()call in the new Defer is redundant becauseScannerScheduler::submit()(line 70 of this file) unconditionally callsstart_wait_worker_timer()when the task is re-submitted, which resets and overwrites the timer started in the Defer. -
Test coverage: No tests are included. The check list shows no test category selected. Given the behavioral change to profiling counters, at least manual verification of profile output should be documented.
-
Concurrency: No new concurrency issues introduced.
-
Memory safety: No new memory safety issues introduced.
-
Error handling: The
_has_preparedguard correctly prevents null dereference on prepare failure. -
Configuration: No new config items.
-
Incompatible changes: None.
-
Observability: The regression actually reduces observability by not updating real-time counters on yield paths.
| Status status = Status::OK(); | ||
| bool eos = false; | ||
| Defer defer_scanner([&] { | ||
| if (status.ok() && !eos) { |
There was a problem hiding this comment.
Bug: update_scanner_profile() is no longer called on normal yield path
The old code called update_scanner_profile() unconditionally in the Defer:
Defer defer_scanner([&] {
// WorkloadGroup Policy will check cputime realtime, so that should update the counter
// as soon as possible, could not update it on close.
update_scanner_profile();
});The new code only calls update_scanner_profile() when eos == true (at line 342), and the Defer only calls start_wait_worker_timer() when status.ok() && !eos. This means:
- On the normal yield path (scanner yields back to scheduler because it hit the byte threshold, max run time, or other break conditions — where
status.ok() && !eos),update_scan_cpu_timer()andupdate_realtime_counters()are never called. - The old comment explicitly stated: "WorkloadGroup Policy will check cputime realtime, so that should update the counter as soon as possible, could not update it on close." — This requirement is violated by the new code.
This will cause:
- CPU time for each scanner slice to not be reported to the workload group until eos, making workload group CPU limits less responsive
- Scan bytes/rows realtime counters to not be updated until eos, potentially allowing queries to exceed scan byte limits before being detected
Suggested fix: Keep calling update_scanner_profile() in the Defer (as before), and additionally call start_wait_worker_timer() only on the normal yield path:
Defer defer_scanner([&] {
update_scanner_profile();
if (status.ok() && !eos) {
scanner->start_wait_worker_timer();
}
});| Defer defer_scanner([&] { | ||
| if (status.ok() && !eos) { | ||
| // if status is not ok, it means the scanner is failed, and the counter may be not updated correctly, so no need to update counter again. if eos is true, it means the scanner is finished successfully, and the counter is updated correctly, so no need to update counter again. | ||
| scanner->start_wait_worker_timer(); |
There was a problem hiding this comment.
Minor: start_wait_worker_timer() here is redundant
start_wait_worker_timer() resets and restarts the internal _watch. However, ScannerScheduler::submit() (line 70 of this file) also calls start_wait_worker_timer() unconditionally when the scan task is re-submitted to the thread pool. Since submit() runs after the current _scanner_scan returns and push_back_scan_task is processed, the timer started here is immediately overwritten.
The actual wait time is measured from the submit() call (line 70) to the next update_wait_worker_timer() call (line 164) when a worker picks it up.
What problem does this PR solve?
This pull request improves the accuracy and clarity of scanner timing and profiling in the
ScannerSchedulerimplementation. The main changes clarify the order and logic for updating and recording timing counters, ensuring that CPU and wait times are measured more precisely during scanner execution and cleanup.Enhancements to scanner timing and profiling:
update_wait_worker_timer,start_scan_cpu_timer,update_scan_cpu_timer,start_wait_worker_timer) to ensure accurate measurement of scanner CPU and wait times.Defercleanup logic to only start the wait worker timer if the scanner has not failed and has not reached the end-of-stream, preventing redundant or incorrect counter updates.Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)