[improvement](be) Optimize nested loop join materialization#62956
[improvement](be) Optimize nested loop join materialization#62956BiteTheDDDDt wants to merge 12 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Pull request overview
Adds FE-planned materialized-slot information for nested loop joins and updates the BE nested loop join probe operator to support delaying payload materialization until after non-equi join conjunct evaluation, reducing unnecessary row expansion/copy for projection-pruned queries (e.g. COUNT(*)).
Changes:
- Extend
TNestedLoopJoinNodethrift with optionalmaterialized_slot_idsto drive BE materialization behavior. - FE: Track/map intermediate slots and emit
materialized_slot_ids, including explicitly-empty lists for constant/COUNT projections. - BE: Implement lazy join-block construction for inner/cross nested loop joins by evaluating join conjuncts using a lightweight eval block and only materializing required payload columns.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| gensrc/thrift/PlanNodes.thrift | Adds materialized_slot_ids to nested loop join plan node thrift. |
| fe/fe-core/src/main/java/org/apache/doris/planner/NestedLoopJoinNode.java | Stores and serializes materialized slot ids (and explain output). |
| fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java | Computes materialized slots for NLJ intermediate tuple; prunes them based on project requirements. |
| be/src/exec/operator/nested_loop_join_probe_operator.h | Declares lazy-materialization helper methods and new state fields. |
| be/src/exec/operator/nested_loop_join_probe_operator.cpp | Implements lazy join generation paths and enables them under specific conditions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Map<ExprId, SlotReference> materializedSlotReferenceMap = Maps.newHashMap(outputSlotReferenceMap); | ||
| nestedLoopJoinNode.enableMaterializedSlotIds(); |
| !_is_mark_join && | ||
| (_join_op == TJoinOp::INNER_JOIN || _join_op == TJoinOp::CROSS_JOIN) && | ||
| conjuncts().empty() && !projections().empty() && | ||
| &projections_row_desc() == &intermediate_row_desc(); |
| // Slots that need to be materialized after join conjunct evaluation. | ||
| // If this field is not set, BE keeps the legacy behavior. | ||
| // If this field is set to an empty list, no payload slot needs materialization. |
There was a problem hiding this comment.
Automated review result: request changes. I found a blocking nested-loop-join correctness regression, and the new correctness-critical lazy materialization path has no dedicated test coverage in this PR.
Critical checkpoint conclusions:
- Goal/test proof: the goal is to avoid materializing pruned nested-loop-join payload columns before join-conjunct evaluation. The implementation does not fully preserve column invariants, and no regression/unit test proves the new lazy path.
- Scope: the actual GitHub PR is focused on nested-loop-join materialization and the FE/BE thrift plumbing.
- Concurrency: no new shared concurrent state issue found in the actual PR diff.
- Lifecycle/static initialization: no special lifecycle or cross-TU static initialization issue found.
- Configuration: no config item added.
- Compatibility: the new thrift field is optional and BE has a legacy fallback when absent; no mixed-version issue found there.
- Parallel paths: the change handles FE translation plus BE execution for nested-loop join, but the BE mark-join null append path regresses from the base implementation.
- Special checks: no new special conditional check issue found.
- Test coverage/results: insufficient; this kernel execution change needs tests for projection-pruned inner/cross nested-loop joins, filtered join conjuncts, zero materialized slots, and nullable/outer/mark-adjacent paths.
- Observability: no new observability requirement identified.
- Transaction/persistence/data writes: not applicable.
- FE/BE variables:
materialized_slot_idsis passed through FE/thrift/BE, but the BE output column handling has a size mismatch bug. - Performance: intended optimization is plausible, but correctness must be fixed and covered first.
- User focus: no additional user-provided review focus was supplied.
| } | ||
| if (_has_materialized_slot_ids) { | ||
| for (const auto slot_id : _materialized_slot_ids) { | ||
| const int column_id = intermediate_row_desc().get_column_id(slot_id); |
There was a problem hiding this comment.
This regresses the base behavior from resizing by _probe_side_process_count to resizing by 1. The nested nullable column above inserts _probe_side_process_count probe rows, so when more than one row is appended the ColumnNullable nested column and null map have different sizes. That can crash or corrupt later block processing/serialization. Please keep the null map size aligned with the inserted range.
| const int column_id = intermediate_row_desc().get_column_id(slot_id); | |
| .resize_fill(origin_sz + _probe_side_process_count, 0); |
| } | ||
|
|
||
| if (!_matched_rows_done && !_need_more_input_data) { | ||
| if (use_generate_block_base_build()) { |
There was a problem hiding this comment.
This enables a new execution path that bypasses the existing full materialization plus _do_filtering_and_update_visited_flags() flow for inner/cross nested-loop joins, but the PR does not add any BE unit or regression coverage for it. This is a kernel execution correctness change; please add tests covering at least projection-pruned joins with non-equi join conjuncts, all-slots-pruned cases such as COUNT(*), and nullable columns so the lazy placeholder/materialization logic is exercised.
|
PR approved by at least one committer and no changes requested. |
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Nested loop join materialized full intermediate payload columns before evaluating non-equi join conjuncts, making COUNT and projection-pruned joins pay unnecessary copy/filter cost.
### Release note
Improve nested loop join execution by delaying payload materialization for projection-pruned inner/cross joins.
### Check List (For Author)
- Test: Build and style checks
- ./generated-source.sh
- build-support/clang-format.sh be/src/exec/operator/nested_loop_join_probe_operator.cpp be/src/exec/operator/nested_loop_join_probe_operator.h && build-support/check-format.sh
- cd fe && mvn checkstyle:check -pl fe-core -DskipTests
- ./build.sh --fe -j8
- ./build.sh --be -j8
- build-support/run-clang-tidy.sh (blocked by pre-existing readability-function-size/readability-function-cognitive-complexity diagnostics in nested_loop_join_probe_operator.cpp and jni-util.h static_assert diagnostics)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oins
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Extend nested loop join lazy materialization beyond inner/cross joins by tracking probe-side match state and finalizing left outer/semi/anti rows without eagerly materializing unused payload columns.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter='*NLJ*' -j8
- build-support/clang-format.sh
- build-support/check-format.sh
- git diff --check
- build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (blocked by existing be/src/util/jni-util.h static_assert diagnostics after changed-line warnings were addressed)
- Behavior changed: Yes. LEFT OUTER/LEFT SEMI/LEFT ANTI nested loop joins can use the lazy materialization path when the existing lazy-materialization gates are satisfied.
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6933d37 to
25a2d28
Compare
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found one additional correctness issue in the new lazy nested-loop-join path. Existing review threads already cover the missing tests, nullable null-map sizing, IDL comment clarity, BE guard clarity, and the FE map allocation, so I did not duplicate those.
Critical checkpoint conclusions:
- Goal/test: The PR aims to reduce nested-loop-join materialization by passing required slot IDs from FE and lazily materializing in BE. The approach is plausible, but the current left-side finalization state machine can drop output rows; existing coverage also does not prove this path.
- Scope/focus: The change is focused on NLJ materialization, but it introduces a large alternate execution path with separate row-advancement semantics.
- Concurrency/lifecycle: No new shared-state concurrency or special lifecycle issue found beyond the existing operator local-state flow.
- Config/compatibility: No new config. Thrift field is optional, so old FE keeps legacy behavior; mixed-version fallback appears intended.
- Parallel paths: Hash join has separate pruning logic; this PR intentionally handles NLJ. The lazy NLJ path must still preserve legacy outer/semi/anti finalization semantics.
- Conditional checks: The lazy enablement conditions are already covered by an existing comment; no duplicate raised.
- Test coverage: Existing comments already request coverage. The issue below should be covered by a regression such as LEFT OUTER/LEFT ANTI nested-loop join with an empty build side and enough probe rows to cross a batch boundary.
- Observability/performance: No additional blocking observability issue found.
- Transaction/persistence/data-write: Not applicable.
- FE/BE variable passing: The new thrift field is passed on the Nereids NLJ path; no additional passing gap found.
User focus: No additional user-provided review focus was specified.
| return Status::OK(); | ||
| } | ||
| if (_join_block.rows() >= state->batch_size()) { | ||
| return Status::OK(); |
There was a problem hiding this comment.
Returning OK here when the output batch is already full lets _advance_lazy_probe_row() continue to _reset_with_next_probe_row() and move _probe_block_pos forward without emitting the required finalized row. A concrete case is LEFT OUTER JOIN or LEFT ANTI JOIN with an empty build side: _current_build_pos == build_blocks.size() remains true, so _generate_lazy_block_base_probe() repeatedly finalizes probe rows; after one append fills state->batch_size(), the next probe row hits this early return, then line 342 advances past it, dropping the null/default output row. Please make finalization report that the current probe row was not consumed, or otherwise stop before advancing when the row cannot be appended.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Remove the redundant nested loop join old-version flag from the lazy gate and allow lazy materialization when post-join conjuncts are present, relying on FE materialized slot ids to cover post-join filter inputs.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter='*NLJ*' -j8
- build-support/clang-format.sh
- build-support/check-format.sh
- git diff --check
- build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (blocked by existing be/src/util/jni-util.h static_assert diagnostics)
- Behavior changed: Yes. NLJ lazy materialization can now remain enabled when post-join conjuncts exist and required slots are provided by FE.
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Remove fallback-style descriptor handling from the nested loop join probe operator now that nested-loop plan descriptors are expected to exist, and use explicit assertions for the invariant.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter='*NLJ*' -j8
- build-support/clang-format.sh
- build-support/check-format.sh
- git diff --check
- build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (blocked by existing be/src/util/jni-util.h static_assert diagnostics)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Introduce lazy output-policy helpers for nested loop join and support build-side visited tracking/finalization so RIGHT and FULL nested loop joins can use lazy materialization under the existing gates.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter='*NLJ*' -j8
- build-support/clang-format.sh
- build-support/check-format.sh
- git diff --check
- build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (blocked by existing be/src/util/jni-util.h static_assert diagnostics)
- Behavior changed: Yes. RIGHT OUTER/RIGHT SEMI/RIGHT ANTI/FULL OUTER nested loop joins can use lazy materialization when the existing lazy-materialization gates are satisfied.
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Extend nested loop join lazy materialization to additional finalization cases. NULL-aware left anti join can now use probe-side lazy finalization, simple mark left semi/anti joins without mark conjuncts can lazily produce mark output, and build-side finalization now keeps a row cursor so RIGHT/FULL batches do not overrun batch size or duplicate rows when resuming inside a build block.
### Release note
None
### Check List (For Author)
- Test: Unit Test / Manual test
- Unit Test: ./run-be-ut.sh --run --filter='*NLJ*' -j8
- Manual test: build-support/clang-format.sh; build-support/check-format.sh; git diff --check
- Manual test: build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN was run but blocked by existing be/src/util/jni-util.h static_assert(false) diagnostics
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Complete lazy materialization support for left semi/anti mark nested loop joins. The lazy path now keeps tri-state mark state for each probe row, evaluates NLJ mark join conjuncts in the lazy eval block, materializes nullable mark output correctly, and ensures Nereids always sends mark join conjuncts for nested loop join plans.
### Release note
None
### Check List (For Author)
- Test: Unit Test / Manual test
- Unit Test: ./run-be-ut.sh --run --filter='*NLJ*' -j8
- Manual test: build-support/clang-format.sh; build-support/check-format.sh; git diff --check
- Manual test: cd fe && mvn checkstyle:check -pl fe-core -q
- Manual test: build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN was run but blocked by existing be/src/util/jni-util.h static_assert(false) diagnostics
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add a session variable to control Nested Loop Join lazy materialization. When enable_nested_loop_join_lazy_materialization is false, Nereids does not send materialized_slot_ids to BE, so BE keeps the existing legacy materialization path.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Manual test: git diff --check
- Manual test: cd fe && mvn checkstyle:check -pl fe-core -q
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Remove the session variable switch for Nested Loop Join lazy materialization. The switch only controlled FE materialized_slot_ids transmission and could not fully isolate the branch's BE lazy materialization changes, so keeping it would be misleading.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Manual test: git diff --check --cached
- Manual test: cd fe && mvn checkstyle:check -pl fe-core -q
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found one additional blocking compatibility issue. The lazy nested-loop join optimization still has existing unresolved correctness/test-coverage threads, and this PR also removes the descriptor fallback needed for old FE plans during rolling upgrade.
Critical checkpoint conclusions:
- Goal/test: The goal is to lazily materialize nested-loop join payload slots. The implementation is not yet proven safe because existing correctness bugs and missing coverage remain, and this review found a separate rolling-upgrade failure.
- Scope/clarity: The change is broad for a hot execution path and includes FE/BE/thrift protocol behavior; it is not a small local refactor.
- Concurrency: No new shared-thread synchronization primitive was introduced in the reviewed diff; existing build/probe shared-state usage appears to follow the current operator model.
- Lifecycle/static initialization: No new static initialization-order issue found.
- Configuration: No new config item.
- Compatibility: Blocking issue found: old FE plans without
nested_loop_join_nodeno longer have a safe row-descriptor fallback in BE. - Parallel code paths: Hash join has similar materialized-slot pruning concepts; nested-loop join differs enough that the new BE path needs its own complete coverage.
- Special conditions: Lazy enablement has several non-obvious conditions; existing review threads already cover clarity/comment concerns.
- Test coverage: Existing thread already requests regression/unit coverage for the new lazy path; I did not find additional tests in this diff.
- Test results: Not verified locally in this review.
- Observability: No new critical observability gap beyond the correctness issues.
- Transaction/persistence/data writes: Not applicable.
- FE-BE variable passing: New thrift field is passed for Nereids nested-loop joins, but compatibility with old FE is broken as noted inline.
- Performance: Existing threads cover the main performance/extra-allocation concern; no additional blocking performance issue found.
User focus: no additional user-provided review focus was supplied.
| @@ -232,7 +269,8 @@ class NestedLoopJoinProbeOperatorX final | |||
| Status push(RuntimeState* state, Block* input_block, bool eos) const override; | |||
There was a problem hiding this comment.
This removes the old-FE fallback for nested-loop/cross join plans. When a new BE receives a plan from an old FE, tnode.__isset.nested_loop_join_node is false, so JoinProbeOperatorX leaves _intermediate_row_desc and _output_row_desc unset (see the legacy branch in join_probe_operator.cpp). Before this PR _old_version_flag made intermediate_row_desc() and row_desc() return _row_descriptor; now the same rolling-upgrade plan hits this DORIS_CHECK (or the _output_row_desc check below) and the query fails/crashes. Please preserve the legacy descriptor fallback or initialize these descriptors for the old-plan case before removing _old_version_flag.
|
run buildall |
|
run buildall |
|
/review |
There was a problem hiding this comment.
Summary:
I reviewed the PR end-to-end with the Doris code-review checklist. The change adds lazy materialization for nested-loop joins, wires FE slot-id transmission, updates thrift, and adds a regression suite. There is still one additional determinism issue in the new FE serialization path, and the existing inline threads already contain several blocking correctness/compatibility issues that should remain addressed before merge.
Critical checkpoint conclusions:
- Goal/test coverage: The goal is clear and there is now a regression suite covering several lazy-materialization join variants, but existing review context still identifies missing/insufficient coverage for projection-pruned/non-equi/all-slots-pruned/nullable paths and correctness bugs in lazy finalization.
- Scope/focus: The PR is mostly focused on nested-loop join lazy materialization, but it introduces a broad new BE execution path across many join variants.
- Concurrency: No new threads, dependencies, or shared mutable cross-task state were introduced in this PR path beyond existing shared build/probe state; no new lock-order issue found.
- Lifecycle/static initialization: No new static/global initializer dependency found.
- Configuration: No new config item.
- Compatibility: Existing review thread already flags a rolling-upgrade old-FE/new-BE descriptor fallback regression; I did not duplicate that inline.
- Parallel code paths: Hash join already has materialized slot pruning semantics; this PR adds the nested-loop path. I found a new deterministic serialization issue in the nested-loop path.
- Special conditional checks: Existing review threads already flag non-obvious descriptor checks and lazy finalization behavior.
- Test result correctness: I did not run the regression suite in this runner; review is static.
- Observability: No new critical observability gap found beyond plan/explain determinism.
- Transaction/persistence/data writes: Not applicable.
- FE-BE variables: New materialized_slot_ids is passed FE to BE, but its order is nondeterministic because FE stores it in a HashSet before serialization.
- Performance: Lazy materialization reduces avoidable materialization; no additional hot-path issue found beyond existing comments.
User focus: No additional user-provided focus was present.
|
run buildall |
This pull request introduces support for lazy materialization of slots (columns) in the nested loop join operator. The main goal is to improve performance by only materializing the necessary columns after join conjunct evaluation, rather than materializing all columns up front. The changes span both the frontend (FE) and backend (BE) code, including the planner, execution engine, and Thrift interface.
Key changes include:
Backend (BE) execution enhancements
NestedLoopJoinProbeOperatorXandNestedLoopJoinProbeLocalStateto support lazy materialization, including tracking materialized slot IDs, managing evaluation columns, and handling mark join logic. [1] [2] [3]Frontend (FE) planner and translator changes
PhysicalPlanTranslatorto track which slots need to be materialized, propagate this information to the plan node, and set up the necessary mappings between expressions and slot IDs. [1] [2] [3]Plan node and explain output
NestedLoopJoinNodeto track materialized slot IDs, expose them through getters/setters, and include them in the explain output for better debugging and observability. [1] [2] [3]Thrift interface update
materialized_slot_idsfield toTNestedLoopJoinNodein the Thrift definition to communicate the set of slots to be materialized from FE to BE.General code improvements
These changes collectively enable more efficient execution of nested loop joins by deferring column materialization until after join condition evaluation, reducing unnecessary computation and memory usage.