
[Chore](pick) pick changes from PR #61104 and PR #60941 #61303

Open
BiteTheDDDDt wants to merge 5 commits into apache:branch-4.1 from BiteTheDDDDt:cp_0313

Conversation

@BiteTheDDDDt
Contributor

pick changes from PR #61104 and PR #60941

@BiteTheDDDDt BiteTheDDDDt requested a review from yiguolei as a code owner March 13, 2026 06:22
Copilot AI review requested due to automatic review settings March 13, 2026 06:22
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what the possible impacts are.
  3. What features were added and why.
  4. Which code was refactored and why.
  5. Which functions were optimized and what differs before and after the optimization.

@BiteTheDDDDt
Contributor Author

run buildall


Copilot AI left a comment


Pull request overview

This PR cherry-picks changes from #61104 and #60941 to propagate a “single backend query” hint from FE→BE and to optimize string hash-table aggregation hot paths via batched sub-table operations.

Changes:

  • Add single_backend_query to TQueryOptions and propagate it through FE planning and BE query context.
  • Adjust streaming aggregation hash-table expansion heuristics when running on a single backend.
  • Add batch emplace/find helpers for string hash tables by grouping rows per sub-table to reduce per-row dispatch overhead.
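The batch helpers described above exploit the fact that Doris' string hash table is really several sub-tables selected by key length, so grouping rows per sub-table first lets each group run in a tight loop without a per-row length branch. A minimal sketch of that grouping step (all names and the bucket thresholds are hypothetical, not the actual Doris API):

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sub-table dispatch by key length, mirroring the idea that a
// string hash table keeps separate maps for empty, short, and long keys.
inline size_t submap_index(size_t len) {
    if (len == 0) return 0;   // dedicated empty-key slot
    if (len <= 8) return 1;   // keys that pack into 8 bytes
    if (len <= 16) return 2;  // 16-byte sub-table
    if (len <= 24) return 3;  // 24-byte sub-table
    return 4;                 // generic fallback for longer keys
}

// Group row ids by target sub-table; each bucket can then be emplaced in a
// tight loop, which is what "batched sub-table operations" refers to.
std::array<std::vector<size_t>, 5> group_rows_by_submap(
        const std::vector<std::string>& keys) {
    std::array<std::vector<size_t>, 5> buckets;
    for (size_t i = 0; i < keys.size(); ++i) {
        buckets[submap_index(keys[i].size())].push_back(i);
    }
    return buckets;
}
```

The grouping pass is a single linear scan, so the saved dispatch cost per row has to outweigh one extra write per row into the bucket vectors.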

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| gensrc/thrift/PaloInternalService.thrift | Adds `single_backend_query` to query options for FE→BE propagation. |
| fe/fe-core/src/main/java/org/apache/doris/qe/runtime/ThriftPlansBuilder.java | Sets `single_backend_query` into `TQueryOptions` based on coordinator context. |
| fe/fe-core/src/main/java/org/apache/doris/qe/CoordinatorContext.java | Computes whether the query uses a single backend. |
| fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java | Propagates `single_backend_query` into per-fragment pipeline params. |
| be/src/vec/common/hash_table/string_hash_table.h | Exposes submaps and adds `visit_submaps()` for batch operations. |
| be/src/vec/common/hash_table/hash_map_context.h | Adds row grouping plus batch emplace/find helpers for string hash maps. |
| be/src/runtime/query_context.h | Stores `is_single_backend_query` in the BE query context. |
| be/src/runtime/fragment_mgr.cpp | Initializes `QueryContext`'s `is_single_backend_query` before `prepare()`. |
| be/src/pipeline/exec/streaming_aggregation_operator.h | Adds `_is_single_backend` flag. |
| be/src/pipeline/exec/streaming_aggregation_operator.cpp | Uses different reduction thresholds for single-backend queries plus batch emplace. |
| be/src/pipeline/exec/distinct_streaming_aggregation_operator.h | Adds `_is_single_backend` flag; fixes include formatting. |
| be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp | Applies single-backend thresholds plus batch emplace (void). |
| be/src/pipeline/exec/aggregation_source_operator.cpp | Switches to `lazy_emplace_batch()` for the emplace hot path. |
| be/src/pipeline/exec/aggregation_sink_operator.cpp | Switches to `lazy_emplace_batch()` / `find_batch()` for hot paths. |
| be/src/clucene | Updates the clucene subproject commit pointer. |


Comment on lines +115 to 133

```diff
     void prefetch(size_t i) {
         if (LIKELY(i + HASH_MAP_PREFETCH_DIST < hash_values.size())) {
             hash_table->template prefetch<read>(keys[i + HASH_MAP_PREFETCH_DIST],
                                                 hash_values[i + HASH_MAP_PREFETCH_DIST]);
         }
     }

     template <typename State>
-    auto find(State& state, size_t i) {
-        prefetch<true>(i);
+    ALWAYS_INLINE auto find(State& state, size_t i) {
+        if constexpr (!is_string_hash_map()) {
+            prefetch<true>(i);
+        }
         return state.find_key_with_hash(*hash_table, i, keys[i], hash_values[i]);
     }

     template <typename State, typename F, typename FF>
-    auto lazy_emplace(State& state, size_t i, F&& creator, FF&& creator_for_null_key) {
-        prefetch<false>(i);
+    ALWAYS_INLINE auto lazy_emplace(State& state, size_t i, F&& creator,
+                                    FF&& creator_for_null_key) {
+        if constexpr (!is_string_hash_map()) {
+            prefetch<false>(i);
+        }
         return state.lazy_emplace_key(*hash_table, i, keys[i], hash_values[i], creator,
                                       creator_for_null_key);
     }
```
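The `prefetch` helper quoted above issues a software prefetch for the row `HASH_MAP_PREFETCH_DIST` positions ahead, so that row's memory is warming in cache while the current row is probed. A standalone sketch of the same pattern against `std::unordered_map` (the function name, the distance, and prefetching the key itself are all illustrative; the real code prefetches the destination bucket via `prefetch<read>(key, hash)`):

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Illustrative prefetch distance; the real constant is tuned to hide DRAM
// latency behind roughly this many iterations of probe work.
constexpr size_t kPrefetchDist = 16;

// Batched lookup with prefetch-ahead. Returns a hit/miss flag per key.
template <typename Map, typename Key>
std::vector<bool> find_batch(const Map& map, const std::vector<Key>& keys) {
    std::vector<bool> found(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) {
        if (i + kPrefetchDist < keys.size()) {
            // GCC/Clang builtin: a pure hint with no side effects. A real
            // hash table would prefetch the bucket the future key hashes to.
            __builtin_prefetch(&keys[i + kPrefetchDist]);
        }
        found[i] = map.find(keys[i]) != map.end();
    }
    return found;
}
```

The guard mirrors the `LIKELY(i + HASH_MAP_PREFETCH_DIST < hash_values.size())` check in the diff: the last few rows simply skip the hint rather than read out of bounds.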
Comment on lines +540 to +549

```cpp
if constexpr (is_nullable) {
    if (state.key_column->is_null_at(row)) {
        bool has_null_key = hash_table.has_null_key_data();
        hash_table.has_null_key_data() = true;
        if (!has_null_key) {
            std::forward<FF>(creator_for_null_key)();
        }
        continue;
    }
}
```
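The snippet above saves the old flag, sets `has_null_key_data()`, and only then calls the creator, which guarantees `creator_for_null_key` runs exactly once per table no matter how many NULL rows the batch contains. A self-contained sketch of that pattern (all names hypothetical, not the Doris types):

```cpp
#include <functional>
#include <vector>

// Hypothetical table with a single dedicated slot for the NULL key.
struct NullAwareTable {
    bool has_null_key_data = false;
    int null_key_value = 0;
};

// Emplace the NULL rows of a batch; `creator` must run exactly once, the
// first time a NULL key is seen. Returns how many times it actually ran.
int emplace_null_rows(NullAwareTable& table, const std::vector<bool>& is_null,
                      const std::function<int()>& creator) {
    int calls = 0;
    for (bool null_row : is_null) {
        if (!null_row) continue;
        bool had = table.has_null_key_data;
        table.has_null_key_data = true;  // mark first, as in the quoted diff
        if (!had) {
            table.null_key_value = creator();
            ++calls;
        }
    }
    return calls;
}
```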
Comment on lines +70 to +72

```diff
 // Expand into L3 cache if we look like we're getting some reduction.
 // At present, The L2 cache is generally 1024k or more
-{1024 * 1024, 1.1},
+{.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1},
```
Comment on lines +81 to +83

```cpp
// Expand into L3 cache if we look like we're getting some reduction.
// At present, The L2 cache is generally 1024k or more
{.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 5.0},
```
Comment on lines +59 to +61

```cpp
// Expand into L3 cache if we look like we're getting some reduction.
// At present, The L2 cache is generally 1024k or more
{.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 5.0},
```
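The `{min_ht_mem, streaming_ht_min_reduction}` pairs quoted above form a tiered heuristic: the streaming pre-aggregation only lets its hash table keep growing into the next cache tier if the observed reduction (input rows per distinct group) beats the threshold for the table's current size, which is why single-backend queries get their own, laxer thresholds. A simplified sketch of one plausible reading of that decision (tier values are illustrative, not the exact Doris tables):

```cpp
#include <cstddef>

// One tier of the heuristic: once the table uses at least min_ht_mem bytes,
// further expansion requires the stated reduction ratio.
struct ReductionTier {
    size_t min_ht_mem;
    double streaming_ht_min_reduction;
};

// Assumed tiers for illustration; the quoted diff uses 256 KiB with 1.1 for
// single-backend queries and 5.0 otherwise.
constexpr ReductionTier kTiers[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1},
        {.min_ht_mem = 2 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};

// Expand only if rows-per-group beats the threshold of the largest tier the
// table has already grown into.
bool should_expand(size_t ht_mem, size_t input_rows, size_t groups) {
    double reduction =
            groups == 0 ? 0.0 : static_cast<double>(input_rows) / groups;
    double required = 0.0;
    for (const auto& tier : kTiers) {
        if (ht_mem >= tier.min_ht_mem) required = tier.streaming_ht_min_reduction;
    }
    return reduction >= required;
}
```

Under this reading, a single-backend query tolerates a lower reduction because there is no exchange downstream to absorb the extra groups, so building the full hash table locally is the cheaper plan.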
@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.15% (1788/2259) |
| Line Coverage | 64.51% (31951/49530) |
| Region Coverage | 65.35% (15991/24468) |
| Branch Coverage | 55.90% (8505/15214) |

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 60.00% (3/5) 🎉
Increment coverage report
Complete coverage report
