[Improvement](agg) opt for StringHashTable's for_each #60941
BiteTheDDDDt wants to merge 2 commits into apache:master
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Pull request overview
This PR introduces batch-oriented helpers to speed up StringHashMap operations by grouping rows by StringHashTable sub-map (m0/m1/m2/m3/m4/ms) and then processing each sub-map in a tight loop, avoiding per-row dispatch branching.
Changes:
- Expose StringHashTable sub-maps via `visit_submaps()` (and direct accessors) to enable sub-map batch processing.
- Add `StringKeySubTableGroups` plus new batch helpers (`lazy_emplace_batch`, `lazy_emplace_batch_void`, `find_batch`) to process StringHashMap rows grouped by sub-table.
- Switch several aggregation pipeline operators to use the new batch helpers for emplace/find.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| be/src/vec/common/hash_table/string_hash_table.h | Adds visit_submaps() and sub-map accessors to iterate over StringHashTable sub-tables. |
| be/src/vec/common/hash_table/hash_map_context.h | Adds string sub-table grouping and new batch emplace/find helpers; adjusts prefetch behavior. |
| be/src/pipeline/rec_cte_shared_state.h | Replaces per-row emplace loop with lazy_emplace_batch_void. |
| be/src/pipeline/exec/streaming_aggregation_operator.cpp | Replaces per-row emplace loops with lazy_emplace_batch (and limit path). |
| be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp | Replaces per-row emplace loop with lazy_emplace_batch_void. |
| be/src/pipeline/exec/aggregation_source_operator.cpp | Replaces per-row emplace loop with lazy_emplace_batch. |
| be/src/pipeline/exec/aggregation_sink_operator.cpp | Replaces per-row emplace loops with lazy_emplace_batch and per-row find loop with find_batch. |
update
BE Regression && UT Coverage Report
Increment line coverage / Increment coverage report
/review
Code Review Summary: StringHashTable Batch Operations Optimization
This PR optimizes StringHashTable operations in the aggregation path by pre-grouping row indices by sub-table (based on string key length), then processing each sub-table group in a batch with targeted prefetch. This avoids the per-row dispatch() branching and achieves better cache locality.
Critical Checkpoint Conclusions
1. Goal and correctness: The goal is to optimize StringHashTable's emplace/find hot path for aggregation. The get_group() classification logic correctly mirrors the existing dispatch() logic (including the critical trailing-zero-byte check). The nullable handling correctly replicates the original HashMethodSingleLowNullableColumn::lazy_emplace_key semantics (void vs non-void creator_for_null_key signatures). The convert_key_for_submap correctly uses the same to_string_key<T>() conversions. Appears correct.
2. Modification scope: The change is well-scoped to aggregation operators (sink, source, streaming, distinct streaming, RecCTE). Join operators use a completely different hash table mechanism and are correctly excluded.
3. Concurrency: No new concurrency concerns. The batch operations are called from the same single-threaded per-task execution context as the original per-row loops.
4. Lifecycle management: No special lifecycle concerns. StringKeySubTableGroups is a member of MethodStringNoCache and follows its lifetime.
5. Configuration items: None added.
6. Incompatible changes: None - this is a pure internal optimization with no format/protocol changes.
7. Parallel code paths: The non-StringHashMap fallback path in each batch helper correctly delegates to the original per-row loop with standard prefetch. Partition sort sink (partition_sort_sink_operator.cpp) also uses StringHashMap but iterates in reverse order with early exit, making it architecturally incompatible with this batch optimization - this is acceptable.
8. Test coverage: The PR description has no test checkboxes marked. No new tests are added. While existing aggregation regression tests should cover functional correctness, the PR would benefit from explicit testing with nullable string keys, empty strings, strings with trailing zero bytes, and mixed-length strings to validate the sub-table grouping. Needs attention.
9. Observability: No new observability needed - the existing _hash_table_emplace_timer and _hash_table_input_counter still cover the batch operations.
10. Other issues: See inline comments for specific code concerns.
Issues Found
- (Medium) `std::forward` used inside loops: while safe in current call patterns, this is a latent footgun.
- (Low) Dead code: the `get_submap_m0()` through `get_submap_ms()` accessors are never used.
- (Medium) No tests added or marked in the checklist despite this being a non-trivial optimization that changes the iteration order of hash table operations.
What problem does this PR solve?
This pull request refactors and optimizes the batch operations for hash table insertion and lookup in the aggregation and distinct aggregation pipelines. The main improvement is the introduction of more efficient batch-processing methods for hash tables, especially for string-keyed tables, which now benefit from better cache locality through sub-table grouping. Several aggregation and distinct aggregation operators are updated to use these new batch methods, resulting in cleaner and more efficient code.
Batch hash table operations and code modernization:
Replaced per-row hash table insertion and lookup loops in the aggregation and streaming aggregation operators (`aggregation_sink_operator.cpp`, `aggregation_source_operator.cpp`, `streaming_aggregation_operator.cpp`, `distinct_streaming_aggregation_operator.cpp`, and `rec_cte_shared_state.h`) with new batch methods: `vectorized::lazy_emplace_batch`, `vectorized::lazy_emplace_batch_void`, and `vectorized::find_batch`, improving performance and code clarity.
Updated the hash table method interface in `hash_map_context.h` by removing unnecessary `ALWAYS_INLINE` annotations and simplifying the `prefetch`, `find`, and `lazy_emplace` methods, making the code easier to maintain.
String-keyed hash table optimizations:
Introduced the `StringKeySubTableGroups` struct in `hash_map_context.h`, which groups row indices by string length for string-keyed hash tables. This enables batch processing by sub-table, improving cache locality and performance.
Integrated sub-table grouping into the string hash table initialization for aggregation (not join), so that batch operations can efficiently process each sub-table group.
Minor code cleanups:
These changes collectively improve the efficiency and maintainability of the aggregation pipeline, especially for workloads involving string keys.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)