Skip to content

opt for StringHashTable's for_each#60941

Open
BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
BiteTheDDDDt:dev_0302
Open

opt for StringHashTable's for_each#60941
BiteTheDDDDt wants to merge 1 commit intoapache:masterfrom
BiteTheDDDDt:dev_0302

Conversation

@BiteTheDDDDt
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings March 2, 2026 02:57
@Thearas
Copy link
Contributor

Thearas commented Mar 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Copy link
Contributor Author

run buildall

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces batch-oriented helpers to speed up StringHashMap operations by grouping rows by StringHashTable sub-map (m0/m1/m2/m3/m4/ms) and then processing each sub-map in a tight loop, avoiding per-row dispatch branching.

Changes:

  • Expose StringHashTable sub-maps via visit_submaps() (and direct accessors) to enable sub-map batch processing.
  • Add StringKeySubTableGroups plus new batch helpers (lazy_emplace_batch, lazy_emplace_batch_void, find_batch) to process StringHashMap rows grouped by sub-table.
  • Switch several aggregation pipeline operators to use the new batch helpers for emplace/find.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
be/src/vec/common/hash_table/string_hash_table.h Adds visit_submaps() and sub-map accessors to iterate over StringHashTable sub-tables.
be/src/vec/common/hash_table/hash_map_context.h Adds string sub-table grouping and new batch emplace/find helpers; adjusts prefetch behavior.
be/src/pipeline/rec_cte_shared_state.h Replaces per-row emplace loop with lazy_emplace_batch_void.
be/src/pipeline/exec/streaming_aggregation_operator.cpp Replaces per-row emplace loops with lazy_emplace_batch (and limit path).
be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp Replaces per-row emplace loop with lazy_emplace_batch_void.
be/src/pipeline/exec/aggregation_source_operator.cpp Replaces per-row emplace loop with lazy_emplace_batch.
be/src/pipeline/exec/aggregation_sink_operator.cpp Replaces per-row emplace loops with lazy_emplace_batch and per-row find loop with find_batch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@BiteTheDDDDt
Copy link
Contributor Author

run buildall

1 similar comment
@BiteTheDDDDt
Copy link
Contributor Author

run buildall

@BiteTheDDDDt
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.13% (18708/36589)
Line Coverage 35.06% (177079/505105)
Region Coverage 31.00% (136994/441930)
Branch Coverage 32.23% (59636/185017)

@BiteTheDDDDt
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.10% (18701/36594)
Line Coverage 35.04% (176967/505030)
Region Coverage 30.98% (136914/441875)
Branch Coverage 32.21% (59591/184982)

2 similar comments
@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.10% (18701/36594)
Line Coverage 35.04% (176967/505030)
Region Coverage 30.98% (136914/441875)
Branch Coverage 32.21% (59591/184982)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.10% (18701/36594)
Line Coverage 35.04% (176967/505030)
Region Coverage 30.98% (136914/441875)
Branch Coverage 32.21% (59591/184982)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.12% (18710/36599)
Line Coverage 35.03% (176950/505176)
Region Coverage 30.96% (136825/441969)
Branch Coverage 32.22% (59623/185041)

1 similar comment
@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.12% (18710/36599)
Line Coverage 35.03% (176950/505176)
Region Coverage 30.96% (136825/441969)
Branch Coverage 32.22% (59623/185041)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.12% (18710/36601)
Line Coverage 35.03% (176951/505194)
Region Coverage 30.96% (136826/441974)
Branch Coverage 32.22% (59623/185043)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.16% (20189/36601)
Line Coverage 38.47% (194340/505194)
Region Coverage 34.42% (152149/441974)
Branch Coverage 35.23% (65187/185043)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 56.91% (20828/36601)
Line Coverage 40.01% (202130/505194)
Region Coverage 36.37% (160739/441974)
Branch Coverage 37.09% (68640/185043)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 36.73% (101/275) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 56.94% (20840/36601)
Line Coverage 40.04% (202295/505194)
Region Coverage 36.39% (160845/441974)
Branch Coverage 37.12% (68697/185043)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants