[Chore](pick) pick #60141 #59410 #61287
Conversation
…efault value (apache#60141)

Increasing the batch_size results in an overall performance improvement of approximately 5% in Doris (some operators become slower, while others become faster). This pull request adjusts the default configuration values in the `SessionVariable` class to improve query parallelism and batch-processing efficiency by increasing the maximum number of parallel exchange instances and the default batch size.

Configuration defaults updated:
* Increased the default value of `exchangeInstanceParallel` from 100 to 256, allowing more parallel exchange instances during query execution.
* Increased the default value of `batchSize` from 4064 to 8160, enabling larger batch processing in queries.
<img width="298" height="2142" alt="图片" src="https://github.com/user-attachments/assets/3be70146-c4bd-4ff7-ac96-2645f89fed14" />

This pull request refactors and optimizes the handling of null maps and key packing in hash join and hash table code, focusing on improving SIMD (Single Instruction, Multiple Data) usage and simplifying null-bitmap logic. The changes replace older byte-searching utilities with new, more efficient SIMD-based functions, update how null bitmaps are packed and processed, and streamline column null-data replacement. Additionally, the logic for determining hash key types and handling fixed-key serialization is improved for better correctness and performance.

Key improvements and changes:
* Introduced new SIMD-based functions `contain_one` and `contain_zero` in `simd/bits.h`, replacing the older `contain_byte` and related logic for checking the presence of ones or zeros in null maps, resulting in more efficient null detection.
* Updated all usages of null-map checks throughout the codebase to use the new `contain_one` and `contain_zero` functions, simplifying and unifying the logic for detecting nulls in columns and filters. [[1]](diffhunk://#diff-0732e01c1a3f38997ada381c43aff98286e86ca7519db5469a6e4dcdec5bce44L195-L200) [[2]](diffhunk://#diff-3110bab7d558f46b88ae1958b09ac369a92cac4bff98b280b2cf83d2d7aecbf4L117-R118) [[3]](diffhunk://#diff-3110bab7d558f46b88ae1958b09ac369a92cac4bff98b280b2cf83d2d7aecbf4L369-R371) [[4]](diffhunk://#diff-8981dd2e1f08aaa46a97aeef27bd906c64d1bb08deedc0fe1d94c1c49dc064ceL100-R100) [[5]](diffhunk://#diff-9fd61a223bcb3b7a9cb93c2d26c9364d8cce2131673fe286f22a80b09c6fd2c6L283-R283) [[6]](diffhunk://#diff-9fd61a223bcb3b7a9cb93c2d26c9364d8cce2131673fe286f22a80b09c6fd2c6L601-R605)
* Refactored the logic for packing null maps into hash keys in `MethodKeysFixed`, introducing new templates and helper functions for interleaved null-map packing, and replacing the old bitmap-size calculation with a simplified approach. This improves both performance and maintainability. [[1]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R478-R540) [[2]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L500-R611)
* Updated the logic for initializing and inserting keys, ensuring correct handling of nulls and simplifying offset calculations for key data. [[1]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R653) [[2]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L619-R692) [[3]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L645-R712)
* Simplified the `replace_column_null_data` methods for vector and decimal columns by removing unnecessary null-count checks and optimizing the replacement logic. [[1]](diffhunk://#diff-3fa47f544ff08bb2c8232af99312c0bbf2c58cac9da7a2b06473282b99ad5aa4L528-R530) [[2]](diffhunk://#diff-5fdf450def955da3201cc889aa870d94def054d1168f1ef3def32e8f009dc65aL526-L529)
* Improved the logic for determining the hash key type in `hash_key_type.h` to handle cases where the number of data types exceeds the bit size, defaulting to serialized keys as needed. [[1]](diffhunk://#diff-4f1fb8a89cd0e13a719c3427b1ae7581b42cb7325755a3ceac4c44bdc64bd144R83-R86) [[2]](diffhunk://#diff-4f1fb8a89cd0e13a719c3427b1ae7581b42cb7325755a3ceac4c44bdc64bd144L97-R101)
* Removed unused functions and updated includes to ensure all SIMD utilities are properly imported where needed. [[1]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R20-R26) [[2]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R36) [[3]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L292-L295)

These changes collectively improve performance, maintainability, and correctness in hash join operations, especially in handling nullable columns and SIMD optimizations.
run buildall
Pull request overview
Cherry-picks changes from #60141 and #59410, primarily updating SIMD-based null-map utilities/usages and adjusting a couple of execution/session defaults.
Changes:
- Update multiple null-map checks to use the new `simd::contain_one`/`contain_zero` helpers.
- Refactor packing of fixed hash keys/null bitmaps to reduce work when columns have no nulls.
- Adjust FE session defaults (`exchangeInstanceParallel`, `batchSize`) and fix/align some BE tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Updates default session variables for exchange parallelism and batch size. |
| be/test/vec/columns/column_nullable_test.cpp | Adjusts test construction of null map column initialization. |
| be/src/vec/functions/function_jsonb.cpp | Switches “all null” detection to contain_zero helper. |
| be/src/vec/data_types/serde/data_type_map_serde.cpp | Updates null-key detection in map->Arrow serialization. |
| be/src/vec/core/column_with_type_and_name.cpp | Avoids null-data replacement when null map contains no nulls. |
| be/src/vec/common/hash_table/hash_map_context.h | Refactors fixed-key packing and bitmap logic; adds SIMD helpers/includes. |
| be/src/vec/common/hash_table/hash_key_type.h | Changes fixed hash key type selection rules with new bitmap sizing logic. |
| be/src/vec/columns/column_vector.cpp | Changes null-data replacement implementation for vector columns. |
| be/src/vec/columns/column_nullable.cpp | Replaces byte-scans with contain_one/contain_zero. |
| be/src/vec/columns/column_decimal.cpp | Changes null-data replacement implementation for decimal columns. |
| be/src/util/simd/bits.h | Removes contain_byte, adds contain_one/contain_zero SIMD helpers. |
| be/src/pipeline/exec/nested_loop_join_probe_operator.h | Updates filter scan to contain_one. |
| be/src/pipeline/exec/join/process_hash_table_probe_impl.h | Updates null detection and filter checks to new SIMD helpers; adds guards before rewriting nulls. |
| be/src/pipeline/exec/hashjoin_build_sink.h | Replaces loop-based null detection with SIMD helper; avoids rewriting when no nulls. |
```diff
@@ -1216,7 +1216,7 @@ public void checkQuerySlotCount(String slotCnt) {
     // 4096 minus 16 + 16 bytes padding that in padding pod array
```
The comment describing the batchSize default ("4096 minus 16 + 16...") no longer matches the new default value 8160. Update the comment to reflect the new sizing rationale/constraints, or adjust the value if the intent is still a ~4K payload.
Suggested change:
```diff
-// 4096 minus 16 + 16 bytes padding that in padding pod array
+// 8160 + 16 + 16 = 8192
```
```cpp
size_t bitmap_size = nullmap_columns.empty() ? 0 : 1;
if (bitmap_size) {
    // set size to 0 at first, then use resize to call default constructor on index included from [0, row_numbers) to reset all memory
    // only need to reset the memory used to bitmap
    result.clear();
}
result.resize(row_numbers);
```
When bitmap_size == 0, the code no longer clears/value-initializes result before writing key bytes. If sizeof(T) is larger than the total bytes written by pack_fixeds (common when key_byte_size < sizeof(T)), padding/unused bytes can retain stale data and corrupt hash/equality behavior. Fix by ensuring the entire result buffer is zero-initialized for all rows (e.g., unconditional clear()+resize(), or an explicit memset over row_numbers * sizeof(T) before packing).
```diff
 for (size_t j = 0; j < key_columns.size(); ++j) {
-    const char* data = key_columns[j]->get_raw_data().data;
+    const char* __restrict data = key_columns[j]->get_raw_data().data;
```
This introduces two correctness risks: (1) assume_mutable() can create a new column instance when the original is shared; the returned mutable is not stored back into key_columns, so the null replacement may have no lasting effect. (2) Even if it mutates in-place, you captured data before the mutation; if assume_mutable() performs a clone/reallocation, data can become stale. Safer options: avoid mutating key_columns here (restore the previous per-row conditional copy of zero/default for nulls), or store/propagate the mutable column back and re-fetch data after mutation.
```diff
-if (has_null_column.size() && has_null_column[j]) {
-    const auto* nullmap =
-            assert_cast<const ColumnUInt8&>(*nullmap_columns[j]).get_data().data();
-    for (size_t i = 0; i < row_numbers; ++i) {
-        // make sure null cell is filled by 0x0
-        memcpy_fixed<Fixed, true>(
-                (char*)(&result[i]) + offset,
-                nullmap[i] ? (char*)&zero : data + i * sizeof(Fixed));
-    }
-} else {
-    for (size_t i = 0; i < row_numbers; ++i) {
-        memcpy_fixed<Fixed, true>((char*)(&result[i]) + offset,
-                                  data + i * sizeof(Fixed));
-    }
-}
+// make sure null cell is filled by 0x0
+key_columns[j]->assume_mutable()->replace_column_null_data(nullmap);
+auto* __restrict current = result_data + offset;
+for (size_t i = 0; i < row_numbers; ++i) {
+    memcpy_fixed<Fixed, true>(current, data);
+    current += sizeof(T);
+    data += sizeof(Fixed);
+}
```
```cpp
inline HashKeyType get_hash_key_type_fixed(const std::vector<vectorized::DataTypePtr>& data_types) {
    if (data_types.size() >= vectorized::BITSIZE) {
```
The >= vectorized::BITSIZE condition forces exactly BITSIZE keys to use serialized, even though other parts of this PR (e.g., bitmap stored in 1 byte and checks like <= BITSIZE) still appear to support up to BITSIZE keys. If BITSIZE is 8 (as implied by the 1-byte bitmap), this likely regresses the fixed-key path for 8 keys. Consider changing the threshold to > vectorized::BITSIZE (or aligning the other checks/packing logic to match this new restriction).
Suggested change:
```diff
-    if (data_types.size() >= vectorized::BITSIZE) {
+    if (data_types.size() > vectorized::BITSIZE) {
```
```diff
-size_t bitmap_size = has_null ? vectorized::get_bitmap_size(data_types.size()) : 0;
-return get_hash_key_type_with_fixed(bitmap_size + key_byte_size);
+return get_hash_key_type_with_fixed(has_null + key_byte_size);
```
```diff
 void ColumnVector<T>::replace_column_null_data(const uint8_t* __restrict null_map) {
     auto s = size();
-    size_t null_count = s - simd::count_zero_num((const int8_t*)null_map, s);
-    if (0 == null_count) {
-        return;
-    }
+    auto value = default_value();
     for (size_t i = 0; i < s; ++i) {
-        data[i] = null_map[i] ? default_value() : data[i];
+        data[i] = null_map[i] ? value : data[i];
     }
```
The early-exit when the null map contains no nulls was removed, so this now always scans and potentially writes the full column. If this method is invoked on many columns/blocks where nulls are rare (and not all call sites gate it with simd::contain_one), this can be a measurable regression. Consider reintroducing a fast-path check (e.g., if (!simd::contain_one(null_map, s)) return;) before the loop.
```diff
@@ -486,10 +486,6 @@ void ColumnDecimal<T>::compare_internal(size_t rhs_row_id, const IColumn& rhs,
 template <PrimitiveType T>
 void ColumnDecimal<T>::replace_column_null_data(const uint8_t* __restrict null_map) {
     auto s = size();
```
Same as ColumnVector::replace_column_null_data: removing the no-nulls early return can introduce a full scan/write even when null_map contains no 1s. Consider adding a fast-path guard (e.g., if (!simd::contain_one(null_map, s)) return;) to avoid unnecessary work in hot paths.
Suggested change:
```diff
 auto s = size();
+if (!simd::contain_one(null_map, s)) {
+    return;
+}
```
```cpp
template <int N>
void pack_nullmaps_interleaved(const uint8_t* const* datas, const uint8_t* bit_offsets,
                               size_t row_numbers, size_t stride, uint8_t* __restrict out) {
```
The fixed-key nullmap packing logic was substantially changed (interleaved packing + skipping columns without nulls). Add focused unit tests to cover: (1) packing correctness for multiple nullable columns (including sparse nulls), (2) behavior when some nullable columns have no nulls (should not break bit positions), and (3) ensuring packed keys are deterministic for null rows (no dependence on original underlying value bytes).