Skip to content

[Chore](pick) pick #60141 #59410#61287

Merged
yiguolei merged 2 commits intoapache:branch-4.1from
BiteTheDDDDt:cp_0312
Mar 13, 2026
Merged

[Chore](pick) pick #60141 #59410#61287
yiguolei merged 2 commits intoapache:branch-4.1from
BiteTheDDDDt:cp_0312

Conversation

@BiteTheDDDDt
Copy link
Contributor

pick #60141 #59410

…efault value (apache#60141)

Increasing the batch_size will result in an overall performance
improvement of approximately 5% in Doris (some operators become slower,
while others become faster).

This pull request makes adjustments to the default configuration values
in the `SessionVariable` class to improve query parallelism and batch
processing efficiency. The changes increase the maximum number of
parallel exchange instances and the default batch size.

Configuration defaults updated:

* Increased the default value of `exchangeInstanceParallel` from 100 to
256, allowing for more parallel exchange instances during query
execution.
* Increased the default value of `batchSize` from 4064 to 8160, enabling
larger batch processing in queries.
<img width="298" height="2142" alt="图片"
src="https://github.com/user-attachments/assets/3be70146-c4bd-4ff7-ac96-2645f89fed14"
/>

This pull request refactors and optimizes the handling of null maps and
key packing in hash join and hash table code, with a focus on improving
SIMD (Single Instruction, Multiple Data) usage and simplifying null
bitmap logic. The changes replace older byte-searching utilities with
new, more efficient SIMD-based functions, update how null bitmaps are
packed and processed, and streamline column null data replacement.
Additionally, the logic for determining hash key types and handling
fixed key serialization is improved for better correctness and
performance.

Key improvements and changes:

* Introduced new SIMD-based functions `contain_one` and `contain_zero`
in `simd/bits.h`, replacing the older `contain_byte` and related logic
for checking the presence of ones or zeros in null maps, resulting in
more efficient null detection.
* Updated all usages of null map checks throughout the codebase to use
the new `contain_one` and `contain_zero` functions, simplifying and
unifying the logic for detecting nulls in columns and filters.
[[1]](diffhunk://#diff-0732e01c1a3f38997ada381c43aff98286e86ca7519db5469a6e4dcdec5bce44L195-L200)
[[2]](diffhunk://#diff-3110bab7d558f46b88ae1958b09ac369a92cac4bff98b280b2cf83d2d7aecbf4L117-R118)
[[3]](diffhunk://#diff-3110bab7d558f46b88ae1958b09ac369a92cac4bff98b280b2cf83d2d7aecbf4L369-R371)
[[4]](diffhunk://#diff-8981dd2e1f08aaa46a97aeef27bd906c64d1bb08deedc0fe1d94c1c49dc064ceL100-R100)
[[5]](diffhunk://#diff-9fd61a223bcb3b7a9cb93c2d26c9364d8cce2131673fe286f22a80b09c6fd2c6L283-R283)
[[6]](diffhunk://#diff-9fd61a223bcb3b7a9cb93c2d26c9364d8cce2131673fe286f22a80b09c6fd2c6L601-R605)

* Refactored the logic for packing null maps into hash keys in
`MethodKeysFixed`, introducing new templates and helper functions for
interleaved null map packing, and replacing the old bitmap size
calculation with a simplified approach. This improves both performance
and maintainability.
[[1]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R478-R540)
[[2]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L500-R611)
* Updated the logic for initializing and inserting keys, ensuring
correct handling of nulls and simplifying offset calculations for key
data.
[[1]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R653)
[[2]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L619-R692)
[[3]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L645-R712)

* Simplified the `replace_column_null_data` methods for vector and
decimal columns by removing unnecessary null count checks and optimizing
the replacement logic.
[[1]](diffhunk://#diff-3fa47f544ff08bb2c8232af99312c0bbf2c58cac9da7a2b06473282b99ad5aa4L528-R530)
[[2]](diffhunk://#diff-5fdf450def955da3201cc889aa870d94def054d1168f1ef3def32e8f009dc65aL526-L529)

* Improved the logic for determining the hash key type in
`hash_key_type.h` to handle cases where the number of data types exceeds
the bit size, defaulting to serialized keys as needed.
[[1]](diffhunk://#diff-4f1fb8a89cd0e13a719c3427b1ae7581b42cb7325755a3ceac4c44bdc64bd144R83-R86)
[[2]](diffhunk://#diff-4f1fb8a89cd0e13a719c3427b1ae7581b42cb7325755a3ceac4c44bdc64bd144L97-R101)

* Removed unused functions and updated includes to ensure all SIMD
utilities are properly imported where needed.
[[1]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R20-R26)
[[2]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3R36)
[[3]](diffhunk://#diff-b8623712a5a1728bb77cc67b6ee1bbf16ef2b842044f6f6bab64c3fc5c4575f3L292-L295)

These changes collectively improve performance, maintainability, and
correctness in hash join operations, especially in handling nullable
columns and SIMD optimizations.
@BiteTheDDDDt BiteTheDDDDt requested a review from yiguolei as a code owner March 12, 2026 12:29
Copilot AI review requested due to automatic review settings March 12, 2026 12:29
@Thearas
Copy link
Contributor

Thearas commented Mar 12, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Copy link
Contributor Author

run buildall

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Cherry-picks changes from #60141 and #59410, primarily updating SIMD-based null-map utilities/usages and adjusting a couple of execution/session defaults.

Changes:

  • Update multiple null-map checks to use new simd::contain_one/contain_zero helpers.
  • Refactor packing of fixed hash keys/null bitmaps to reduce work when columns have no nulls.
  • Adjust FE session defaults (exchangeInstanceParallel, batchSize) and fix/align some BE tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java Updates default session variables for exchange parallelism and batch size.
be/test/vec/columns/column_nullable_test.cpp Adjusts test construction of null map column initialization.
be/src/vec/functions/function_jsonb.cpp Switches “all null” detection to contain_zero helper.
be/src/vec/data_types/serde/data_type_map_serde.cpp Updates null-key detection in map->Arrow serialization.
be/src/vec/core/column_with_type_and_name.cpp Avoids null-data replacement when null map contains no nulls.
be/src/vec/common/hash_table/hash_map_context.h Refactors fixed-key packing and bitmap logic; adds SIMD helpers/includes.
be/src/vec/common/hash_table/hash_key_type.h Changes fixed hash key type selection rules with new bitmap sizing logic.
be/src/vec/columns/column_vector.cpp Changes null-data replacement implementation for vector columns.
be/src/vec/columns/column_nullable.cpp Replaces byte-scans with contain_one/contain_zero.
be/src/vec/columns/column_decimal.cpp Changes null-data replacement implementation for decimal columns.
be/src/util/simd/bits.h Removes contain_byte, adds contain_one/contain_zero SIMD helpers.
be/src/pipeline/exec/nested_loop_join_probe_operator.h Updates filter scan to contain_one.
be/src/pipeline/exec/join/process_hash_table_probe_impl.h Updates null detection and filter checks to new SIMD helpers; adds guards before rewriting nulls.
be/src/pipeline/exec/hashjoin_build_sink.h Replaces loop-based null detection with SIMD helper; avoids rewriting when no nulls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -1216,7 +1216,7 @@ public void checkQuerySlotCount(String slotCnt) {

// 4096 minus 16 + 16 bytes padding that in padding pod array
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment describing the batchSize default ("4096 minus 16 + 16...") no longer matches the new default value 8160. Update the comment to reflect the new sizing rationale/constraints, or adjust the value if the intent is still a ~4K payload.

Suggested change
// 4096 minus 16 + 16 bytes padding that in padding pod array
// 8160 + 16 + 16 = 8192

Copilot uses AI. Check for mistakes.
Comment on lines +562 to 568
size_t bitmap_size = nullmap_columns.empty() ? 0 : 1;
if (bitmap_size) {
// set size to 0 at first, then use resize to call default constructor on index included from [0, row_numbers) to reset all memory
// only need to reset the memory used to bitmap
result.clear();
}
result.resize(row_numbers);
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When bitmap_size == 0, the code no longer clears/value-initializes result before writing key bytes. If sizeof(T) is larger than the total bytes written by pack_fixeds (common when key_byte_size < sizeof(T)), padding/unused bytes can retain stale data and corrupt hash/equality behavior. Fix by ensuring the entire result buffer is zero-initialized for all rows (e.g., unconditional clear()+resize(), or an explicit memset over row_numbers * sizeof(T) before packing).

Copilot uses AI. Check for mistakes.

for (size_t j = 0; j < key_columns.size(); ++j) {
const char* data = key_columns[j]->get_raw_data().data;
const char* __restrict data = key_columns[j]->get_raw_data().data;
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces two correctness risks: (1) assume_mutable() can create a new column instance when the original is shared; the returned mutable is not stored back into key_columns, so the null replacement may have no lasting effect. (2) Even if it mutates in-place, you captured data before the mutation; if assume_mutable() performs a clone/reallocation, data can become stale. Safer options: avoid mutating key_columns here (restore the previous per-row conditional copy of zero/default for nulls), or store/propagate the mutable column back and re-fetch data after mutation.

Copilot uses AI. Check for mistakes.
Comment on lines +601 to +611
if (has_null_column.size() && has_null_column[j]) {
const auto* nullmap =
assert_cast<const ColumnUInt8&>(*nullmap_columns[j]).get_data().data();
for (size_t i = 0; i < row_numbers; ++i) {
// make sure null cell is filled by 0x0
memcpy_fixed<Fixed, true>(
(char*)(&result[i]) + offset,
nullmap[i] ? (char*)&zero : data + i * sizeof(Fixed));
}
} else {
for (size_t i = 0; i < row_numbers; ++i) {
memcpy_fixed<Fixed, true>((char*)(&result[i]) + offset,
data + i * sizeof(Fixed));
}
// make sure null cell is filled by 0x0
key_columns[j]->assume_mutable()->replace_column_null_data(nullmap);
}
auto* __restrict current = result_data + offset;
for (size_t i = 0; i < row_numbers; ++i) {
memcpy_fixed<Fixed, true>(current, data);
current += sizeof(T);
data += sizeof(Fixed);
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces two correctness risks: (1) assume_mutable() can create a new column instance when the original is shared; the returned mutable is not stored back into key_columns, so the null replacement may have no lasting effect. (2) Even if it mutates in-place, you captured data before the mutation; if assume_mutable() performs a clone/reallocation, data can become stale. Safer options: avoid mutating key_columns here (restore the previous per-row conditional copy of zero/default for nulls), or store/propagate the mutable column back and re-fetch data after mutation.

Copilot uses AI. Check for mistakes.
}

inline HashKeyType get_hash_key_type_fixed(const std::vector<vectorized::DataTypePtr>& data_types) {
if (data_types.size() >= vectorized::BITSIZE) {
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The >= vectorized::BITSIZE condition forces exactly BITSIZE keys to use serialized, even though other parts of this PR (e.g., bitmap stored in 1 byte and checks like <= BITSIZE) still appear to support up to BITSIZE keys. If BITSIZE is 8 (as implied by the 1-byte bitmap), this likely regresses the fixed-key path for 8 keys. Consider changing the threshold to > vectorized::BITSIZE (or aligning the other checks/packing logic to match this new restriction).

Suggested change
if (data_types.size() >= vectorized::BITSIZE) {
if (data_types.size() > vectorized::BITSIZE) {

Copilot uses AI. Check for mistakes.

size_t bitmap_size = has_null ? vectorized::get_bitmap_size(data_types.size()) : 0;
return get_hash_key_type_with_fixed(bitmap_size + key_byte_size);
return get_hash_key_type_with_fixed(has_null + key_byte_size);
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The >= vectorized::BITSIZE condition forces exactly BITSIZE keys to use serialized, even though other parts of this PR (e.g., bitmap stored in 1 byte and checks like <= BITSIZE) still appear to support up to BITSIZE keys. If BITSIZE is 8 (as implied by the 1-byte bitmap), this likely regresses the fixed-key path for 8 keys. Consider changing the threshold to > vectorized::BITSIZE (or aligning the other checks/packing logic to match this new restriction).

Copilot uses AI. Check for mistakes.
Comment on lines 477 to 482
void ColumnVector<T>::replace_column_null_data(const uint8_t* __restrict null_map) {
auto s = size();
size_t null_count = s - simd::count_zero_num((const int8_t*)null_map, s);
if (0 == null_count) {
return;
}
auto value = default_value();
for (size_t i = 0; i < s; ++i) {
data[i] = null_map[i] ? default_value() : data[i];
data[i] = null_map[i] ? value : data[i];
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The early-exit when the null map contains no nulls was removed, so this now always scans and potentially writes the full column. If this method is invoked on many columns/blocks where nulls are rare (and not all call sites gate it with simd::contain_one), this can be a measurable regression. Consider reintroducing a fast-path check (e.g., if (!simd::contain_one(null_map, s)) return;) before the loop.

Copilot uses AI. Check for mistakes.
@@ -486,10 +486,6 @@ void ColumnDecimal<T>::compare_internal(size_t rhs_row_id, const IColumn& rhs,
template <PrimitiveType T>
void ColumnDecimal<T>::replace_column_null_data(const uint8_t* __restrict null_map) {
auto s = size();
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as ColumnVector::replace_column_null_data: removing the no-nulls early return can introduce a full scan/write even when null_map contains no 1s. Consider adding a fast-path guard (e.g., if (!simd::contain_one(null_map, s)) return;) to avoid unnecessary work in hot paths.

Suggested change
auto s = size();
auto s = size();
if (!simd::contain_one(null_map, s)) {
return;
}

Copilot uses AI. Check for mistakes.
Comment on lines +478 to +480
template <int N>
void pack_nullmaps_interleaved(const uint8_t* const* datas, const uint8_t* bit_offsets,
size_t row_numbers, size_t stride, uint8_t* __restrict out) {
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixed-key nullmap packing logic was substantially changed (interleaved packing + skipping columns without nulls). Add focused unit tests to cover: (1) packing correctness for multiple nullable columns (including sparse nulls), (2) behavior when some nullable columns have no nulls (should not break bit positions), and (3) ensuring packed keys are deterministic for null rows (no dependence on original underlying value bytes).

Copilot uses AI. Check for mistakes.
@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 97.10% (134/138) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (25312/35427)
Line Coverage 54.03% (266222/492694)
Region Coverage 51.60% (220105/426597)
Branch Coverage 53.12% (94880/178628)

@yiguolei yiguolei merged commit 88fccb8 into apache:branch-4.1 Mar 13, 2026
29 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants