Skip to content

[fix](search) Replace ExcludeScorer with AndNotScorer for MUST_NOT to handle NULL rows#61200

Merged
airborne12 merged 3 commits intoapache:masterfrom
airborne12:fix-search-not-null-bitmap
Mar 13, 2026
Merged

[fix](search) Replace ExcludeScorer with AndNotScorer for MUST_NOT to handle NULL rows#61200
airborne12 merged 3 commits intoapache:masterfrom
airborne12:fix-search-not-null-bitmap

Conversation

@airborne12
Copy link
Member

What problem does this PR solve?

Issue Number: close #DORIS-24681

Problem Summary:

search('NOT msg:omega') incorrectly includes NULL rows in the result set, while NOT search('msg:omega') correctly excludes them.

Root cause: OccurBooleanWeight::complex_scorer() used ExcludeScorer for MUST_NOT clause handling. ExcludeScorer does not implement has_null_bitmap() / get_null_bitmap(), inheriting the Scorer base class defaults that always return false / nullptr. This caused NULL documents to be treated as TRUE (matching) rather than NULL, violating SQL three-valued logic where NOT(NULL) = NULL.

Fix: Replace ExcludeScorer with AndNotScorer which correctly implements three-valued logic:

  • NOT(TRUE) = FALSE → excluded from results
  • NOT(FALSE) = TRUE → included in results
  • NOT(NULL) = NULL → placed in null bitmap, filtered by mask_out_null()

Also plumb binding_keys from function_search.cpp through the query builder chain (OccurBooleanQueryBuilderOccurBooleanQueryOccurBooleanWeight) so that per_occur_scorers() can resolve the correct logical field for each sub-weight, enabling proper null bitmap fetching from the NullBitmapResolver.

Release note

Fix search('NOT field:value') incorrectly including NULL rows by using null-bitmap-aware AndNotScorer instead of ExcludeScorer.

Check List (For Author)

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason

Manual test steps:

CREATE DATABASE IF NOT EXISTS test_search_not;
USE test_search_not;

CREATE TABLE test_null_handling (
    id INT,
    msg TEXT,
    INDEX idx_msg(msg) USING INVERTED PROPERTIES("parser"="unicode")
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO test_null_handling VALUES (1, NULL), (3, 'hello world'), (4, 'alpha beta');

-- Before fix: returns id=1,3,4 (WRONG - includes NULL row)
-- After fix: returns id=3,4 (CORRECT - excludes NULL row)
SELECT * FROM test_null_handling WHERE search('NOT msg:omega');

-- This always worked correctly (SQL-layer NOT):
SELECT * FROM test_null_handling WHERE NOT search('msg:omega');
  • Behavior changed:

    • Yes. search('NOT field:value') now correctly excludes rows where the field is NULL, matching the behavior of NOT search('field:value').
  • Does this need documentation?

    • No.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

… handle NULL rows

search('NOT msg:omega') incorrectly included NULL rows in the result set
because ExcludeScorer does not implement has_null_bitmap()/get_null_bitmap(),
inheriting the Scorer base class defaults that always return false/nullptr.
This caused NULL documents to be treated as TRUE (matching) rather than NULL.

Replace ExcludeScorer with AndNotScorer in OccurBooleanWeight::complex_scorer()
for MUST_NOT clause handling. AndNotScorer correctly implements SQL three-valued
logic: NOT(NULL) = NULL, so documents where the excluded field is NULL are placed
in the null bitmap and subsequently filtered out by mask_out_null().

Also plumb binding_keys from function_search.cpp through the query builder chain
(OccurBooleanQueryBuilder → OccurBooleanQuery → OccurBooleanWeight) so that
per_occur_scorers() can resolve the correct logical field for each sub-weight,
enabling proper null bitmap fetching from the NullBitmapResolver.
@Thearas
Copy link
Contributor

Thearas commented Mar 11, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

airborne12 and others added 2 commits March 11, 2026 14:24
… fix

Add comprehensive tests for the AndNotScorer null bitmap handling:

BE Unit Tests (intersection_scorer_test.cpp):
- 8 new test cases covering AndNotScorer null bitmap behavior
- Tests for all-null excludes, no resolver, empty excludes,
  multiple excludes with nulls, true-overrides-null priority,
  seek with null exclusion, include null bitmap inheritance,
  and all-docs-excluded scenarios

BE Unit Test Fix (occur_boolean_query_test.cpp):
- Update 12 OccurBooleanQuery constructor calls to include
  the new binding_keys parameter: (clauses, N) -> (clauses, {}, N)

Regression Test (test_search_not_null_bitmap.groovy):
- Tests search('NOT field:value') with NULL rows
- Verifies internal NOT (search DSL) matches external NOT (SQL)
- Tests: basic NOT with NULLs, all-NULL table, multi-field with
  mixed NULLs, complex boolean queries, multiple MUST_NOT clauses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…of eager AndNotScorer

Replace eager-materialization AndNotScorer with enhanced lazy ExcludeScorer
for MUST_NOT handling. ExcludeScorer now accepts a pre-collected exclude_null
bitmap (read from index at O(1) cost) and checks it during advance()/seek()
before the original lazy is_within() exclusion. This avoids iterating all
exclude docs upfront (O(N) in AndNotScorer constructor) while correctly
implementing SQL three-valued logic: NOT(NULL) = NULL.

Key changes:
- ExcludeScorer: add _exclude_null/_null_bitmap members, override
  has_null_bitmap()/get_null_bitmap(), null check in constructor/advance/seek
- OccurBooleanWeight: add build_exclude_opt() to collect null bitmaps from
  MUST_NOT scorers and union them; replace AndNotScorer with make_exclude()
- Add DCHECK for _binding_keys size validation
- Adapt unit tests from AndNotScorer to enhanced ExcludeScorer
@airborne12
Copy link
Member Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 27692 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ac71773a4542df796d7d720854407ac99f13a535, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17683	4476	4336	4336
q2	q3	10644	799	508	508
q4	4697	370	256	256
q5	7648	1200	1010	1010
q6	180	177	145	145
q7	792	838	672	672
q8	10171	1516	1359	1359
q9	5697	4787	4794	4787
q10	6327	1959	1645	1645
q11	477	256	252	252
q12	773	581	466	466
q13	18067	2913	2222	2222
q14	227	233	217	217
q15	954	791	787	787
q16	793	728	671	671
q17	709	892	403	403
q18	5873	5340	5295	5295
q19	1182	978	629	629
q20	490	489	386	386
q21	4443	1907	1409	1409
q22	339	285	237	237
Total cold run time: 98166 ms
Total hot run time: 27692 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4451	4376	4341	4341
q2	q3	3845	4280	3760	3760
q4	860	1180	774	774
q5	4065	4299	4352	4299
q6	175	175	140	140
q7	1737	1601	1470	1470
q8	2423	2639	2570	2570
q9	8229	7460	7457	7457
q10	3787	3941	3576	3576
q11	536	446	458	446
q12	499	593	453	453
q13	2632	3162	2369	2369
q14	305	313	293	293
q15	856	819	809	809
q16	706	743	708	708
q17	1182	1485	1348	1348
q18	7168	6923	6831	6831
q19	910	920	896	896
q20	2102	2204	1989	1989
q21	4269	3563	3389	3389
q22	447	425	379	379
Total cold run time: 51184 ms
Total hot run time: 48297 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 153439 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ac71773a4542df796d7d720854407ac99f13a535, data reload: false

query5	4330	662	527	527
query6	332	246	209	209
query7	4210	471	261	261
query8	341	244	247	244
query9	8707	2714	2688	2688
query10	530	390	367	367
query11	7460	5919	5569	5569
query12	188	126	127	126
query13	1272	458	351	351
query14	5690	3848	3573	3573
query14_1	2836	2784	2811	2784
query15	212	197	182	182
query16	1002	466	464	464
query17	1114	716	632	632
query18	2449	453	353	353
query19	212	212	186	186
query20	132	129	129	129
query21	225	143	129	129
query22	4898	5017	5002	5002
query23	16606	16331	15797	15797
query23_1	15998	15894	15901	15894
query24	7572	1846	1286	1286
query24_1	1273	1273	1268	1268
query25	560	533	436	436
query26	1429	276	168	168
query27	2849	483	297	297
query28	4511	1854	1832	1832
query29	826	559	472	472
query30	307	242	217	217
query31	1347	1297	1218	1218
query32	76	71	69	69
query33	490	387	275	275
query34	956	927	551	551
query35	630	669	615	615
query36	1079	1124	974	974
query37	125	90	79	79
query38	2951	2900	2884	2884
query39	1009	869	840	840
query39_1	830	820	821	820
query40	235	149	136	136
query41	62	62	57	57
query42	315	300	292	292
query43	237	246	218	218
query44	
query45	202	197	183	183
query46	879	978	608	608
query47	2152	2117	2076	2076
query48	319	325	224	224
query49	627	480	386	386
query50	677	286	220	220
query51	4156	4103	4012	4012
query52	286	289	282	282
query53	290	340	282	282
query54	294	270	261	261
query55	93	88	85	85
query56	312	312	301	301
query57	1381	1331	1251	1251
query58	315	274	266	266
query59	1325	1444	1278	1278
query60	337	342	327	327
query61	148	144	146	144
query62	607	579	531	531
query63	304	280	276	276
query64	5069	1256	995	995
query65	
query66	1457	452	352	352
query67	16346	16472	16420	16420
query68	
query69	382	301	279	279
query70	995	967	972	967
query71	337	310	300	300
query72	2847	2682	2443	2443
query73	550	553	324	324
query74	9976	9913	9791	9791
query75	2862	2788	2494	2494
query76	2311	1040	672	672
query77	364	385	310	310
query78	11179	11270	10682	10682
query79	1149	828	607	607
query80	1343	665	566	566
query81	551	292	256	256
query82	1277	157	123	123
query83	343	276	253	253
query84	257	119	105	105
query85	990	533	439	439
query86	435	330	324	324
query87	3271	3106	3013	3013
query88	3541	2651	2648	2648
query89	418	382	358	358
query90	2016	174	172	172
query91	160	158	139	139
query92	79	72	69	69
query93	927	821	504	504
query94	648	326	297	297
query95	584	402	320	320
query96	628	514	229	229
query97	2473	2516	2394	2394
query98	236	224	230	224
query99	1002	1001	913	913
Total cold run time: 234983 ms
Total hot run time: 153439 ms

@airborne12
Copy link
Member Author

/review

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

PR Goal: Fix search('NOT field:value') incorrectly including NULL rows by enhancing ExcludeScorer with null bitmap awareness for SQL three-valued logic (NOT(NULL) = NULL).

Verdict: The core logic is correct and well-designed. The approach of pre-collecting null bitmaps from MUST_NOT scorers and plumbing them through ExcludeScorer is sound. A few minor issues noted below.

Critical Checkpoint Conclusions

  1. Goal accomplished? Yes. The fix correctly ensures that when evaluating NOT(NULL), the result is NULL (excluded from results) rather than TRUE (incorrectly included). The binding_keys plumbing ensures the correct logical field is resolved for null bitmap fetching.

  2. Modification minimal and focused? Yes. Changes are well-scoped: plumb binding_keys through the query builder chain, enhance ExcludeScorer with null bitmap support, and collect null bitmaps in build_exclude_opt.

  3. Concurrency? Not applicable — no new concurrency introduced. ExcludeScorer operates within a single thread/pipeline task.

  4. Lifecycle management? No special lifecycle concerns. roaring::Roaring is value-type and correctly moved.

  5. Configuration items? None added.

  6. Incompatible changes? No storage format or function symbol changes.

  7. Parallel code paths? The OperatorBooleanWeight path already uses AndNotScorer which correctly handles null bitmaps, so no parallel fix needed.

  8. Special conditional checks? The _exclude_null.isEmpty() guard in ExcludeScorer is a reasonable fast-path optimization. The null-bitmap-first check order in ExcludeScorer is correct for SQL three-valued logic (if ANY exclude scorer says NULL, result is NULL).

  9. Test coverage? Good. 10+ unit test cases cover: basic true/null exclusion, all-null excludes, no resolver, empty excludes, multiple excludes, overlap between true and null, seek operations, include null bitmap inheritance, all docs excluded, and backward compatibility. Regression tests verify end-to-end behavior with internal vs external NOT comparison.

  10. Observability? Not applicable for this bug fix.

  11. Transaction/persistence? Not applicable.

  12. Performance? The _exclude_null.contains() check is O(1) with Roaring bitmaps, and the _exclude_null.isEmpty() guard ensures zero overhead when no null bitmaps exist. Good.

  13. Other issues? Two minor issues noted as inline comments (DCHECK should be DORIS_CHECK; regression test uses def tableName).

_score_combiner(std::move(score_combiner)) {}
_score_combiner(std::move(score_combiner)) {
DCHECK(_binding_keys.empty() || _binding_keys.size() == _sub_weights.size())
<< "binding_keys size (" << _binding_keys.size() << ") must match sub_weights size ("
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Per AGENTS.md coding standards: "Any if check for errors must have a clearly known inevitable failure path... you may use the DORIS_CHECK macro for precondition assertions."

DCHECK disappears in RELEASE builds, leaving no invariant protection in production. Since this is a precondition assertion (binding_keys size must match sub_weights size), it should use DORIS_CHECK instead:

DORIS_CHECK(_binding_keys.empty() || _binding_keys.size() == _sub_weights.size())
        << "binding_keys size (" << _binding_keys.size() << ") must match sub_weights size ("
        << _sub_weights.size() << ") when non-empty";

Note: The subsequent _binding_keys.resize(_sub_weights.size()) makes this safe in practice (it will silently pad even if the sizes mismatch), so the DCHECK is effectively dead in RELEASE builds where size mismatches would silently produce wrong binding_key associations rather than crashing. DORIS_CHECK would catch this.

// so NULL rows are excluded from the result set.

def tableName = "search_not_null_bitmap"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Per regression test standards in AGENTS.md: "For ordinary single test tables, do not use def tableName form; instead hardcode your table name in all SQL."

This applies to tableName, allNullTable, and mixedTable — all three should be hardcoded directly in the SQL statements rather than using def variables.

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.61% (19688/37425)
Line Coverage 36.17% (183733/508038)
Region Coverage 32.26% (141697/439278)
Branch Coverage 33.44% (61901/185118)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.52% (26199/36631)
Line Coverage 54.27% (274811/506371)
Region Coverage 51.55% (228524/443337)
Branch Coverage 52.91% (98218/185618)

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 13, 2026
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@airborne12 airborne12 merged commit afb36bb into apache:master Mar 13, 2026
31 of 32 checks passed
@airborne12 airborne12 deleted the fix-search-not-null-bitmap branch March 13, 2026 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.0.x-conflict dev/4.1.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants