[fix](search) Upgrade query type for variant subcolumns with analyzer-based indexes #60782

Merged
airborne12 merged 2 commits into apache:master from airborne12:refact-search-dsl-parser
Feb 18, 2026
@airborne12
Member

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #60654

Problem Summary:

Follow-up fix for #60654 (SearchDslParser refactoring).

When FE resolves a variant subcolumn field pattern to a specific analyzer-based index and sends its index_properties via TSearchFieldBinding, the BE FieldReaderResolver was using EQUAL_QUERY for TERM clauses. This caused select_best_reader to pick the STRING_TYPE reader (untokenized index directory) instead of the FULLTEXT reader, so tokenized search terms would never match.

Root cause: For variant subcolumns with analyzer-based indexes, EQUAL_QUERY opens the wrong (untokenized) index directory. The query type needs to be upgraded to MATCH_ANY_QUERY so select_best_reader picks the correct FULLTEXT reader.

Fix: In FieldReaderResolver::resolve(), when the field is a variant subcolumn and the FE-provided index_properties indicate an analyzer-based index (should_analyzer() returns true), automatically upgrade EQUAL_QUERY to MATCH_ANY_QUERY before calling select_best_reader. Also reuse the fb_it iterator to avoid a redundant map lookup.
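The upgrade described above can be sketched in isolation as follows. This is a simplified model, not the actual Doris code: the enums, the `IndexProperties` alias, the `field_bindings` map, the `is_variant_subcolumn` flag, and the signatures of `resolve`, `should_analyzer`, and `select_best_reader` are all hypothetical stand-ins, and the "parser" property check is an assumption about what makes an index analyzer-based.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins; the real Doris types and signatures may differ.
enum class QueryType { EQUAL_QUERY, MATCH_ANY_QUERY };
enum class ReaderType { STRING_TYPE, FULLTEXT };

// index_properties as sent by FE, modeled here as a plain string map.
using IndexProperties = std::map<std::string, std::string>;

// Assumption: an index is analyzer-based when it names a parser other than
// "none". The real should_analyzer() check may use different rules.
bool should_analyzer(const IndexProperties& props) {
    auto it = props.find("parser");
    return it != props.end() && !it->second.empty() && it->second != "none";
}

// Simplified model of reader selection as described in the PR:
// EQUAL_QUERY opens the untokenized reader, match queries open FULLTEXT.
ReaderType select_best_reader(QueryType query_type) {
    return query_type == QueryType::EQUAL_QUERY ? ReaderType::STRING_TYPE
                                                : ReaderType::FULLTEXT;
}

// Sketch of the fix: upgrade EQUAL_QUERY to MATCH_ANY_QUERY for variant
// subcolumns whose FE-provided index_properties indicate an analyzer.
ReaderType resolve(const std::string& field, bool is_variant_subcolumn,
                   QueryType query_type,
                   const std::map<std::string, IndexProperties>& field_bindings) {
    // Look the binding up once and reuse the iterator below.
    auto fb_it = field_bindings.find(field);
    if (is_variant_subcolumn && query_type == QueryType::EQUAL_QUERY &&
        fb_it != field_bindings.end() && should_analyzer(fb_it->second)) {
        query_type = QueryType::MATCH_ANY_QUERY;
    }
    return select_best_reader(query_type);
}
```

In this model, a TERM clause on a variant subcolumn bound to an analyzer-based index resolves to the FULLTEXT reader, while fields without a binding or without an analyzer keep the STRING_TYPE reader.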

Release note

Fix variant subcolumn search queries to correctly select the FULLTEXT inverted index reader when an analyzer-based index is configured, instead of incorrectly using the untokenized STRING_TYPE reader.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • Yes. Variant subcolumn TERM queries now correctly use the FULLTEXT reader when the index has an analyzer configured, matching the expected ES-compatible behavior.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…-based indexes

When FE resolves a variant subcolumn field pattern to an analyzer-based
index, the query type must be upgraded from EQUAL_QUERY to MATCH_ANY_QUERY
so that select_best_reader picks the FULLTEXT reader instead of STRING_TYPE.
Without this, TERM clauses from lucene-mode DSL would open the wrong
(untokenized) index directory and tokenized search terms would never match.
@Thearas
Contributor

Thearas commented Feb 18, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e2215f5c6052d2caf3a5e16dfdf4539b52464ef7, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17639	4443	4249	4249
q2	q3	10678	772	522	522
q4	4681	356	263	263
q5	7553	1185	1034	1034
q6	175	172	149	149
q7	769	832	681	681
q8	9305	1455	1346	1346
q9	4754	4745	4658	4658
q10	6823	1866	1624	1624
q11	455	266	243	243
q12	738	566	461	461
q13	17788	4220	3439	3439
q14	222	231	218	218
q15	979	784	783	783
q16	757	715	681	681
q17	718	887	431	431
q18	6302	5406	5356	5356
q19	1116	987	638	638
q20	513	484	389	389
q21	4407	1888	1406	1406
q22	340	285	245	245
Total cold run time: 96712 ms
Total hot run time: 28816 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4383	4310	4324	4310
q2	q3	1754	2162	1725	1725
q4	839	1142	757	757
q5	4025	4303	4348	4303
q6	178	173	142	142
q7	1714	1603	1479	1479
q8	2405	2667	2490	2490
q9	7394	7453	7508	7453
q10	2744	2898	2424	2424
q11	515	429	419	419
q12	492	587	446	446
q13	4023	4474	3595	3595
q14	291	291	269	269
q15	884	826	803	803
q16	741	793	726	726
q17	1196	1565	1353	1353
q18	7209	6711	6510	6510
q19	868	865	863	863
q20	2049	2179	2020	2020
q21	3903	3435	3476	3435
q22	473	462	422	422
Total cold run time: 48080 ms
Total hot run time: 45944 ms

@doris-robot

TPC-DS: Total hot run time: 184274 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e2215f5c6052d2caf3a5e16dfdf4539b52464ef7, data reload: false

query5	4936	638	487	487
query6	338	232	208	208
query7	4214	466	272	272
query8	349	248	238	238
query9	8754	2758	2743	2743
query10	540	378	338	338
query11	16879	16762	16565	16565
query12	191	128	129	128
query13	1280	445	359	359
query14	6582	3250	3052	3052
query14_1	2874	2809	2776	2776
query15	199	192	183	183
query16	995	450	450	450
query17	1036	691	588	588
query18	2661	434	335	335
query19	202	204	177	177
query20	140	128	127	127
query21	219	148	118	118
query22	5130	5519	5430	5430
query23	18036	17141	17134	17134
query23_1	17296	17030	16767	16767
query24	7240	1615	1229	1229
query24_1	1213	1232	1247	1232
query25	577	489	423	423
query26	1243	272	153	153
query27	2766	481	292	292
query28	4506	1876	1875	1875
query29	814	592	503	503
query30	311	235	211	211
query31	871	745	642	642
query32	84	73	71	71
query33	534	358	287	287
query34	915	909	546	546
query35	640	680	605	605
query36	1100	1098	978	978
query37	141	98	90	90
query38	2958	2884	2841	2841
query39	1013	856	853	853
query39_1	809	828	844	828
query40	233	163	143	143
query41	68	66	65	65
query42	107	104	102	102
query43	372	389	355	355
query44	
query45	201	191	185	185
query46	874	985	634	634
query47	2169	2151	2038	2038
query48	330	314	243	243
query49	656	482	393	393
query50	701	280	215	215
query51	4195	4069	4027	4027
query52	105	109	98	98
query53	298	336	290	290
query54	317	294	279	279
query55	88	85	79	79
query56	319	330	322	322
query57	1387	1376	1273	1273
query58	296	282	277	277
query59	2635	2770	2519	2519
query60	334	334	315	315
query61	152	152	149	149
query62	630	603	574	574
query63	322	276	291	276
query64	4833	1260	973	973
query65	
query66	1388	446	361	361
query67	16338	16441	16631	16441
query68	
query69	397	300	284	284
query70	943	1009	881	881
query71	340	292	285	285
query72	2759	2632	2369	2369
query73	553	551	316	316
query74	9973	9956	9771	9771
query75	2839	2759	2466	2466
query76	2319	1047	680	680
query77	365	386	322	322
query78	11240	11264	10690	10690
query79	2981	816	627	627
query80	1765	636	536	536
query81	590	293	253	253
query82	998	148	114	114
query83	344	274	243	243
query84	253	122	104	104
query85	885	472	429	429
query86	489	312	295	295
query87	3143	3107	3001	3001
query88	3614	2661	2660	2660
query89	429	372	347	347
query90	2129	176	167	167
query91	163	153	129	129
query92	84	73	70	70
query93	1803	826	516	516
query94	652	317	286	286
query95	581	337	320	320
query96	640	520	236	236
query97	2472	2523	2419	2419
query98	233	219	215	215
query99	943	1026	931	931
Total cold run time: 258197 ms
Total hot run time: 184274 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/13) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.67% (19535/37092)
Line Coverage 36.23% (182146/502746)
Region Coverage 32.59% (141438/433967)
Branch Coverage 33.61% (61274/182332)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (13/13) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.57% (26742/36351)
Line Coverage 56.77% (284708/501507)
Region Coverage 54.24% (237748/438352)
Branch Coverage 56.03% (102562/183036)

Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions github-actions bot added the "approved" label on Feb 18, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@airborne12 airborne12 merged commit 9a9b3b2 into apache:master Feb 18, 2026
28 of 30 checks passed
@airborne12 airborne12 deleted the refact-search-dsl-parser branch February 18, 2026 10:26
Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, dev/4.0.x-conflict, reviewed

6 participants