
[feat](condition cache) Support condition cache for external table #60897

Open
jacktengg wants to merge 13 commits into apache:master from jacktengg:condition-cache-cc


@jacktengg
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Feb 27, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28620 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9f140c7df2398a491f977da25906a10cce20fee5, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17630	4532	4309	4309
q2	q3	10648	780	520	520
q4	4678	348	260	260
q5	7547	1194	1014	1014
q6	177	173	146	146
q7	766	834	666	666
q8	9287	1479	1350	1350
q9	4867	4715	4653	4653
q10	7155	1891	1627	1627
q11	474	252	239	239
q12	705	565	469	469
q13	17817	4230	3445	3445
q14	229	231	223	223
q15	960	801	798	798
q16	751	726	670	670
q17	698	845	443	443
q18	5980	5338	5173	5173
q19	1201	972	610	610
q20	511	482	389	389
q21	4622	1900	1375	1375
q22	339	282	241	241
Total cold run time: 97042 ms
Total hot run time: 28620 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4444	4366	4359	4359
q2	q3	1759	2169	1726	1726
q4	848	1158	758	758
q5	4049	4298	4330	4298
q6	183	177	142	142
q7	1730	1606	1504	1504
q8	2423	2669	2541	2541
q9	7395	7464	7349	7349
q10	2630	2914	2453	2453
q11	526	438	416	416
q12	505	578	446	446
q13	3928	4528	3739	3739
q14	291	307	282	282
q15	848	809	790	790
q16	731	798	720	720
q17	1167	1508	1367	1367
q18	7086	6795	6584	6584
q19	857	982	912	912
q20	2097	2129	2048	2048
q21	4013	3639	3429	3429
q22	456	456	392	392
Total cold run time: 47966 ms
Total hot run time: 46255 ms

@doris-robot

TPC-DS: Total hot run time: 184197 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9f140c7df2398a491f977da25906a10cce20fee5, data reload: false

query5	5159	638	530	530
query6	338	231	209	209
query7	4233	481	264	264
query8	329	240	227	227
query9	8726	2744	2730	2730
query10	560	388	315	315
query11	16928	17594	17394	17394
query12	202	132	131	131
query13	1430	498	359	359
query14	7049	3234	3018	3018
query14_1	2881	2828	3036	2828
query15	205	200	186	186
query16	1152	558	460	460
query17	2154	720	612	612
query18	2857	513	393	393
query19	207	215	193	193
query20	137	124	126	124
query21	214	137	119	119
query22	5860	5635	5110	5110
query23	17167	16772	16582	16582
query23_1	16696	16629	16626	16626
query24	7071	1587	1198	1198
query24_1	1210	1257	1226	1226
query25	537	451	395	395
query26	1226	261	144	144
query27	2795	473	281	281
query28	4513	1860	1857	1857
query29	787	545	465	465
query30	308	246	205	205
query31	854	708	654	654
query32	81	68	68	68
query33	508	340	275	275
query34	898	903	588	588
query35	618	670	621	621
query36	1092	1101	1020	1020
query37	134	95	83	83
query38	2914	2889	2832	2832
query39	896	867	840	840
query39_1	832	830	834	830
query40	226	155	133	133
query41	62	61	58	58
query42	109	99	103	99
query43	370	381	341	341
query44	
query45	196	191	180	180
query46	877	995	599	599
query47	2113	2143	2029	2029
query48	305	303	225	225
query49	624	463	372	372
query50	677	283	213	213
query51	4152	4049	4077	4049
query52	107	110	97	97
query53	287	333	285	285
query54	299	258	253	253
query55	88	88	90	88
query56	320	306	305	305
query57	1358	1344	1295	1295
query58	303	272	299	272
query59	2553	2770	2606	2606
query60	331	336	318	318
query61	150	145	148	145
query62	611	601	533	533
query63	306	277	274	274
query64	4841	1351	1081	1081
query65	
query66	1389	467	379	379
query67	16235	16471	16453	16453
query68	
query69	401	330	292	292
query70	991	928	959	928
query71	343	310	301	301
query72	3012	2814	2530	2530
query73	563	542	314	314
query74	9939	9886	9749	9749
query75	2824	2750	2454	2454
query76	2302	1030	664	664
query77	356	376	303	303
query78	11167	11319	10690	10690
query79	3020	804	598	598
query80	1763	614	535	535
query81	590	288	243	243
query82	992	150	115	115
query83	340	257	246	246
query84	251	113	99	99
query85	917	473	422	422
query86	519	322	296	296
query87	3105	3061	3088	3061
query88	3507	2640	2632	2632
query89	423	366	337	337
query90	1936	171	167	167
query91	155	146	132	132
query92	79	76	69	69
query93	1425	814	519	519
query94	641	303	299	299
query95	577	388	310	310
query96	630	517	230	230
query97	2463	2531	2411	2411
query98	235	213	209	209
query99	996	970	911	911
Total cold run time: 258686 ms
Total hot run time: 184197 ms

@jacktengg
Contributor Author

run p0

@jacktengg
Contributor Author

run external

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28745 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b882d226e94fc72abb8bf9704a41541423407c4, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17635	4548	4354	4354
q2	q3	10647	792	521	521
q4	4680	352	257	257
q5	7536	1204	1025	1025
q6	174	177	148	148
q7	783	858	676	676
q8	9308	1448	1307	1307
q9	4863	4727	4712	4712
q10	6817	1876	1622	1622
q11	454	256	246	246
q12	695	563	465	465
q13	17772	4203	3443	3443
q14	228	228	206	206
q15	967	793	783	783
q16	754	716	671	671
q17	710	852	411	411
q18	6017	5397	5263	5263
q19	1113	985	613	613
q20	491	496	387	387
q21	4308	1854	1400	1400
q22	335	279	235	235
Total cold run time: 96287 ms
Total hot run time: 28745 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4437	4362	4361	4361
q2	q3	1769	2177	1732	1732
q4	849	1156	750	750
q5	4018	4330	4316	4316
q6	178	177	141	141
q7	1723	1599	1476	1476
q8	2439	2653	2538	2538
q9	7332	7571	7417	7417
q10	2693	2809	2453	2453
q11	505	442	414	414
q12	503	600	468	468
q13	3932	4426	3702	3702
q14	305	312	290	290
q15	867	812	843	812
q16	733	772	710	710
q17	1167	1526	1329	1329
q18	7220	6885	6749	6749
q19	925	857	887	857
q20	2121	2190	2042	2042
q21	3912	3675	3380	3380
q22	489	429	382	382
Total cold run time: 48117 ms
Total hot run time: 46319 ms

@doris-robot

TPC-DS: Total hot run time: 184276 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b882d226e94fc72abb8bf9704a41541423407c4, data reload: false

query5	5055	644	524	524
query6	325	220	196	196
query7	4212	474	268	268
query8	342	244	224	224
query9	8737	2723	2705	2705
query10	570	388	344	344
query11	16940	16810	16608	16608
query12	191	131	124	124
query13	1282	452	355	355
query14	6516	3247	2942	2942
query14_1	2858	2827	2845	2827
query15	213	196	185	185
query16	990	457	441	441
query17	1041	689	572	572
query18	2588	425	335	335
query19	200	203	176	176
query20	138	127	126	126
query21	219	144	120	120
query22	5020	5784	5503	5503
query23	17583	17244	17155	17155
query23_1	17034	17001	17002	17001
query24	7378	1659	1230	1230
query24_1	1237	1259	1239	1239
query25	590	492	425	425
query26	1235	262	159	159
query27	2759	475	297	297
query28	4496	1866	1864	1864
query29	833	575	489	489
query30	308	247	208	208
query31	862	725	656	656
query32	81	72	75	72
query33	536	346	297	297
query34	936	900	568	568
query35	653	673	607	607
query36	1117	1163	946	946
query37	151	99	83	83
query38	2919	2881	2880	2880
query39	1035	871	863	863
query39_1	839	828	833	828
query40	234	158	138	138
query41	67	64	63	63
query42	112	105	105	105
query43	376	387	356	356
query44	
query45	200	189	182	182
query46	864	996	603	603
query47	2160	2127	2049	2049
query48	311	316	234	234
query49	645	476	378	378
query50	682	280	225	225
query51	4124	4141	4013	4013
query52	106	108	99	99
query53	305	335	303	303
query54	327	314	262	262
query55	85	85	80	80
query56	313	304	319	304
query57	1376	1329	1272	1272
query58	290	280	271	271
query59	2558	2599	2557	2557
query60	344	341	324	324
query61	150	143	149	143
query62	627	595	551	551
query63	313	269	278	269
query64	4904	1271	984	984
query65	
query66	1408	457	371	371
query67	16378	16366	16244	16244
query68	
query69	400	297	280	280
query70	976	1003	985	985
query71	339	304	299	299
query72	2789	2683	2432	2432
query73	539	540	320	320
query74	9977	9860	9761	9761
query75	2830	2741	2459	2459
query76	2313	1030	668	668
query77	355	388	318	318
query78	11193	11360	10692	10692
query79	1134	795	599	599
query80	1358	644	544	544
query81	553	276	252	252
query82	1014	159	119	119
query83	342	259	238	238
query84	250	119	97	97
query85	909	480	441	441
query86	417	320	278	278
query87	3111	3084	2943	2943
query88	3526	2663	2662	2662
query89	418	376	347	347
query90	1971	172	165	165
query91	169	156	135	135
query92	78	78	73	73
query93	973	817	509	509
query94	639	327	298	298
query95	581	396	315	315
query96	631	515	224	224
query97	2471	2511	2438	2438
query98	231	211	213	211
query99	1002	1033	909	909
Total cold run time: 254353 ms
Total hot run time: 184276 ms

jacktengg force-pushed the condition-cache-cc branch 5 times, most recently from 6781f14 to 15fff1b, on February 28, 2026 11:00
@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 29158 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 15fff1b61996813f8ea89e9a4a8b70b3bfd7b798, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17612	4540	4303	4303
q2	q3	10648	775	526	526
q4	4681	363	268	268
q5	7544	1215	1016	1016
q6	173	176	148	148
q7	789	876	686	686
q8	9306	1511	1333	1333
q9	4964	4715	4718	4715
q10	6841	1886	1659	1659
q11	455	269	250	250
q12	696	574	466	466
q13	17782	4269	3459	3459
q14	236	241	220	220
q15	936	802	788	788
q16	775	727	680	680
q17	751	868	440	440
q18	6061	5438	5299	5299
q19	1255	987	652	652
q20	512	500	396	396
q21	5002	2001	1581	1581
q22	395	324	273	273
Total cold run time: 97414 ms
Total hot run time: 29158 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4688	4538	4574	4538
q2	q3	1798	2238	1771	1771
q4	869	1210	784	784
q5	4044	4417	4300	4300
q6	189	175	141	141
q7	1775	1647	1505	1505
q8	2517	2897	2555	2555
q9	7482	7351	7321	7321
q10	2597	2789	2398	2398
q11	516	436	411	411
q12	511	585	465	465
q13	4026	4481	3685	3685
q14	287	299	277	277
q15	904	805	793	793
q16	696	758	724	724
q17	1220	1473	1379	1379
q18	7271	6746	6597	6597
q19	922	915	913	913
q20	2191	2127	2024	2024
q21	3994	3504	3394	3394
q22	477	433	367	367
Total cold run time: 48974 ms
Total hot run time: 46342 ms

@doris-robot

TPC-DS: Total hot run time: 183955 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 15fff1b61996813f8ea89e9a4a8b70b3bfd7b798, data reload: false

query5	4560	647	532	532
query6	330	219	215	215
query7	4222	474	273	273
query8	362	255	242	242
query9	8778	2728	2740	2728
query10	499	391	352	352
query11	17065	17602	17287	17287
query12	202	142	130	130
query13	1334	508	381	381
query14	7164	3360	2997	2997
query14_1	2929	2893	2964	2893
query15	247	214	182	182
query16	1015	473	472	472
query17	1119	744	647	647
query18	2709	429	342	342
query19	197	202	176	176
query20	138	125	127	125
query21	214	131	110	110
query22	4734	4849	4916	4849
query23	17253	16723	16566	16566
query23_1	16644	16662	16545	16545
query24	7134	1625	1222	1222
query24_1	1222	1198	1228	1198
query25	539	452	407	407
query26	1239	293	149	149
query27	2753	482	283	283
query28	4522	1855	1871	1855
query29	784	567	464	464
query30	310	244	210	210
query31	883	738	643	643
query32	77	73	69	69
query33	512	348	274	274
query34	905	893	560	560
query35	630	674	599	599
query36	1086	1124	997	997
query37	136	94	82	82
query38	2971	2937	2872	2872
query39	881	874	831	831
query39_1	826	823	828	823
query40	230	156	139	139
query41	63	60	60	60
query42	109	108	105	105
query43	387	387	361	361
query44	
query45	230	189	178	178
query46	871	988	601	601
query47	2105	2147	2033	2033
query48	305	315	243	243
query49	627	468	377	377
query50	686	278	212	212
query51	4083	4107	4100	4100
query52	105	106	98	98
query53	291	339	288	288
query54	293	267	259	259
query55	86	87	77	77
query56	325	303	302	302
query57	1363	1357	1279	1279
query58	288	275	264	264
query59	2617	2718	2595	2595
query60	330	326	333	326
query61	149	146	143	143
query62	622	597	529	529
query63	315	276	281	276
query64	4863	1282	995	995
query65	
query66	1400	455	360	360
query67	16484	16325	16236	16236
query68	
query69	397	305	297	297
query70	958	963	982	963
query71	334	307	301	301
query72	2991	2825	2568	2568
query73	548	555	336	336
query74	9975	9936	9717	9717
query75	2859	2776	2472	2472
query76	2288	1046	692	692
query77	393	426	319	319
query78	11325	11574	10860	10860
query79	1775	784	598	598
query80	1365	626	543	543
query81	572	283	252	252
query82	998	146	113	113
query83	366	261	240	240
query84	256	133	99	99
query85	1050	501	427	427
query86	408	331	329	329
query87	3140	3081	3022	3022
query88	3569	2681	2681	2681
query89	436	372	345	345
query90	1884	170	176	170
query91	166	157	137	137
query92	83	80	67	67
query93	1077	877	514	514
query94	636	333	309	309
query95	589	401	310	310
query96	634	517	229	229
query97	2458	2495	2402	2402
query98	231	221	220	220
query99	1004	1002	922	922
Total cold run time: 254763 ms
Total hot run time: 183955 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 35.69% (121/339) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.63% (19663/37359)
Line Coverage 36.24% (183623/506679)
Region Coverage 32.54% (142483/437833)
Branch Coverage 33.46% (61743/184537)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 64.31% (218/339) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.47% (26146/36581)
Line Coverage 54.24% (273986/505126)
Region Coverage 51.63% (228177/441970)
Branch Coverage 52.84% (97811/185101)

jacktengg force-pushed the condition-cache-cc branch 3 times, most recently from da407e2 to a1f386e, on March 1, 2026 15:24
@jacktengg
Contributor Author

run buildall

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28659 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a1f386ed66ee671131b2827eb5e919d580052911, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17629	4515	4276	4276
q2	q3	10647	780	515	515
q4	4673	343	252	252
q5	7559	1189	1027	1027
q6	181	172	145	145
q7	774	825	667	667
q8	9286	1459	1296	1296
q9	4894	4705	4687	4687
q10	6816	1871	1620	1620
q11	439	255	259	255
q12	717	566	481	481
q13	17779	4245	3415	3415
q14	228	230	209	209
q15	961	788	791	788
q16	755	719	669	669
q17	706	883	404	404
q18	5885	5335	5303	5303
q19	1124	983	611	611
q20	500	491	384	384
q21	4492	1857	1413	1413
q22	336	282	242	242
Total cold run time: 96381 ms
Total hot run time: 28659 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4432	4366	4335	4335
q2	q3	1757	2166	1718	1718
q4	849	1144	763	763
q5	4041	4330	4314	4314
q6	170	171	141	141
q7	1704	1563	1487	1487
q8	2411	2665	2511	2511
q9	7226	7762	7367	7367
q10	2733	2854	2414	2414
q11	518	426	412	412
q12	505	575	446	446
q13	3920	4476	3619	3619
q14	318	327	309	309
q15	860	804	847	804
q16	702	749	713	713
q17	1204	1517	1288	1288
q18	7040	6868	6544	6544
q19	885	903	876	876
q20	2063	2153	2049	2049
q21	4010	3756	3334	3334
q22	505	430	374	374
Total cold run time: 47853 ms
Total hot run time: 45818 ms

@doris-robot

TPC-DS: Total hot run time: 184191 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a1f386ed66ee671131b2827eb5e919d580052911, data reload: false

query5	4728	626	495	495
query6	326	214	206	206
query7	4248	482	269	269
query8	351	238	239	238
query9	8767	2739	2727	2727
query10	479	372	337	337
query11	16958	16858	16590	16590
query12	186	128	123	123
query13	1269	450	377	377
query14	6449	3198	2979	2979
query14_1	2813	2784	2771	2771
query15	211	197	180	180
query16	993	472	433	433
query17	1076	718	606	606
query18	2602	448	352	352
query19	210	203	182	182
query20	138	132	132	132
query21	224	146	125	125
query22	4761	6202	5534	5534
query23	17916	17269	17209	17209
query23_1	17058	16950	17081	16950
query24	7284	1608	1240	1240
query24_1	1237	1237	1219	1219
query25	554	484	427	427
query26	1229	266	158	158
query27	2760	461	288	288
query28	4500	1851	1866	1851
query29	777	555	467	467
query30	310	244	207	207
query31	889	716	671	671
query32	78	73	70	70
query33	508	338	276	276
query34	916	912	569	569
query35	641	672	603	603
query36	1093	1138	1010	1010
query37	132	91	80	80
query38	2979	2915	2870	2870
query39	1025	864	839	839
query39_1	829	832	825	825
query40	230	145	132	132
query41	64	57	59	57
query42	104	102	100	100
query43	397	390	349	349
query44	
query45	202	188	184	184
query46	864	972	606	606
query47	2123	2141	2048	2048
query48	329	351	238	238
query49	617	464	382	382
query50	683	278	215	215
query51	4089	4081	4047	4047
query52	106	107	97	97
query53	292	338	275	275
query54	291	261	260	260
query55	100	81	83	81
query56	310	300	300	300
query57	1351	1339	1277	1277
query58	293	278	274	274
query59	2557	2670	2508	2508
query60	322	323	327	323
query61	150	143	143	143
query62	616	599	547	547
query63	324	278	282	278
query64	4848	1273	969	969
query65	
query66	1414	454	356	356
query67	16278	16307	16366	16307
query68	
query69	398	315	287	287
query70	988	951	964	951
query71	330	314	298	298
query72	2771	2601	2372	2372
query73	533	545	315	315
query74	9985	9868	9720	9720
query75	2872	2734	2467	2467
query76	2296	1042	702	702
query77	366	371	295	295
query78	11197	11369	10695	10695
query79	3018	816	611	611
query80	1751	636	517	517
query81	598	275	244	244
query82	995	147	114	114
query83	327	267	240	240
query84	253	124	96	96
query85	879	459	438	438
query86	497	312	296	296
query87	3087	3094	3003	3003
query88	3541	2648	2641	2641
query89	418	365	343	343
query90	2253	176	163	163
query91	162	158	134	134
query92	86	85	70	70
query93	2044	840	521	521
query94	644	312	280	280
query95	588	395	313	313
query96	643	511	231	231
query97	2478	2518	2433	2433
query98	238	218	214	214
query99	1015	980	912	912
Total cold run time: 257240 ms
Total hot run time: 184191 ms

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28635 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a1f386ed66ee671131b2827eb5e919d580052911, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17637	4424	4262	4262
q2	q3	10648	758	522	522
q4	4678	354	251	251
q5	7552	1178	1024	1024
q6	173	173	148	148
q7	784	860	671	671
q8	9549	1454	1316	1316
q9	4718	4724	4692	4692
q10	6849	1878	1628	1628
q11	446	262	251	251
q12	736	569	477	477
q13	17755	4202	3405	3405
q14	231	232	221	221
q15	961	803	785	785
q16	740	721	657	657
q17	712	889	427	427
q18	5909	5359	5216	5216
q19	1323	963	638	638
q20	510	489	391	391
q21	4536	1874	1415	1415
q22	331	285	238	238
Total cold run time: 96778 ms
Total hot run time: 28635 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4441	4316	4326	4316
q2	q3	1758	2174	1720	1720
q4	837	1166	756	756
q5	4021	4319	4320	4319
q6	180	172	141	141
q7	1729	1589	1468	1468
q8	2416	2649	2511	2511
q9	7374	7415	7324	7324
q10	2676	2895	2430	2430
q11	523	433	411	411
q12	580	589	454	454
q13	3980	4509	3728	3728
q14	289	307	278	278
q15	859	829	819	819
q16	765	782	711	711
q17	1235	1547	1365	1365
q18	7050	6822	6649	6649
q19	918	883	912	883
q20	2076	2138	1993	1993
q21	3964	3691	3401	3401
q22	419	431	395	395
Total cold run time: 48090 ms
Total hot run time: 46072 ms

@doris-robot

TPC-DS: Total hot run time: 183659 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a1f386ed66ee671131b2827eb5e919d580052911, data reload: false

query5	4776	642	521	521
query6	322	224	198	198
query7	4219	474	277	277
query8	353	256	248	248
query9	8766	2792	2778	2778
query10	525	382	346	346
query11	17053	17683	17182	17182
query12	212	139	139	139
query13	1328	503	358	358
query14	7009	3435	3099	3099
query14_1	2975	3030	2977	2977
query15	217	197	181	181
query16	1016	498	471	471
query17	1606	792	682	682
query18	2991	473	380	380
query19	227	219	192	192
query20	137	145	136	136
query21	233	143	127	127
query22	5667	5064	4846	4846
query23	17491	16754	16568	16568
query23_1	16636	16791	16608	16608
query24	7104	1606	1209	1209
query24_1	1234	1222	1203	1203
query25	531	459	401	401
query26	1238	263	151	151
query27	2746	463	306	306
query28	4471	1878	1865	1865
query29	785	589	466	466
query30	309	249	211	211
query31	870	726	641	641
query32	87	75	68	68
query33	505	338	282	282
query34	926	914	560	560
query35	638	677	597	597
query36	1073	1081	943	943
query37	134	94	86	86
query38	2959	2894	2821	2821
query39	886	854	841	841
query39_1	835	826	829	826
query40	227	152	139	139
query41	64	64	58	58
query42	108	106	104	104
query43	380	381	347	347
query44	
query45	198	204	185	185
query46	901	1018	604	604
query47	2140	2124	2026	2026
query48	319	318	230	230
query49	625	467	392	392
query50	670	279	217	217
query51	4115	4079	3993	3993
query52	110	106	96	96
query53	294	341	293	293
query54	287	262	260	260
query55	92	83	84	83
query56	309	316	297	297
query57	1360	1312	1292	1292
query58	291	278	280	278
query59	2588	2751	2512	2512
query60	338	344	332	332
query61	152	149	150	149
query62	639	584	545	545
query63	322	283	281	281
query64	4830	1285	1016	1016
query65	
query66	1381	462	353	353
query67	16405	16246	16201	16201
query68	
query69	398	319	312	312
query70	999	945	864	864
query71	333	308	302	302
query72	2816	2683	2447	2447
query73	552	546	329	329
query74	9984	9923	9751	9751
query75	2862	2750	2478	2478
query76	2285	1027	666	666
query77	347	392	303	303
query78	11167	11465	10718	10718
query79	1131	803	600	600
query80	1348	619	529	529
query81	533	300	257	257
query82	1270	150	113	113
query83	343	261	271	261
query84	251	114	99	99
query85	891	484	431	431
query86	409	342	294	294
query87	3142	3077	3059	3059
query88	3588	2684	2677	2677
query89	417	366	352	352
query90	1993	177	174	174
query91	168	157	130	130
query92	73	73	70	70
query93	917	843	513	513
query94	634	319	289	289
query95	587	404	318	318
query96	640	524	235	235
query97	2480	2467	2405	2405
query98	227	215	217	215
query99	993	997	908	908
Total cold run time: 255467 ms
Total hot run time: 183659 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 36.50% (123/337) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.62% (19661/37363)
Line Coverage 36.24% (183596/506677)
Region Coverage 32.53% (142442/437837)
Branch Coverage 33.45% (61735/184539)

jacktengg force-pushed the condition-cache-cc branch from c33c87b to ee1fcce on March 4, 2026 15:37
@jacktengg
Contributor Author

run buildall

@jacktengg
Contributor Author

/review

jacktengg force-pushed the condition-cache-cc branch from ee1fcce to 14748e2 on March 4, 2026 15:42
@jacktengg
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment


Code Review: [feat](condition cache) Support condition cache for external table

Summary

This PR adds a condition cache for external tables (Hive Parquet/ORC). The cache stores per-granule (2048-row chunks) filter results keyed by (file_path, modification_time, file_size, condition_digest). On cache HIT, readers skip granules with no surviving rows, avoiding I/O and predicate evaluation. On MISS, readers mark which granules have surviving rows during predicate evaluation for future lookups.
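
The per-granule scheme described in this summary can be illustrated with a minimal sketch. It assumes a plain bitmap with one bit per 2048-row granule; `GranuleBitmap`, `mark_surviving_row`, and `can_skip` are hypothetical names chosen for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Granule size taken from the review summary (2048-row chunks).
constexpr std::size_t kGranuleRows = 2048;

// Hypothetical per-file bitmap: one bit per granule, true if any row survived.
struct GranuleBitmap {
    std::vector<bool> bits;

    explicit GranuleBitmap(std::size_t total_rows)
            : bits((total_rows + kGranuleRows - 1) / kGranuleRows, false) {}

    // MISS path: record that the row at index `row` passed the predicates.
    // at() throws std::out_of_range if `row` lies beyond the file.
    void mark_surviving_row(std::size_t row) { bits.at(row / kGranuleRows) = true; }

    // HIT path: true when the granule containing `row` has no surviving rows.
    bool can_skip(std::size_t row) const { return !bits.at(row / kGranuleRows); }
};

// Usage: a 5000-row file spans three granules; only the second has a survivor.
inline void example_usage() {
    GranuleBitmap bitmap(5000);
    bitmap.mark_surviving_row(2048);
    assert(bitmap.can_skip(0));      // granule 0: nothing survived
    assert(!bitmap.can_skip(4095));  // granule 1: must be read
    assert(bitmap.can_skip(4096));   // granule 2: nothing survived
}
```

In this sketch a single surviving row keeps its whole granule readable, so a HIT can only skip granules in which every row was filtered out.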

Overall, this is a well-structured feature with good test coverage. A few issues were found; they are listed below.

Critical Checkpoint Conclusions

Goal & correctness: The code accomplishes the stated goal. Regression tests cover both ORC and Parquet paths. Unit tests cover ranges_exception, get_row_index_by_pos, and filter_ranges_by_cache. Cache is correctly skipped for LIMIT-truncated scans and tables with delete operations (Iceberg/transactional Hive).

Concurrency: The ConditionCacheContext is created per-file and used by a single scanner thread sequentially, so no concurrency issues with the std::vector<bool> filter_result. The global ConditionCacheManager uses LRUCachePolicy which has internal locking.

Lifecycle: Cache entries are inserted only on full file consumption (_finalize_reader_condition_cache), preventing partial/incorrect cache entries. The ConditionCacheContext is owned by FileScanner and passed by raw pointer to readers, which is fine since the scanner outlives its readers.

Parallel code paths: Both ORC and Parquet readers are updated. Both lazy-read and non-lazy-read paths handle HIT and MISS. The Iceberg and transactional Hive table format readers correctly delegate cache context methods and report has_delete_operations() to disable caching.

Configuration: Uses existing enable_condition_cache and condition_cache_capacity session variables. No new config items added.

Observability: Profile counters added for ConditionCacheHit, ConditionCacheFilteredRows, and ConditionCacheMissedRows.

Compatibility: The condition_cache_digest is passed via TQueryOptions from FE. BE checks __isset.condition_cache_digest before use, so during a rolling upgrade a BE that receives no digest (for example, from an older FE) skips caching gracefully.

Test coverage: Regression tests cover basic HIT/MISS scenarios for both formats. Unit tests cover the new RowRanges methods and cache filtering logic. Consider adding negative tests for edge cases (empty files, single-row files, files smaller than one granule).

Performance: On HIT, the Parquet non-lazy path pre-filters _read_ranges to skip I/O entirely for false granules. The ORC path seeks past false granules. The lazy-read HIT check in Parquet is dead code (see comment below) but harmless.
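
As a rough illustration of pre-filtering read ranges on HIT, the sketch below trims a list of half-open row ranges against a per-granule bitmap. `ExampleRange` and `filter_ranges` are hypothetical stand-ins, not the PR's `RowRanges` or `filter_ranges_by_cache`, and treating granules without a cached verdict as "must read" is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int64_t kGranuleRows = 2048;

// Hypothetical half-open row range [from, to).
struct ExampleRange {
    std::int64_t from;
    std::int64_t to;
};

// Keeps only the parts of `ranges` that fall in granules marked true.
std::vector<ExampleRange> filter_ranges(const std::vector<ExampleRange>& ranges,
                                        const std::vector<bool>& granule_has_rows) {
    std::vector<ExampleRange> out;
    for (const ExampleRange& r : ranges) {
        std::int64_t cur = r.from;
        while (cur < r.to) {
            const std::int64_t granule = cur / kGranuleRows;
            const std::int64_t piece_end = std::min((granule + 1) * kGranuleRows, r.to);
            // Assumption: a granule with no cached verdict must still be read.
            const bool keep = static_cast<std::size_t>(granule) >= granule_has_rows.size() ||
                              granule_has_rows[static_cast<std::size_t>(granule)];
            if (keep) {
                if (!out.empty() && out.back().to == cur) {
                    out.back().to = piece_end;  // merge with the adjacent kept piece
                } else {
                    out.push_back({cur, piece_end});
                }
            }
            cur = piece_end;
        }
    }
    return out;
}

// Usage: the middle granule of a 6000-row range is cached as empty.
inline void example_usage() {
    const auto kept = filter_ranges({{0, 6000}}, {true, false, true});
    assert(kept.size() == 2);
    assert(kept[0].from == 0 && kept[0].to == 2048);
    assert(kept[1].from == 4096 && kept[1].to == 6000);
}
```

Because the surviving ranges are computed before any column is read, a reader driven by them never issues I/O for the dropped granules.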

Key encoding: ExternalCacheKey::encode() concatenates a variable-length path with a fixed 24-byte binary suffix, without a length separator. In practice, file paths won't contain bytes that could be confused with the binary suffix, but a length prefix or separator would be more robust.
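
One way to make the field boundary explicit, as suggested here, is to write the path length ahead of the path bytes. The following is a sketch only: `ExampleCacheKey` and its field names are hypothetical and merely mirror the tuple named in the summary, and the bytes are written in host byte order, which is assumed to be acceptable for an in-process cache key.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical key; the real ExternalCacheKey layout is not shown in this thread.
struct ExampleCacheKey {
    std::string path;
    std::int64_t modification_time = 0;
    std::int64_t file_size = 0;
    std::uint64_t digest = 0;

    // Layout: 8-byte path length, path bytes, then three 8-byte fields.
    std::string encode() const {
        std::string out;
        out.reserve(sizeof(std::uint64_t) + path.size() + 24);
        append_fixed(out, static_cast<std::uint64_t>(path.size()));
        out.append(path);
        append_fixed(out, modification_time);
        append_fixed(out, file_size);
        append_fixed(out, digest);
        return out;
    }

private:
    // Appends the raw bytes of a fixed-width integer.
    template <typename T>
    static void append_fixed(std::string& out, T value) {
        char buf[sizeof(T)];
        std::memcpy(buf, &value, sizeof(T));
        out.append(buf, sizeof(T));
    }
};

// Usage: keys that differ only in path encode to different byte strings.
inline void example_usage() {
    ExampleCacheKey a;
    a.path = "ab";
    a.modification_time = 1;
    a.file_size = 2;
    a.digest = 3;
    ExampleCacheKey b = a;
    b.path = "abc";
    assert(a.encode().size() == 8 + 2 + 24);
    assert(a.encode() != b.encode());
}
```

The prefix costs eight bytes per key and lets a decoder locate the path without relying on the suffix width staying fixed.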

Issues Found

See inline comments below.

return -1;
}

uint64_t get_digest(uint64_t seed) const {

[Bug risk] DCHECK(false) vanishes in RELEASE builds, causing silent return of -1 for out-of-bounds positions. Per AGENTS.md coding standards, invariant violations should crash rather than silently continue.

If pos >= _count is truly an invariant that should never be violated, this should use DORIS_CHECK(false) (or equivalently, DORIS_CHECK(pos < _count) at the function entry) to catch violations in all build types. A silent return of -1 in production could cause incorrect granule marking and data correctness issues.

// Suggested: replace DCHECK with DORIS_CHECK
DORIS_CHECK(false) << "pos " << pos << " is out of bounds for RowRanges with count " << _count;
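
To make the difference concrete, here is a minimal sketch of a position lookup guarded by a check that stays active in every build type. `EXAMPLE_CHECK` is a hypothetical macro standing in for DORIS_CHECK, whose real definition is not shown in this thread, and `row_index_by_pos` only assumes what `get_row_index_by_pos` might do (map the pos-th selected row to its absolute row index).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical always-on check: unlike assert or DCHECK, it is not compiled
// out when NDEBUG is defined.
#define EXAMPLE_CHECK(cond, msg)                                          \
    do {                                                                  \
        if (!(cond)) {                                                    \
            std::fprintf(stderr, "check failed: %s (%s)\n", #cond, msg);  \
            std::abort();                                                 \
        }                                                                 \
    } while (0)

// Hypothetical half-open row range [from, to).
struct ExampleRange {
    std::int64_t from;
    std::int64_t to;
};

// Maps the pos-th selected row (0-based, counted across all ranges) to its
// absolute row index. Aborts on an out-of-bounds pos in every build type.
inline std::int64_t row_index_by_pos(const std::vector<ExampleRange>& ranges,
                                     std::int64_t pos) {
    EXAMPLE_CHECK(pos >= 0, "pos must be non-negative");
    std::int64_t remaining = pos;
    for (const ExampleRange& r : ranges) {
        const std::int64_t len = r.to - r.from;
        if (remaining < len) {
            return r.from + remaining;
        }
        remaining -= len;
    }
    EXAMPLE_CHECK(false, "pos is past the last range");
    return -1;  // not reached; EXAMPLE_CHECK(false, ...) aborts first
}

// Usage: position 10 is the first row of the second range.
inline void example_usage() {
    const std::vector<ExampleRange> ranges = {{0, 10}, {100, 110}};
    assert(row_index_by_pos(ranges, 9) == 9);
    assert(row_index_by_pos(ranges, 10) == 100);
}
```

With this shape, an out-of-bounds position terminates the process instead of handing -1 to the caller.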

std::shared_ptr<ConditionCacheContext> _condition_cache_ctx;
std::unique_ptr<ORCFilterImpl> _orc_filter;
orc::RowReaderOptions _row_reader_options;


[Style] Initializing uint64_t to -1 gives UINT64_MAX (18446744073709551615). While it's overwritten before first use, this is a code smell and may trigger -Wconversion warnings in files that enable compile_check_begin.h.

Consider using 0 as initial value, or a named sentinel constant if a specific "uninitialized" value is intended.
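
A named sentinel could look like the sketch below. `kInvalidRowPos`, `ExampleReaderState`, and `last_row_pos` are hypothetical names; the member actually under review is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel meaning "no row position recorded yet".
constexpr std::uint64_t kInvalidRowPos = std::numeric_limits<std::uint64_t>::max();

// Hypothetical reader state used only for this illustration.
struct ExampleReaderState {
    // Explicit sentinel: no signed-to-unsigned conversion is involved.
    std::uint64_t last_row_pos = kInvalidRowPos;

    bool has_row_pos() const { return last_row_pos != kInvalidRowPos; }
};

// Usage: 0 stays available as a real position, distinct from the sentinel.
inline void example_usage() {
    ExampleReaderState state;
    assert(!state.has_row_pos());
    state.last_row_pos = 0;
    assert(state.has_row_pos());
}
```

Compared with initializing to 0, the named constant keeps 0 usable as a real row position, while the intent stays visible at the declaration.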

auto& cache = *_condition_cache_ctx->filter_result;
int64_t first_rg_pos = _read_ranges.get_row_index_by_pos(batch_base_row);
int64_t last_rg_pos =
_read_ranges.get_row_index_by_pos(batch_base_row + pre_read_rows - 1);

[Dead code] This lazy-read HIT check (if (_condition_cache_ctx && _condition_cache_ctx->is_hit)) will never evaluate to true for Parquet.

In _filter_read_ranges_by_condition_cache() (called during row group init), _condition_cache_ctx is set to nullptr after filtering _read_ranges on HIT. So by the time execution reaches this lazy-read path, _condition_cache_ctx is already null.

The actual HIT optimization for Parquet is entirely handled by pre-filtering _read_ranges in the non-lazy path, which is correct and effective. This block (lines 616-641 approximately) is unreachable dead code.

Consider either:

  1. Removing this dead code block, or
  2. If there's an intent to do per-batch cache checking in the lazy path (e.g., for future use), don't null out _condition_cache_ctx in _filter_read_ranges_by_condition_cache() and document the design.

RETURN_IF_ERROR(_read_column_data(block, _lazy_read_ctx.predicate_columns.first, batch_size,
&pre_read_rows, &pre_eof, filter_map));
if (pre_read_rows == 0) {
DCHECK_EQ(pre_eof, true);

[Performance observation] In this lazy-read HIT path, predicate columns are already read from storage (_read_column_data above) before the cache check occurs. This means the cache HIT in the lazy-read path only saves the cost of:

  • Predicate evaluation
  • Lazy (non-predicate) column reads

But the I/O cost for predicate columns is already spent. This contrasts with the non-lazy path which pre-filters _read_ranges to avoid reading any columns for false granules.

This is noted as a design observation. Since this code is actually dead (see other comment about _condition_cache_ctx being null at this point), it's not a practical issue currently. But if this code path is ever activated, consider restructuring to check the cache before reading predicate columns.
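A minimal sketch of the suggested restructuring, checking the cache before any predicate column is read. The helper names (`batch_is_all_false`, `read_batch`, `ToyStats`) and the convention that a `false` flag means "no row in this granule passes the predicate" are assumptions made for illustration. The first/last granule computation mirrors the `batch_base_row + pre_read_rows - 1` arithmetic in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true only if every granule overlapping rows
// [first_row, first_row + num_rows) is marked false in the cached result.
// Granules outside the cached vector are treated conservatively as "must read".
// Assumes first_row is non-negative.
bool batch_is_all_false(const std::vector<bool>& granule_flags, int64_t granule_rows,
                        int64_t first_row, int64_t num_rows) {
    if (num_rows <= 0 || granule_rows <= 0) return false;
    int64_t first_granule = first_row / granule_rows;
    int64_t last_granule = (first_row + num_rows - 1) / granule_rows;
    for (int64_t g = first_granule; g <= last_granule; ++g) {
        if (g >= static_cast<int64_t>(granule_flags.size())) return false;
        if (granule_flags[static_cast<std::size_t>(g)]) return false;
    }
    return true;
}

// Hypothetical counters standing in for real I/O.
struct ToyStats {
    int predicate_reads = 0;
    int skipped_batches = 0;
};

// Consults the cache first; predicate columns are "read" only when needed.
void read_batch(const std::vector<bool>& granule_flags, int64_t granule_rows,
                int64_t first_row, int64_t num_rows, ToyStats& stats) {
    if (batch_is_all_false(granule_flags, granule_rows, first_row, num_rows)) {
        ++stats.skipped_batches; // no predicate I/O spent on this batch
        return;
    }
    ++stats.predicate_reads; // placeholder for the real predicate column read
}
```

In this ordering a fully filtered batch costs no predicate-column I/O, whereas the order described in the comment pays for the read before the cache is consulted.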

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28825 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 37deff66bd9e464a6dbdf8f54d30e23a26459b14, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17665	4485	4291	4291
q2
q3	10634	850	523	523
q4	4689	374	273	273
q5	7559	1206	1046	1046
q6	172	175	145	145
q7	771	861	664	664
q8	9289	1488	1335	1335
q9	4950	4764	4743	4743
q10	6790	1909	1658	1658
q11	465	269	257	257
q12	692	572	459	459
q13	17756	4238	3422	3422
q14	229	227	212	212
q15	935	799	785	785
q16	756	721	676	676
q17	708	895	384	384
q18	6006	5428	5215	5215
q19	1273	980	597	597
q20	505	492	394	394
q21	4830	1992	1493	1493
q22	394	317	253	253
Total cold run time: 97068 ms
Total hot run time: 28825 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4686	4562	4531	4531
q2
q3	1775	2227	1769	1769
q4	883	1200	801	801
q5	4088	4354	4354	4354
q6	184	173	145	145
q7	1748	1638	1489	1489
q8	2483	2853	2554	2554
q9	8006	7334	7339	7334
q10	2635	3064	2414	2414
q11	514	449	425	425
q12	524	591	460	460
q13	3983	4413	3605	3605
q14	305	307	281	281
q15	893	823	822	822
q16	707	765	701	701
q17	1212	1483	1376	1376
q18	6974	6785	6564	6564
q19	867	875	887	875
q20	2044	2184	2055	2055
q21	4070	3437	3430	3430
q22	448	439	406	406
Total cold run time: 49029 ms
Total hot run time: 46391 ms

@doris-robot

TPC-DS: Total hot run time: 183460 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 37deff66bd9e464a6dbdf8f54d30e23a26459b14, data reload: false

query5	4602	633	528	528
query6	331	229	204	204
query7	4215	467	273	273
query8	357	258	234	234
query9	8763	2818	2735	2735
query10	528	374	342	342
query11	17023	17450	17251	17251
query12	202	148	131	131
query13	1412	515	356	356
query14	6668	3431	3034	3034
query14_1	2898	3024	3016	3016
query15	215	210	218	210
query16	1001	483	500	483
query17	1082	733	652	652
query18	2721	425	339	339
query19	209	214	184	184
query20	138	129	122	122
query21	215	133	111	111
query22	4856	5041	4683	4683
query23	17106	16689	16459	16459
query23_1	16743	16505	16608	16505
query24	7212	1592	1224	1224
query24_1	1226	1254	1208	1208
query25	533	456	434	434
query26	1248	258	148	148
query27	2793	456	293	293
query28	4512	1890	1885	1885
query29	792	580	488	488
query30	313	243	205	205
query31	885	750	642	642
query32	81	70	71	70
query33	519	339	281	281
query34	926	902	589	589
query35	630	718	593	593
query36	1086	1093	990	990
query37	137	95	81	81
query38	2938	2936	2825	2825
query39	879	879	859	859
query39_1	833	908	832	832
query40	231	151	132	132
query41	63	61	58	58
query42	108	104	105	104
query43	369	396	351	351
query44	
query45	193	196	182	182
query46	877	979	607	607
query47	2106	2122	2031	2031
query48	312	321	231	231
query49	625	461	388	388
query50	674	275	210	210
query51	4029	4104	4033	4033
query52	109	109	97	97
query53	295	342	286	286
query54	313	262	272	262
query55	86	81	81	81
query56	327	298	318	298
query57	1379	1335	1258	1258
query58	288	274	280	274
query59	2618	2775	2625	2625
query60	327	338	318	318
query61	147	144	147	144
query62	638	589	543	543
query63	313	278	280	278
query64	4867	1255	974	974
query65	
query66	1412	448	374	374
query67	16460	16376	16270	16270
query68	
query69	394	320	296	296
query70	1001	920	996	920
query71	342	303	303	303
query72	2922	2945	2574	2574
query73	537	557	323	323
query74	10009	9889	9735	9735
query75	2838	2772	2499	2499
query76	2297	1030	698	698
query77	397	374	304	304
query78	11417	11399	10679	10679
query79	2075	792	611	611
query80	1655	656	543	543
query81	573	278	254	254
query82	1022	153	115	115
query83	348	258	243	243
query84	247	124	98	98
query85	879	474	427	427
query86	412	312	338	312
query87	3107	3095	3000	3000
query88	3505	2677	2653	2653
query89	437	377	346	346
query90	2027	174	170	170
query91	174	158	138	138
query92	86	72	74	72
query93	1045	844	505	505
query94	639	317	265	265
query95	596	353	319	319
query96	647	514	226	226
query97	2445	2494	2375	2375
query98	226	223	219	219
query99	961	982	944	944
Total cold run time: 254680 ms
Total hot run time: 183460 ms

@jacktengg
Contributor Author

run beut

@doris-robot

BE UT Coverage Report

Increment line coverage 43.04% (136/316) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.62% (19682/37407)
Line Coverage 36.21% (183690/507282)
Region Coverage 32.50% (142478/438349)
Branch Coverage 33.45% (61812/184806)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 58.86% (186/316) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.53% (26201/36629)
Line Coverage 54.26% (274406/505728)
Region Coverage 51.51% (227923/442486)
Branch Coverage 52.89% (98033/185370)

@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27646 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 69163f5fe41b8cdd1cc7703d2a3fc14c04b1182a, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17624	4396	4277	4277
q2
q3	10659	860	523	523
q4	4672	371	252	252
q5	7561	1221	1019	1019
q6	177	178	148	148
q7	798	832	653	653
q8	9297	1498	1321	1321
q9	4971	4741	4721	4721
q10	6305	1890	1663	1663
q11	504	272	234	234
q12	745	571	468	468
q13	18046	2933	2189	2189
q14	232	229	212	212
q15	933	811	815	811
q16	764	723	667	667
q17	720	863	435	435
q18	6042	5473	5325	5325
q19	1176	1001	595	595
q20	497	486	383	383
q21	4432	2163	1471	1471
q22	350	340	279	279
Total cold run time: 96505 ms
Total hot run time: 27646 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4696	4622	4483	4483
q2
q3	3904	4371	3811	3811
q4	890	1205	797	797
q5	4043	4415	4355	4355
q6	188	169	141	141
q7	1783	1662	1534	1534
q8	2535	2720	2657	2657
q9	7588	7360	7528	7360
q10	3813	4018	3572	3572
q11	501	430	434	430
q12	493	603	431	431
q13	2686	3150	2383	2383
q14	279	326	302	302
q15	892	797	820	797
q16	746	771	720	720
q17	1161	1493	1339	1339
q18	7285	6883	6755	6755
q19	885	872	885	872
q20	2068	2157	2099	2099
q21	3945	3482	3357	3357
q22	497	429	392	392
Total cold run time: 50878 ms
Total hot run time: 48587 ms

@doris-robot

TPC-DS: Total hot run time: 153330 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 69163f5fe41b8cdd1cc7703d2a3fc14c04b1182a, data reload: false

query5	4333	644	519	519
query6	345	227	211	211
query7	4199	464	277	277
query8	333	257	230	230
query9	8711	2808	2831	2808
query10	518	403	362	362
query11	7263	5904	5626	5626
query12	182	131	126	126
query13	1271	466	350	350
query14	5714	3854	3539	3539
query14_1	2809	2810	2784	2784
query15	198	197	174	174
query16	991	482	472	472
query17	1080	704	590	590
query18	2434	453	339	339
query19	209	215	184	184
query20	137	131	125	125
query21	221	142	130	130
query22	5071	4888	4737	4737
query23	16217	15633	15359	15359
query23_1	15368	16295	15922	15922
query24	7313	1696	1307	1307
query24_1	1345	1325	1302	1302
query25	621	540	457	457
query26	1248	284	165	165
query27	2970	513	307	307
query28	4853	1998	2034	1998
query29	970	638	510	510
query30	316	251	217	217
query31	1357	1299	1223	1223
query32	125	70	76	70
query33	516	330	279	279
query34	918	921	567	567
query35	621	687	587	587
query36	1083	1091	956	956
query37	140	97	81	81
query38	2932	2949	2877	2877
query39	891	874	831	831
query39_1	821	832	870	832
query40	230	156	135	135
query41	62	63	58	58
query42	296	297	312	297
query43	236	247	222	222
query44	
query45	200	193	186	186
query46	873	996	622	622
query47	2118	2162	2059	2059
query48	312	329	246	246
query49	638	473	382	382
query50	698	279	211	211
query51	4128	4273	4067	4067
query52	297	297	289	289
query53	295	337	287	287
query54	294	273	284	273
query55	92	89	85	85
query56	318	339	302	302
query57	1366	1337	1278	1278
query58	297	284	273	273
query59	1407	1411	1240	1240
query60	327	342	321	321
query61	188	139	143	139
query62	627	602	547	547
query63	312	272	272	272
query64	5120	1263	991	991
query65	
query66	1468	464	357	357
query67	16443	16431	16293	16293
query68	
query69	389	313	297	297
query70	1016	1002	936	936
query71	339	313	311	311
query72	2781	2686	2444	2444
query73	539	555	321	321
query74	10110	9918	9819	9819
query75	2855	2779	2461	2461
query76	2287	1021	685	685
query77	361	392	331	331
query78	11203	11547	10727	10727
query79	1123	793	596	596
query80	702	660	575	575
query81	518	274	253	253
query82	1356	151	116	116
query83	383	296	262	262
query84	259	126	108	108
query85	921	564	517	517
query86	392	315	297	297
query87	3168	3126	3002	3002
query88	3629	2690	2662	2662
query89	435	378	351	351
query90	1973	190	178	178
query91	170	168	135	135
query92	84	75	76	75
query93	944	869	509	509
query94	475	350	308	308
query95	593	399	331	331
query96	657	528	235	235
query97	2458	2554	2456	2456
query98	235	230	223	223
query99	1019	989	931	931
Total cold run time: 233643 ms
Total hot run time: 153330 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 43.57% (139/319) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.65% (19697/37410)
Line Coverage 36.26% (183954/507304)
Region Coverage 32.53% (142621/438362)
Branch Coverage 33.47% (61858/184818)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 71.16% (227/319) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.63% (26240/36632)
Line Coverage 54.42% (275242/505751)
Region Coverage 51.81% (229280/442499)
Branch Coverage 53.04% (98319/185382)
