[opt](TabletScheduler) introduce TableDispatchScheduler for table-level tablet scheduling#60955

Open
uchenily wants to merge 2 commits into apache:master from uchenily:table-level-schd

@uchenily
Contributor

uchenily commented Mar 2, 2026

What problem does this PR solve?

This PR optimizes the TabletScheduler by introducing a table-level dispatching mechanism. Instead of processing tablets sequentially from a single global queue, the scheduler now dispatches tablets to table-specific queues handled by a pool of worker threads.

This aims to prevent a long-held lock or a potential deadlock on one table from blocking the entire TabletScheduler, and to improve overall scheduling throughput by processing multiple tables concurrently.
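
For readers who want a concrete picture, the dispatching idea described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: `TableDispatchSketch`, `TabletTask`, `dispatch`, `drain`, and `shutdownAndWait` are hypothetical names. The sketch assumes one FIFO queue per table and at most one worker draining a given table at a time.

```java
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Illustrative sketch only; every name here is hypothetical, not a class from this PR. */
final class TableDispatchSketch {
    /** Hypothetical stand-in for a tablet scheduling task. */
    static final class TabletTask {
        final long tableId;
        final long tabletId;

        TabletTask(long tableId, long tabletId) {
            this.tableId = tableId;
            this.tabletId = tabletId;
        }
    }

    private final ExecutorService workers;
    private final Consumer<TabletTask> handler;
    // One FIFO queue per table, created lazily on first dispatch.
    private final Map<Long, Queue<TabletTask>> tableQueues = new ConcurrentHashMap<>();
    // Tables that currently have a worker assigned (queued or running).
    private final Set<Long> activeTables = ConcurrentHashMap.newKeySet();

    TableDispatchSketch(int workerCount, Consumer<TabletTask> handler) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.handler = handler;
    }

    /** Enqueue the task on its table's queue; start a drainer if the table has none. */
    void dispatch(TabletTask task) {
        tableQueues.computeIfAbsent(task.tableId, id -> new ConcurrentLinkedQueue<>()).add(task);
        if (activeTables.add(task.tableId)) {
            workers.submit(() -> drain(task.tableId));
        }
    }

    /** Process one table's queue; a slow table occupies only this one worker. */
    private void drain(long tableId) {
        Queue<TabletTask> queue = tableQueues.get(tableId);
        do {
            TabletTask task;
            while ((task = queue.poll()) != null) {
                try {
                    handler.accept(task);
                } catch (RuntimeException e) {
                    // A failing tablet should not stall the rest of its table.
                    System.err.println("tablet " + task.tabletId + " failed: " + e);
                }
            }
            activeTables.remove(tableId);
            // Re-check: a task may have been enqueued between the last poll and the remove.
        } while (!queue.isEmpty() && activeTables.add(tableId));
    }

    /** Stop accepting new drainers and wait for the already submitted ones to finish. */
    boolean shutdownAndWait(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch, a handler that blocks on one table's lock ties up a single worker thread while the other workers keep serving other tables. Tablets of the same table are still handled in arrival order.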

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@uchenily
Contributor Author

uchenily commented Mar 2, 2026

run buildall

@uchenily
Contributor Author

uchenily commented Mar 3, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 28613 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f31d1bac0b441c0a33a5a1cf249aa29ec8e9341, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17641	4446	4293	4293
q2
q3	10649	776	519	519
q4	4687	354	251	251
q5	7551	1191	1024	1024
q6	176	177	147	147
q7	776	827	692	692
q8	9735	1460	1279	1279
q9	4993	4758	4721	4721
q10	6829	1889	1634	1634
q11	452	254	251	251
q12	741	568	468	468
q13	17795	4200	3405	3405
q14	230	234	211	211
q15	962	794	782	782
q16	740	725	653	653
q17	724	897	415	415
q18	5983	5259	5311	5259
q19	1418	955	593	593
q20	502	497	388	388
q21	4499	1834	1395	1395
q22	345	287	233	233
Total cold run time: 97428 ms
Total hot run time: 28613 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4440	4341	4368	4341
q2
q3	1752	2158	1708	1708
q4	824	1157	760	760
q5	4007	4316	4319	4316
q6	178	175	142	142
q7	1729	1593	1486	1486
q8	2401	2634	2513	2513
q9	7462	7350	7491	7350
q10	2734	2874	2449	2449
q11	513	430	413	413
q12	507	592	453	453
q13	4028	4472	3605	3605
q14	281	304	272	272
q15	865	808	831	808
q16	758	759	719	719
q17	1156	1496	1322	1322
q18	6957	7035	6493	6493
q19	910	849	878	849
q20	2088	2169	2033	2033
q21	3944	3466	3619	3466
q22	447	451	396	396
Total cold run time: 47981 ms
Total hot run time: 45894 ms

@doris-robot

TPC-DS: Total hot run time: 183411 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6f31d1bac0b441c0a33a5a1cf249aa29ec8e9341, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4828	650	506	506
query6	337	214	203	203
query7	4227	463	278	278
query8	332	238	226	226
query9	8756	2779	2805	2779
query10	532	370	331	331
query11	17042	17528	17257	17257
query12	192	132	133	132
query13	1262	495	356	356
query14	6706	3329	2999	2999
query14_1	2956	2936	2854	2854
query15	208	196	182	182
query16	1103	473	469	469
query17	1341	749	610	610
query18	2976	446	390	390
query19	260	243	196	196
query20	138	136	135	135
query21	215	141	118	118
query22	5604	5003	4557	4557
query23	17041	16784	16586	16586
query23_1	16593	16677	16548	16548
query24	7289	1629	1219	1219
query24_1	1215	1235	1226	1226
query25	538	457	405	405
query26	1241	261	146	146
query27	2766	467	284	284
query28	4531	1908	1909	1908
query29	784	555	469	469
query30	311	245	211	211
query31	877	730	656	656
query32	79	72	68	68
query33	507	349	284	284
query34	911	921	561	561
query35	633	678	599	599
query36	1039	1130	954	954
query37	134	98	86	86
query38	2989	2996	2913	2913
query39	898	868	851	851
query39_1	833	837	820	820
query40	231	152	138	138
query41	62	59	60	59
query42	104	104	101	101
query43	376	383	345	345
query44	
query45	200	197	186	186
query46	862	984	639	639
query47	2140	2131	2046	2046
query48	340	321	231	231
query49	639	466	393	393
query50	673	278	212	212
query51	4091	4116	4067	4067
query52	112	106	96	96
query53	297	334	275	275
query54	292	276	262	262
query55	91	80	79	79
query56	311	346	307	307
query57	1373	1303	1270	1270
query58	294	282	276	276
query59	2587	2700	2591	2591
query60	341	338	326	326
query61	149	145	142	142
query62	630	595	542	542
query63	320	299	284	284
query64	4903	1276	1048	1048
query65	
query66	1451	468	378	378
query67	16444	16393	16063	16063
query68	
query69	415	326	305	305
query70	1015	945	964	945
query71	384	303	296	296
query72	2826	2640	2426	2426
query73	541	538	315	315
query74	9943	9897	9748	9748
query75	2855	2737	2476	2476
query76	2293	1025	686	686
query77	357	377	324	324
query78	11162	11226	10620	10620
query79	1156	791	593	593
query80	789	642	538	538
query81	503	283	243	243
query82	1315	157	120	120
query83	352	267	242	242
query84	251	119	98	98
query85	897	570	510	510
query86	379	300	321	300
query87	3155	3165	2994	2994
query88	3522	2699	2676	2676
query89	434	375	350	350
query90	1859	179	174	174
query91	180	169	150	150
query92	79	75	75	75
query93	905	821	518	518
query94	522	343	304	304
query95	600	354	324	324
query96	644	524	231	231
query97	2457	2506	2396	2396
query98	231	219	229	219
query99	1010	976	930	930
Total cold run time: 253951 ms
Total hot run time: 183411 ms

@uchenily
Contributor Author

uchenily commented Mar 4, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 28868 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 76d8c7736ddbcb59b88899842d963e3bc5670217, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17672	4497	4328	4328
q2
q3	10636	792	530	530
q4	4680	359	252	252
q5	7549	1203	1049	1049
q6	179	180	149	149
q7	789	851	666	666
q8	9508	1464	1336	1336
q9	5286	4752	4715	4715
q10	6887	1881	1660	1660
q11	489	262	235	235
q12	737	571	462	462
q13	17774	4191	3417	3417
q14	237	244	203	203
q15	932	806	803	803
q16	736	720	681	681
q17	722	900	397	397
q18	5942	5279	5206	5206
q19	1420	962	601	601
q20	499	492	387	387
q21	4697	1969	1493	1493
q22	370	335	298	298
Total cold run time: 97741 ms
Total hot run time: 28868 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4687	4616	4566	4566
q2
q3	1824	2229	1771	1771
q4	861	1169	776	776
q5	4051	4390	4350	4350
q6	193	186	152	152
q7	1780	1663	1518	1518
q8	2477	2722	2573	2573
q9	7609	7434	7344	7344
q10	2658	2888	2491	2491
q11	525	444	424	424
q12	508	675	490	490
q13	3988	4431	3675	3675
q14	275	296	296	296
q15	878	787	795	787
q16	745	773	714	714
q17	1173	1542	1252	1252
q18	7186	6857	6700	6700
q19	917	929	891	891
q20	2085	2181	2259	2181
q21	3919	3507	3415	3415
q22	472	544	483	483
Total cold run time: 48811 ms
Total hot run time: 46849 ms

@doris-robot

TPC-DS: Total hot run time: 184084 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 76d8c7736ddbcb59b88899842d963e3bc5670217, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4334	634	509	509
query6	327	215	199	199
query7	4208	484	277	277
query8	336	259	244	244
query9	8715	2735	2743	2735
query10	497	409	378	378
query11	17015	17495	17248	17248
query12	190	131	132	131
query13	1390	502	362	362
query14	6270	3424	3154	3154
query14_1	2950	2981	2928	2928
query15	217	213	181	181
query16	1144	472	446	446
query17	1221	777	618	618
query18	3129	465	374	374
query19	213	226	185	185
query20	151	139	132	132
query21	220	138	114	114
query22	5393	4914	4995	4914
query23	17559	16840	16616	16616
query23_1	16795	16665	16705	16665
query24	7215	1627	1233	1233
query24_1	1231	1251	1240	1240
query25	547	454	394	394
query26	1230	260	154	154
query27	2783	485	285	285
query28	4514	1885	1902	1885
query29	799	576	477	477
query30	315	246	214	214
query31	889	734	654	654
query32	79	72	70	70
query33	512	341	290	290
query34	917	907	564	564
query35	642	673	593	593
query36	1104	1144	932	932
query37	135	96	85	85
query38	2964	2908	2809	2809
query39	912	863	842	842
query39_1	833	830	833	830
query40	233	155	134	134
query41	61	58	56	56
query42	105	99	99	99
query43	374	381	350	350
query44	
query45	199	195	184	184
query46	884	992	630	630
query47	2136	2133	2051	2051
query48	318	329	229	229
query49	644	504	382	382
query50	685	274	210	210
query51	4128	4092	4051	4051
query52	108	112	100	100
query53	290	333	281	281
query54	321	273	259	259
query55	91	83	87	83
query56	315	311	310	310
query57	1396	1343	1293	1293
query58	290	279	267	267
query59	2545	2749	2611	2611
query60	338	332	330	330
query61	151	146	141	141
query62	625	613	540	540
query63	311	272	280	272
query64	4873	1288	1031	1031
query65	
query66	1477	449	349	349
query67	16411	16516	16219	16219
query68	
query69	417	305	303	303
query70	1017	867	969	867
query71	338	326	304	304
query72	2918	2825	2549	2549
query73	544	556	329	329
query74	9968	9896	9825	9825
query75	2863	2746	2469	2469
query76	2289	1068	666	666
query77	370	396	309	309
query78	11095	11277	10668	10668
query79	3054	819	606	606
query80	1751	654	567	567
query81	591	287	250	250
query82	970	150	119	119
query83	336	262	242	242
query84	257	124	98	98
query85	900	463	438	438
query86	500	296	290	290
query87	3143	3157	2994	2994
query88	3659	2690	2665	2665
query89	419	370	346	346
query90	2193	174	171	171
query91	166	156	131	131
query92	88	72	71	71
query93	2664	843	500	500
query94	655	327	278	278
query95	577	393	311	311
query96	648	532	230	230
query97	2440	2476	2467	2467
query98	228	219	212	212
query99	1007	998	913	913
Total cold run time: 258783 ms
Total hot run time: 184084 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 78.05% (128/164) 🎉
Increment coverage report
Complete coverage report

@uchenily
Contributor Author

uchenily commented Mar 4, 2026

run p0

@uchenily
Contributor Author

uchenily commented Mar 4, 2026

run nonConcurrent

@uchenily
Contributor Author

uchenily commented Mar 5, 2026

run p0

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.02% (105/164) 🎉
Increment coverage report
Complete coverage report
