[fix](fe) add column_data_sizes to BackendPartitionedSchemaScanNode#61086

Merged
dataroaring merged 1 commit into apache:master from hoshinojyunn:column_data_sizes_fix
Mar 6, 2026

@hoshinojyunn
Contributor

@hoshinojyunn hoshinojyunn commented Mar 5, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
The information_schema.column_data_sizes table reports column data size information for all BEs. However, a query on this table returns data from only one BE instead of from all BEs. This happens because column_data_sizes was not added to BACKEND_TABLE in BackendPartitionedSchemaScanNode, so PhysicalPlanTranslator uses the default SchemaScanNode instead of BackendPartitionedSchemaScanNode.

Fix:
Add column_data_sizes to BACKEND_TABLE in BackendPartitionedSchemaScanNode. With this change, PhysicalPlanTranslator's visitPhysicalSchemaScan method uses BackendPartitionedSchemaScanNode instead of the default SchemaScanNode when querying the column_data_sizes table, so queries return data from all BEs.
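
The dispatch described above can be sketched in Java. This is a simplified, hypothetical model and not the Doris source. `chooseScanNode`, `ScanNodeKind`, `scan` and the in-memory `rowsPerBackend` map are illustrative names. The real `visitPhysicalSchemaScan` and `BackendPartitionedSchemaScanNode` also handle scan-range assignment and predicates. The sketch only shows why membership in a BACKEND_TABLE-style set changes how many BEs contribute rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of the scan-node choice; not the Doris source.
class SchemaScanDispatchSketch {

    // Stand-ins for the two node types the translator can emit.
    enum ScanNodeKind { SCHEMA_SCAN, BACKEND_PARTITIONED_SCHEMA_SCAN }

    // A table is fanned out to every BE only if it is registered in the
    // BACKEND_TABLE-style set; otherwise the default schema scan is used.
    static ScanNodeKind chooseScanNode(String tableName, Set<String> backendTables) {
        return backendTables.contains(tableName.toLowerCase(Locale.ROOT))
                ? ScanNodeKind.BACKEND_PARTITIONED_SCHEMA_SCAN
                : ScanNodeKind.SCHEMA_SCAN;
    }

    // Collects rows the way each node type would (assumes at least one BE).
    static List<String> scan(ScanNodeKind kind, Map<Long, List<String>> rowsPerBackend) {
        List<String> result = new ArrayList<>();
        if (kind == ScanNodeKind.BACKEND_PARTITIONED_SCHEMA_SCAN) {
            // One scan per BE: every BE contributes its local rows.
            for (List<String> rows : rowsPerBackend.values()) {
                result.addAll(rows);
            }
        } else {
            // Default scan: a single BE answers, so only its rows are returned.
            result.addAll(rowsPerBackend.values().iterator().next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Three BEs, each holding one row of column size data (made-up values).
        Map<Long, List<String>> rows = new TreeMap<>();
        rows.put(10001L, List.of("be 10001: t.c1"));
        rows.put(10002L, List.of("be 10002: t.c1"));
        rows.put(10003L, List.of("be 10003: t.c1"));

        Set<String> beforeFix = Set.of();                   // column_data_sizes not registered
        Set<String> afterFix = Set.of("column_data_sizes"); // registered, as in this PR

        System.out.println(scan(chooseScanNode("column_data_sizes", beforeFix), rows).size()); // prints 1
        System.out.println(scan(chooseScanNode("column_data_sizes", afterFix), rows).size());  // prints 3
    }
}
```

With a three-BE map, the unregistered table yields one BE's rows and the registered table yields all three. This matches the before/after behavior described in the problem summary.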

Release note

None

Check List (For Author)

  • Test

    • [ ] Regression test
    • [ ] Unit Test
    • [ ] Manual test (add detailed scripts or steps below)
    • [x] No need to test or manual test. Explain why:
      • [ ] This is a refactor/code format and no logic has been changed.
      • [ ] Previous test can cover this change.
      • [ ] No code files have been changed.
      • [x] Other reason (This change only affects scenarios with multiple BEs. It is difficult to test in regression tests or unit tests. The fix has been verified manually.)
  • Behavior changed:

    • [x] No.
    • [ ] Yes.
  • Does this need documentation?

    • [x] No.
    • [ ] Yes.

Check List (For Reviewer who merge this PR)

  • [ ] Confirm the release note
  • [ ] Confirm test cases
  • [ ] Confirm document
  • [ ] Add branch pick label

@Thearas
Contributor

Thearas commented Mar 5, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hoshinojyunn
Contributor Author

run buildall

Contributor

@csun5285 csun5285 left a comment

LGTM

@hoshinojyunn hoshinojyunn changed the title [fix](fe) fix column_data_sizes SchemaScanNode query returns data fro… [fix](fe) add column_data_sizes to BackendPartitionedSchemaScanNode Mar 5, 2026
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 27625 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 415e380473530eb510f9eed31dcc6b76d4a4ff0a, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold	hot 1	hot 2	best (ms)
q1	17726	4438	4289	4289
q2
q3	10654	812	506	506
q4	4677	364	253	253
q5	7547	1188	1011	1011
q6	170	175	145	145
q7	782	827	687	687
q8	9298	1455	1329	1329
q9	4952	4622	4738	4622
q10	6291	1900	1662	1662
q11	452	264	244	244
q12	770	569	477	477
q13	18044	2914	2190	2190
q14	239	230	214	214
q15	933	806	797	797
q16	757	722	689	689
q17	712	851	400	400
q18	5962	5359	5282	5282
q19	1118	967	637	637
q20	507	484	393	393
q21	4430	2060	1505	1505
q22	363	361	293	293
Total cold run time: 96384 ms
Total hot run time: 27625 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold	hot 1	hot 2	best (ms)
q1	4738	4666	4593	4593
q2
q3	3947	4346	3825	3825
q4	876	1184	766	766
q5	4076	4476	4372	4372
q6	213	179	142	142
q7	1776	1599	1551	1551
q8	2490	2717	2576	2576
q9	7577	7308	7404	7308
q10	3713	3966	3588	3588
q11	513	440	420	420
q12	499	619	464	464
q13	2772	3349	2362	2362
q14	280	293	278	278
q15	880	805	794	794
q16	699	790	749	749
q17	1150	1562	1424	1424
q18	7427	6863	6625	6625
q19	929	850	892	850
q20	2103	2177	2031	2031
q21	4028	3618	3377	3377
q22	430	432	383	383
Total cold run time: 51116 ms
Total hot run time: 48478 ms

@doris-robot

TPC-DS: Total hot run time: 153155 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 415e380473530eb510f9eed31dcc6b76d4a4ff0a, data reload: false

query	cold	hot 1	hot 2	best (ms)
query5	4321	639	504	504
query6	325	219	198	198
query7	4204	465	267	267
query8	341	245	236	236
query9	8733	2722	2735	2722
query10	507	403	341	341
query11	7423	5863	5650	5650
query12	191	128	126	126
query13	1265	453	358	358
query14	5533	3800	3558	3558
query14_1	2789	2815	2822	2815
query15	201	197	177	177
query16	998	483	459	459
query17	876	737	613	613
query18	2447	445	357	357
query19	212	211	181	181
query20	136	136	131	131
query21	214	143	133	133
query22	4911	5076	4836	4836
query23	16052	15568	15313	15313
query23_1	15567	16284	15979	15979
query24	7721	1632	1245	1245
query24_1	1257	1296	1369	1296
query25	570	502	486	486
query26	1414	276	162	162
query27	3165	539	310	310
query28	5088	1932	1947	1932
query29	900	597	527	527
query30	319	257	220	220
query31	1394	1369	1325	1325
query32	89	74	76	74
query33	516	327	280	280
query34	921	922	561	561
query35	643	678	603	603
query36	1120	1104	963	963
query37	141	92	84	84
query38	2892	2955	2862	2862
query39	888	861	848	848
query39_1	806	831	822	822
query40	229	156	140	140
query41	66	59	57	57
query42	309	300	298	298
query43	245	255	221	221
query44	
query45	195	189	181	181
query46	873	984	614	614
query47	2154	2143	2087	2087
query48	313	304	225	225
query49	630	464	375	375
query50	696	272	223	223
query51	4045	4093	4152	4093
query52	292	292	284	284
query53	291	333	282	282
query54	291	269	261	261
query55	92	87	79	79
query56	313	321	315	315
query57	1380	1337	1284	1284
query58	287	276	268	268
query59	1361	1476	1288	1288
query60	349	338	314	314
query61	155	146	145	145
query62	632	586	546	546
query63	304	288	274	274
query64	5115	1268	986	986
query65	
query66	1453	476	356	356
query67	16418	16422	16325	16325
query68	
query69	380	308	276	276
query70	955	992	958	958
query71	339	299	303	299
query72	2678	2642	2413	2413
query73	533	550	323	323
query74	10031	9946	9778	9778
query75	2845	2761	2479	2479
query76	2299	1023	664	664
query77	353	383	300	300
query78	11184	11229	10704	10704
query79	2572	790	610	610
query80	1736	605	560	560
query81	565	282	243	243
query82	989	146	119	119
query83	348	273	250	250
query84	303	127	103	103
query85	990	555	436	436
query86	414	343	295	295
query87	3134	3146	2994	2994
query88	3516	2635	2614	2614
query89	427	368	348	348
query90	1998	176	177	176
query91	169	160	134	134
query92	80	74	73	73
query93	1138	843	512	512
query94	641	341	291	291
query95	596	328	307	307
query96	647	524	229	229
query97	2437	2477	2421	2421
query98	242	227	220	220
query99	986	1002	922	922
Total cold run time: 236190 ms
Total hot run time: 153155 ms

Member

@airborne12 airborne12 left a comment

LGTM ✅

Verified: column_data_sizes is correctly added to BACKEND_TABLE. The BACKEND_ID column is already registered in BEACKEND_ID_COLUMN_SET. No other schema tables are missing registration.
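
One plausible reading of the BACKEND_ID remark is sketched below. This is an assumption, not something the PR confirms. The reading is that the backend-partitioned scan node uses BEACKEND_ID_COLUMN_SET to recognize which column carries a BE ID. An equality predicate on that column could then narrow the fan-out to a single BE. The Java sketch models this idea with hypothetical names (`BACKEND_ID_COLUMNS`, `EqPredicate`, `backendsToScan`). It is not the Doris implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of backend-ID pruning; names are illustrative, not Doris APIs.
class BackendIdPruningSketch {

    // Stand-in for BEACKEND_ID_COLUMN_SET: column names that hold a BE ID.
    static final Set<String> BACKEND_ID_COLUMNS = Set.of("backend_id");

    // Simplified "column = value" predicate.
    static final class EqPredicate {
        final String column;
        final long value;

        EqPredicate(String column, long value) {
            this.column = column;
            this.value = value;
        }
    }

    // Scan every BE unless an equality predicate on a backend-ID column
    // narrows the set; the first such predicate wins in this sketch.
    static List<Long> backendsToScan(List<Long> allBackends, List<EqPredicate> predicates) {
        for (EqPredicate p : predicates) {
            if (BACKEND_ID_COLUMNS.contains(p.column.toLowerCase(Locale.ROOT))) {
                List<Long> matched = new ArrayList<>();
                for (long id : allBackends) {
                    if (id == p.value) {
                        matched.add(id);
                    }
                }
                return matched;
            }
        }
        return allBackends;
    }

    public static void main(String[] args) {
        List<Long> all = List.of(10001L, 10002L, 10003L);
        System.out.println(backendsToScan(all, List.of())); // prints [10001, 10002, 10003]
        System.out.println(backendsToScan(all, List.of(new EqPredicate("BACKEND_ID", 10002L)))); // prints [10002]
    }
}
```

Without a backend-ID predicate, every BE is scanned. A predicate on a column that is not registered as a backend-ID column leaves the fan-out unchanged.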

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 5, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

PR approved by at least one committer and no changes requested.

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@dataroaring dataroaring merged commit 687f90f into apache:master Mar 6, 2026
39 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 6, 2026
…61086)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
The `information_schema.column_data_sizes` table is used to query column
data size information across all BEs. However, when querying this table,
only data from one BE is returned instead of all BEs. This is because
the column_data_sizes table was not added to the BACKEND_TABLE in
`BackendPartitionedSchemaScanNode`, causing the `PhysicalPlanTranslator`
to use the default SchemaScanNode instead of
BackendPartitionedSchemaScanNode.

fix:
Add column_data_sizes to BACKEND_TABLE in
BackendPartitionedSchemaScanNode. This allows PhysicalPlanTranslator's
`visitPhysicalSchemaScan` method to use BackendPartitionedSchemaScanNode
instead of the default SchemaScanNode when querying the
column_data_sizes table, enabling queries to return data from all BEs.

### Release note

None

### Check List (For Author)

- Test
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [x] Other reason (This change only affects scenarios with multiple BEs. It is difficult to test in regression tests or unit tests. The fix has been verified manually.)

- Behavior changed:
    - [x] No.
    - [ ] Yes.

- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
mrhhsg pushed a commit that referenced this pull request Mar 6, 2026
yiguolei pushed a commit that referenced this pull request Mar 7, 2026

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.5-merged reviewed

8 participants