Skip to content

[feature](scan) Add value predicate pushdown control for MOR tables#60513

Open
dataroaring wants to merge 5 commits intomasterfrom
mor_value_predicate_pushdown_control
Open

[feature](scan) Add value predicate pushdown control for MOR tables#60513
dataroaring wants to merge 5 commits intomasterfrom
mor_value_predicate_pushdown_control

Conversation

@dataroaring
Copy link
Contributor

@dataroaring dataroaring commented Feb 4, 2026

Summary

  • Add session variable enable_mor_value_predicate_pushdown_tables to control value column predicate pushdown for MOR (Merge-On-Read) tables
  • FE: session variable with table-list matching (db.table, table, or *), thrift flag on TOlapScanNode, isMorTable() helper on OlapTable
  • BE: propagate flag through scan operator → tablet reader → rowset reader; extend _should_push_down_value_predicates() for all rowsets; skip __DORIS_DELETE_SIGN__ to preserve delete correctness; keep VExpr in conjuncts as post-merge safety net

Motivation

MOR tables with inverted indexes on value columns cannot utilize those indexes for filtering because value predicates are not pushed to the storage layer. This feature enables per-segment value predicate pushdown for dedup-only/insert-only MOR workloads where the same key always carries identical values across rowsets, allowing inverted indexes and zone maps to filter data early.

Design

Two-layer predicate approach:

  1. ColumnPredicate (storage layer): pushed per-segment for early filtering via inverted index, zone map, bloom filter
  2. VExpr (compute layer): retained in _conjuncts as post-merge safety net

Delete sign handling: __DORIS_DELETE_SIGN__ predicate is excluded from per-segment pushdown to prevent delete markers from being filtered before merge, which would cause deleted rows to reappear.

Test plan

  • Test 1: Basic session variable matching (table name, db.table, wildcard, not-in-list)
  • Test 2: Dedup-only pattern with multiple overlapping rowsets
  • Test 3: Dedup + row-level delete via INSERT INTO (..., __DORIS_DELETE_SIGN__) VALUES (..., 1)
  • Test 4: Dedup + delete predicate via DELETE FROM ... WHERE
  • Test 5: Multiple tables in session variable list
  • Test 6: MOW table (unaffected by setting)
  • Test 7: DUP_KEYS table (unaffected by setting)

Copilot AI review requested due to automatic review settings February 4, 2026 17:33
@Thearas
Copy link
Contributor

Thearas commented Feb 4, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring force-pushed the mor_value_predicate_pushdown_control branch 2 times, most recently from fffa13a to 94bc596 Compare February 4, 2026 17:43
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a session variable to control value column predicate pushdown for MOR (Merge-On-Read) tables, allowing users to selectively enable this optimization per table or globally.

Changes:

  • Added session variable enable_mor_value_predicate_pushdown_tables for selective table-level control
  • Added isMorTable() helper method to identify MOR tables (UNIQUE_KEYS without merge-on-write)
  • Modified predicate pushdown logic to support value predicates on MOR tables when enabled

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
gensrc/thrift/PlanNodes.thrift Added optional boolean field enable_mor_value_predicate_pushdown to TOlapScanNode struct
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java Added session variable declaration, getter, and helper method to check if MOR value predicate pushdown is enabled
fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java Added isMorTable() helper method to identify MOR tables
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java Set thrift flag in toThrift() based on table type and session variable
be/src/pipeline/exec/scan_operator.h Added virtual method _should_push_down_mor_value_predicate() with default false implementation
be/src/pipeline/exec/scan_operator.cpp Modified predicate pushdown condition to include MOR value predicate check
be/src/pipeline/exec/olap_scan_operator.h Declared override for _should_push_down_mor_value_predicate()
be/src/pipeline/exec/olap_scan_operator.cpp Implemented _should_push_down_mor_value_predicate() to read thrift flag

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


// Set MOR value predicate pushdown flag based on session variable
if (olapTable.isMorTable() && ConnectContext.get() != null) {
String dbName = olapTable.getQualifiedDbName();
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential null pointer exception. The method getQualifiedDbName() can return null (as seen in Table.java line 367-369). This null value would then be passed to isMorValuePredicatePushdownEnabled() which concatenates it with the table name on line 4691, potentially resulting in "null.tableName". Consider adding a null check for dbName or using an alternative method like getDBName() which handles the null case.

Suggested change
String dbName = olapTable.getQualifiedDbName();
String dbName = olapTable.getQualifiedDbName();
if (dbName == null) {
dbName = olapTable.getDBName();
}

Copilot uses AI. Check for mistakes.
String trimmed = enableMorValuePredicatePushdownTables.trim();
if ("*".equals(trimmed)) {
return true;
}
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method doesn't handle null dbName parameter. If dbName is null, line 4691 will create "null.tableName" which could lead to unexpected behavior. Consider adding a null check for dbName at the beginning of the method, or using Objects.requireNonNull() to fail fast, or handle the null case explicitly by using only tableName for comparison when dbName is null.

Suggested change
}
}
// When dbName is null, only compare using tableName to avoid creating "null.tableName".
if (dbName == null) {
for (String table : trimmed.split(",")) {
if (table.trim().equalsIgnoreCase(tableName)) {
return true;
}
}
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +4682 to +4699
public boolean isMorValuePredicatePushdownEnabled(String dbName, String tableName) {
if (enableMorValuePredicatePushdownTables == null
|| enableMorValuePredicatePushdownTables.isEmpty()) {
return false;
}
String trimmed = enableMorValuePredicatePushdownTables.trim();
if ("*".equals(trimmed)) {
return true;
}
String fullName = dbName + "." + tableName;
for (String table : trimmed.split(",")) {
if (table.trim().equalsIgnoreCase(fullName)
|| table.trim().equalsIgnoreCase(tableName)) {
return true;
}
}
return false;
}
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new session variable and its helper method isMorValuePredicatePushdownEnabled() lack test coverage. Consider adding unit tests in SessionVariablesTest to verify:

  1. Empty string handling (default behavior)
  2. Wildcard '*' behavior
  3. Single table name matching (with and without database prefix)
  4. Multiple table names (comma-separated)
  5. Case-insensitive matching
  6. Whitespace handling in table names
  7. Edge cases like null dbName parameter

Copilot uses AI. Check for mistakes.
Comment on lines +4693 to +4694
if (table.trim().equalsIgnoreCase(fullName)
|| table.trim().equalsIgnoreCase(tableName)) {
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method doesn't handle edge cases well when splitting by comma. If the variable contains only commas or has consecutive commas (e.g., "db.table1,,db.table2"), the split operation will produce empty strings. Calling trim() on empty strings and comparing them could lead to false positives. Consider filtering out empty strings after trimming, or using a more robust parsing approach.

Suggested change
if (table.trim().equalsIgnoreCase(fullName)
|| table.trim().equalsIgnoreCase(tableName)) {
String normalized = table.trim();
if (normalized.isEmpty()) {
continue;
}
if (normalized.equalsIgnoreCase(fullName)
|| normalized.equalsIgnoreCase(tableName)) {

Copilot uses AI. Check for mistakes.
@dataroaring dataroaring force-pushed the mor_value_predicate_pushdown_control branch from 94bc596 to 8eadc41 Compare February 4, 2026 17:46
@dataroaring
Copy link
Contributor Author

run buildall

@dataroaring dataroaring force-pushed the mor_value_predicate_pushdown_control branch from 8eadc41 to 4a5d300 Compare February 4, 2026 18:17
@dataroaring
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.29% (1792/2260)
Line Coverage 64.74% (31826/49158)
Region Coverage 65.42% (15884/24280)
Branch Coverage 55.95% (8436/15078)

@doris-robot
Copy link

TPC-H: Total hot run time: 31991 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a5d300b0d5cc30010a29223454252523fcdd336, data reload: false

------ Round 1 ----------------------------------
q1	17641	5168	5056	5056
q2	2008	315	238	238
q3	10189	1317	750	750
q4	10204	825	325	325
q5	7542	2159	1895	1895
q6	195	182	151	151
q7	868	757	597	597
q8	9268	1425	1052	1052
q9	5292	4925	4820	4820
q10	6815	1963	1560	1560
q11	524	289	281	281
q12	368	365	222	222
q13	17791	4077	3239	3239
q14	235	232	215	215
q15	905	837	813	813
q16	676	680	622	622
q17	668	815	430	430
q18	6787	6502	7583	6502
q19	1207	1118	677	677
q20	393	381	251	251
q21	2920	2340	1988	1988
q22	374	325	307	307
Total cold run time: 102870 ms
Total hot run time: 31991 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5678	5528	5456	5456
q2	264	345	277	277
q3	2455	2869	2512	2512
q4	1443	1887	1388	1388
q5	4766	4425	4692	4425
q6	218	188	133	133
q7	2047	1968	1805	1805
q8	2556	2394	2325	2325
q9	7387	7608	7560	7560
q10	2838	3097	2670	2670
q11	557	475	462	462
q12	675	706	548	548
q13	3496	4070	3245	3245
q14	274	288	272	272
q15	839	813	779	779
q16	630	681	633	633
q17	1084	1239	1294	1239
q18	7568	7457	7374	7374
q19	811	816	807	807
q20	1968	2095	1879	1879
q21	4541	4205	4144	4144
q22	586	535	492	492
Total cold run time: 52681 ms
Total hot run time: 50425 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 28.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4a5d300b0d5cc30010a29223454252523fcdd336, data reload: false

query1	0.06	0.04	0.04
query2	0.13	0.07	0.07
query3	0.30	0.07	0.08
query4	1.60	0.10	0.10
query5	0.26	0.25	0.25
query6	1.14	0.65	0.64
query7	0.03	0.03	0.03
query8	0.08	0.06	0.07
query9	0.58	0.50	0.50
query10	0.55	0.54	0.55
query11	0.26	0.14	0.13
query12	0.26	0.13	0.14
query13	0.63	0.61	0.60
query14	0.99	0.99	0.98
query15	0.92	0.83	0.83
query16	0.39	0.38	0.40
query17	1.00	1.00	1.01
query18	0.24	0.23	0.23
query19	1.81	1.87	1.83
query20	0.02	0.02	0.02
query21	15.41	0.33	0.29
query22	4.97	0.13	0.12
query23	15.38	0.44	0.27
query24	2.25	0.57	0.41
query25	0.11	0.11	0.10
query26	0.20	0.18	0.18
query27	0.11	0.10	0.11
query28	3.62	1.15	0.98
query29	12.51	4.04	3.29
query30	0.33	0.13	0.13
query31	2.80	0.68	0.45
query32	3.23	0.64	0.49
query33	3.09	3.05	3.04
query34	16.11	5.06	4.42
query35	4.44	4.47	4.42
query36	0.62	0.51	0.50
query37	0.31	0.09	0.09
query38	0.27	0.06	0.06
query39	0.08	0.05	0.05
query40	0.21	0.16	0.17
query41	0.13	0.08	0.07
query42	0.09	0.05	0.06
query43	0.09	0.07	0.06
Total cold run time: 97.61 s
Total hot run time: 28.16 s

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 4.76% (1/21) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 25.00% (2/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.55% (19369/36856)
Line Coverage 36.04% (179897/499203)
Region Coverage 32.39% (139423/430406)
Branch Coverage 33.40% (60395/180802)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 87.50% (7/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.60% (25862/36120)
Line Coverage 54.26% (270210/498008)
Region Coverage 51.85% (225458/434822)
Branch Coverage 53.25% (96668/181532)

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 71.43% (15/21) 🎉
Increment coverage report
Complete coverage report

@dataroaring dataroaring force-pushed the mor_value_predicate_pushdown_control branch from 4a5d300 to a06783e Compare February 5, 2026 00:04
@dataroaring dataroaring changed the title [feature](scan) Add session variable to control value predicate pushdown for MOR tables [feature](scan) Add value predicate pushdown control for MOR tables Feb 5, 2026
@dataroaring dataroaring force-pushed the mor_value_predicate_pushdown_control branch 3 times, most recently from a0385a9 to 7d4aa18 Compare February 5, 2026 01:30
@dataroaring
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.29% (1792/2260)
Line Coverage 64.74% (31827/49158)
Region Coverage 65.43% (15886/24280)
Branch Coverage 55.94% (8434/15078)

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 4.76% (1/21) 🎉
Increment coverage report
Complete coverage report

@doris-robot
Copy link

TPC-H: Total hot run time: 31169 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7d4aa18ff98daf6854720549735e19260aa7d84d, data reload: false

------ Round 1 ----------------------------------
q1	17639	4609	4340	4340
q2	2058	366	228	228
q3	10457	1369	738	738
q4	10332	841	311	311
q5	9262	2214	1892	1892
q6	204	179	144	144
q7	877	746	600	600
q8	9270	1566	1061	1061
q9	5426	4911	4782	4782
q10	6854	1978	1586	1586
q11	523	296	281	281
q12	382	380	221	221
q13	17800	4075	3275	3275
q14	232	248	218	218
q15	897	825	800	800
q16	695	688	658	658
q17	778	803	485	485
q18	6840	6538	6615	6538
q19	1386	1003	634	634
q20	382	356	249	249
q21	2642	2151	1848	1848
q22	359	311	280	280
Total cold run time: 105295 ms
Total hot run time: 31169 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4458	4339	4372	4339
q2	260	337	245	245
q3	2142	2620	2287	2287
q4	1321	1753	1325	1325
q5	4347	4176	4225	4176
q6	214	178	133	133
q7	1878	2259	1874	1874
q8	2568	2431	2476	2431
q9	7486	7619	7579	7579
q10	2910	3114	2668	2668
q11	533	470	449	449
q12	697	805	687	687
q13	3894	4481	3587	3587
q14	290	322	300	300
q15	894	848	832	832
q16	715	746	679	679
q17	1201	1327	1421	1327
q18	8384	7965	7783	7783
q19	914	864	832	832
q20	2081	2144	2009	2009
q21	4913	4185	4112	4112
q22	601	539	496	496
Total cold run time: 52701 ms
Total hot run time: 50150 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 28.05 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7d4aa18ff98daf6854720549735e19260aa7d84d, data reload: false

query1	0.06	0.05	0.05
query2	0.13	0.07	0.08
query3	0.31	0.07	0.07
query4	1.60	0.10	0.10
query5	0.25	0.26	0.24
query6	1.14	0.65	0.64
query7	0.03	0.02	0.02
query8	0.08	0.06	0.06
query9	0.58	0.50	0.48
query10	0.55	0.56	0.55
query11	0.27	0.13	0.13
query12	0.26	0.14	0.15
query13	0.63	0.61	0.60
query14	0.98	0.98	0.99
query15	0.91	0.82	0.82
query16	0.39	0.39	0.40
query17	1.06	1.06	1.02
query18	0.25	0.22	0.22
query19	1.96	1.82	1.85
query20	0.02	0.01	0.02
query21	15.39	0.33	0.29
query22	4.97	0.11	0.11
query23	15.38	0.44	0.28
query24	2.39	0.56	0.41
query25	0.12	0.10	0.10
query26	0.20	0.18	0.18
query27	0.10	0.11	0.11
query28	3.69	1.16	0.98
query29	12.53	4.03	3.29
query30	0.32	0.13	0.10
query31	2.81	0.69	0.43
query32	3.23	0.63	0.50
query33	3.07	3.05	3.04
query34	16.42	5.06	4.43
query35	4.44	4.40	4.47
query36	0.60	0.49	0.49
query37	0.30	0.09	0.08
query38	0.28	0.05	0.06
query39	0.08	0.05	0.05
query40	0.20	0.16	0.16
query41	0.14	0.06	0.06
query42	0.09	0.06	0.05
query43	0.06	0.07	0.05
Total cold run time: 98.27 s
Total hot run time: 28.05 s

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 30.00% (6/20) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.55% (19369/36860)
Line Coverage 36.03% (179893/499238)
Region Coverage 32.38% (139366/430442)
Branch Coverage 33.40% (60395/180828)

@dataroaring
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.29% (1792/2260)
Line Coverage 64.76% (31833/49158)
Region Coverage 65.42% (15885/24280)
Branch Coverage 55.96% (8438/15078)

airborne12
airborne12 previously approved these changes Mar 3, 2026
Copy link
Member

@airborne12 airborne12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Mar 3, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR approved by anyone and no changes requested.

dataroaring and others added 4 commits March 3, 2026 19:13
…own for MOR tables

Add a new session variable `enable_mor_value_predicate_pushdown_tables` to allow
users to selectively enable value column predicate pushdown for MOR (Merge-On-Read)
tables. This can improve query performance by utilizing inverted indexes on value
columns for filtering.

The session variable accepts:
- Comma-separated table names: `db1.tbl1,db2.tbl2`
- Wildcard for all MOR tables: `*`
- Empty string to disable (default)

Changes:
- Add session variable in SessionVariable.java with helper method
- Add isMorTable() helper in OlapTable.java
- Add Thrift field enable_mor_value_predicate_pushdown in TOlapScanNode
- Set flag in OlapScanNode.toThrift() based on session variable
- Add virtual method _should_push_down_mor_value_predicate() in scan_operator
- Implement override in olap_scan_operator to read the flag
- Modify predicate pushdown condition in scan_operator.cpp
…bles

Add session variable enable_mor_value_predicate_pushdown_tables to control
value predicate pushdown for MOR (Merge-On-Read) tables. When enabled,
value column predicates are pushed to the storage layer for filtering.

Key changes:
- Add enable_mor_value_predicate_pushdown flag to ReaderParams and RowsetReaderContext
- Extend BetaRowsetReader::_should_push_down_value_predicates() to check the flag
- Drop __DORIS_DELETE_SIGN__ predicate from storage-layer pushdown to prevent
  deleted rows from reappearing (delete sign must only be evaluated post-merge)
- Add regression test covering dedup, delete sign, delete predicate, and update scenarios
…OR tables without merge

Add a per-table session variable that allows reading MOR (Merge-On-Read)
UNIQUE tables as DUP tables, skipping storage engine merge and delete
sign filter while still honoring delete predicates.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…omponent parsing

Address review feedback: replace naive string concatenation matching with
component-based parsing (split by ".") consistent with TableNameInfo, fixing
null dbName handling and empty entry edge cases.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@dataroaring
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Mar 4, 2026
@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.35% (1798/2266)
Line Coverage 64.55% (32217/49910)
Region Coverage 65.45% (16135/24653)
Branch Coverage 55.93% (8591/15360)

@doris-robot
Copy link

TPC-H: Total hot run time: 29334 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1c6db7775dde85d603a2291fda9366d1e3fb8ed2, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17656	4557	5286	4557
q2	q3	10753	834	534	534
q4	4739	378	251	251
q5	8145	1199	1013	1013
q6	237	176	149	149
q7	807	857	659	659
q8	10776	1488	1335	1335
q9	6522	4891	4753	4753
q10	6831	1884	1648	1648
q11	486	258	249	249
q12	736	566	467	467
q13	17773	4241	3439	3439
q14	221	233	214	214
q15	933	785	810	785
q16	749	721	663	663
q17	713	854	430	430
q18	6072	5437	5203	5203
q19	1125	1046	815	815
q20	545	612	419	419
q21	4520	1929	1497	1497
q22	365	314	254	254
Total cold run time: 100704 ms
Total hot run time: 29334 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4654	4579	4548	4548
q2	q3	1820	2226	1767	1767
q4	855	1203	791	791
q5	4057	4421	4423	4421
q6	184	176	139	139
q7	1801	1680	1531	1531
q8	2478	2713	2576	2576
q9	7500	7442	7325	7325
q10	2726	2992	2490	2490
q11	603	438	412	412
q12	507	581	446	446
q13	3971	4480	3540	3540
q14	283	296	272	272
q15	871	791	833	791
q16	721	803	710	710
q17	1156	1506	1342	1342
q18	7210	6848	6658	6658
q19	958	890	852	852
q20	2060	2174	2051	2051
q21	3956	3510	3344	3344
q22	465	413	360	360
Total cold run time: 48836 ms
Total hot run time: 46366 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184013 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1c6db7775dde85d603a2291fda9366d1e3fb8ed2, data reload: false

query5	4863	626	518	518
query6	344	253	218	218
query7	4238	489	274	274
query8	352	252	271	252
query9	8754	2740	2704	2704
query10	538	373	337	337
query11	16921	17273	17349	17273
query12	213	155	150	150
query13	1323	486	368	368
query14	6596	3341	3112	3112
query14_1	2920	2924	2920	2920
query15	217	206	190	190
query16	1034	526	493	493
query17	2082	780	654	654
query18	2628	478	358	358
query19	241	239	192	192
query20	147	138	137	137
query21	222	151	139	139
query22	5652	4912	4976	4912
query23	17161	16844	16658	16658
query23_1	16778	16617	16708	16617
query24	7144	1630	1237	1237
query24_1	1236	1200	1243	1200
query25	539	448	388	388
query26	1234	257	157	157
query27	2774	468	289	289
query28	4462	1858	1873	1858
query29	784	559	472	472
query30	316	253	211	211
query31	878	711	653	653
query32	79	72	71	71
query33	508	350	285	285
query34	928	906	566	566
query35	623	676	590	590
query36	1057	1095	990	990
query37	131	97	81	81
query38	2988	2948	2896	2896
query39	893	883	836	836
query39_1	833	817	827	817
query40	233	154	134	134
query41	65	62	58	58
query42	107	102	102	102
query43	388	389	353	353
query44	
query45	195	190	182	182
query46	909	1008	616	616
query47	2129	2127	2046	2046
query48	308	311	230	230
query49	626	470	408	408
query50	686	276	217	217
query51	4101	4050	4080	4050
query52	107	107	97	97
query53	296	336	285	285
query54	293	272	270	270
query55	92	88	87	87
query56	307	311	325	311
query57	1389	1358	1259	1259
query58	285	279	277	277
query59	2630	2739	2521	2521
query60	341	348	327	327
query61	151	147	145	145
query62	628	612	563	563
query63	311	281	280	280
query64	4852	1263	991	991
query65	
query66	1438	466	363	363
query67	16348	16418	16321	16321
query68	
query69	395	312	291	291
query70	983	1042	939	939
query71	345	302	306	302
query72	2836	2629	2437	2437
query73	542	551	321	321
query74	9960	9881	9753	9753
query75	2834	2737	2460	2460
query76	2280	1055	675	675
query77	367	384	309	309
query78	11150	11247	10620	10620
query79	2552	797	622	622
query80	1781	631	539	539
query81	561	283	244	244
query82	994	148	113	113
query83	341	269	248	248
query84	248	129	97	97
query85	876	506	433	433
query86	409	300	289	289
query87	3091	3073	2987	2987
query88	3554	2661	2640	2640
query89	423	379	347	347
query90	1982	183	179	179
query91	161	158	134	134
query92	80	74	72	72
query93	1212	856	518	518
query94	633	329	306	306
query95	604	413	321	321
query96	631	511	235	235
query97	2454	2504	2415	2415
query98	242	234	226	226
query99	1006	976	955	955
Total cold run time: 256786 ms
Total hot run time: 184013 ms

@dataroaring
Copy link
Contributor Author

/review

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR adds two session variables (enable_mor_value_predicate_pushdown_tables and read_mor_as_dup_tables) to control value column predicate pushdown and merge-skipping for MOR (Merge-On-Read) unique key tables. The implementation spans FE (session variable, plan generation, Nereids rules), Thrift protocol, and BE (scan operator, tablet reader, rowset reader), with regression tests and FE unit tests.

Critical Checkpoint Conclusions

1. Goal and correctness: The feature enables per-segment value predicate pushdown for MOR tables. However, enable_mor_value_predicate_pushdown can produce incorrect results on MOR tables with sequence columns (see inline comment on beta_rowset_reader.cpp). The existing safe path explicitly guards against this with sequence_id_idx == -1, but the new flag bypasses this check unconditionally.

2. Modification scope and focus: The changes are well-scoped to the feature. Two session variables for two distinct behaviors (value predicate pushdown vs full DUP-mode reading) is a clean design.

3. Concurrency: No concurrency concerns — the feature propagates immutable flags through the read path.

4. Lifecycle management: No lifecycle concerns.

5. Configuration items: Two session variables added with needForward = true. read_mor_as_dup_tables correctly has affectQueryResultInPlan = true. However, enable_mor_value_predicate_pushdown_tables does NOT have affectQueryResultInPlan despite the fact that it can affect query results (the test explicitly demonstrates stale data).

6. Incompatible changes: New optional Thrift fields (24, 25) on TOlapScanNode are backward-compatible.

7. Parallel code paths: All parallel paths (Nereids planner, legacy planner via shared OlapScanNode.toThrift(), BE storage layer) are covered.

8. Special conditional checks: The delete sign exclusion logic in tablet_reader.cpp is correct and well-commented.

9. Test coverage: Good breadth (dedup, delete markers, delete predicates, MOW/DUP unaffected, multiple tables, wildcard). However, no test for MOR tables with sequence columns — the most dangerous scenario. Regression tests have style violations (tables dropped at end, def tableName usage).

10. Observability: No new metrics or logging added. Given this is a user-controlled session variable, this is acceptable.

11. Transaction/persistence: Not applicable.

12. Data writes: Not applicable (read-only feature).

13. FE-BE variable passing: Thrift fields added and set in OlapScanNode.toThrift(), consumed correctly in BE.

14. Performance: The isTableInList() helper parses the comma-separated string on every call without caching. For queries scanning many partitions/tablets this could add up, though it's likely negligible in practice.

15. Other issues: NPE risk in LogicalOlapScan.java where ConnectContext.get() is called without null check (see inline comment).

Issue Summary

# Severity Description
1 HIGH enable_mor_value_predicate_pushdown bypasses sequence column safety check — can produce incorrect results on MOR tables with sequence columns
2 HIGH NPE risk in LogicalOlapScan.javaConnectContext.get() not null-checked
3 MEDIUM Test explicitly asserts stale/incorrect data as expected behavior (select_update_enabled_old), proving the feature can return wrong results when keys are updated
4 MEDIUM Redundant C-style casts of _local_state in olap_scanner.cpp
5 LOW Regression test style: tables dropped at end of test, def tableName usage

_read_context->enable_unique_key_merge_on_write);
_read_context->enable_unique_key_merge_on_write ||
_read_context->enable_mor_value_predicate_pushdown);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Sequence column safety bypass: The new enable_mor_value_predicate_pushdown flag is OR'd unconditionally, bypassing the existing safety check _read_context->sequence_id_idx == -1.

When a MOR table has a sequence column, the merge winner is determined by the sequence value (not version order). Pushing value predicates to individual segments can filter out the correct merge winner, causing a different (wrong) row to survive merge.

Example: key=1 in Segment A has (value=10, seq=100) and in Segment B has (value=20, seq=50). With predicate value > 15:

  • Without pushdown: merge picks A (seq=100 wins), then predicate filters it → empty. Correct.
  • With pushdown: A filtered at segment level, B passes. Merge sees only B → returns (key=1, value=20). Wrong — A should have won.

Consider either:

  1. Adding a sequence_id_idx == -1 guard to this branch as well, or
  2. Documenting this limitation clearly in the session variable description and adding a WARNING log when the flag is enabled on a table with a sequence column.

if (getTable().getKeysType() == KeysType.UNIQUE_KEYS
&& getTable().isMorTable()
&& ConnectContext.get().getSessionVariable().isReadMorAsDupEnabled(
getTable().getQualifiedDbName(), getTable().getName())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] NPE risk: ConnectContext.get() can return null (e.g., during background tasks, internal metadata operations, or unit tests without thread-local ConnectContext). This will cause a NullPointerException.

Other places in this PR correctly guard with ConnectContext.get() != null (e.g., SetPreAggStatus.java line 158, OlapScanNode.java line 1194). This location should be consistent:

if (ConnectContext.get() != null
        && getTable().getKeysType() == KeysType.UNIQUE_KEYS
        && getTable().isMorTable()
        && ConnectContext.get().getSessionVariable().isReadMorAsDupEnabled(
            getTable().getQualifiedDbName(), getTable().getName())) {
    return;
}

Note: the pre-existing skipDeleteBitmap check at line 768 has the same NPE risk.

if (olap_scan_node.__isset.enable_mor_value_predicate_pushdown &&
olap_scan_node.enable_mor_value_predicate_pushdown) {
_tablet_reader_params.enable_mor_value_predicate_pushdown = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] enable_mor_value_predicate_pushdown is only set when !skip_storage_engine_merge(): This means when read_mor_as_dup is active (which sets direct_mode = true and skips merge at line 340), this flag is NOT set. This is correct because in DUP mode there's no merge to interfere with. But the interaction between these two flags should be documented — currently a reader has to trace through both code paths to understand this.

Also note: at line 337, olap_local_state is obtained via proper static_cast, but at line 480 (just above this block), olap_scan_local_state is obtained via C-style cast (pipeline::OlapScanLocalState*). Please use consistent static_cast and consider reusing the variable from line 337 to avoid the redundant cast.


// v1=100: rowset 2 (v1=500) filtered per-segment, old version survives.
// Returns stale data — this proves pushdown is filtering at storage layer.
qt_select_update_enabled_old """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Test explicitly asserts stale data as correct behavior: This test proves that enable_mor_value_predicate_pushdown can return incorrect results (1, 100, old instead of empty) when keys have been updated across rowsets. The .out file records this stale result as the expected output.

While the test comment explains this is intentional to prove the pushdown is happening, having a test that asserts incorrect query results as "expected" is concerning. Users who enable this session variable may not realize it can return stale data for any key that has been updated.

Consider:

  1. Adding a prominent WARNING in the session variable description that this feature is only safe for insert-only/dedup-only workloads and will produce incorrect results if keys are ever updated.
  2. Potentially logging a warning when the session variable is set.

"""

// Cleanup
sql "SET enable_mor_value_predicate_pushdown_tables = ''"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Test style: Per coding standards, tables should be dropped before use (not after tests end), to preserve the environment for debugging. This cleanup block at the end should be removed — the DROP TABLE IF EXISTS before each CREATE TABLE already handles cleanup.

assertTrue(table2DupResult.size() == 2, "Table2 in DUP mode should return 2 rows, got ${table2DupResult.size()}")

// === Cleanup ===
sql "SET read_mor_as_dup_tables = '';"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Test style: Same issue as the other test — tables should not be dropped at the end of tests. Remove this cleanup block to preserve tables for debugging.

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 25.00% (6/24) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.60% (19650/37359)
Line Coverage 36.20% (183411/506680)
Region Coverage 32.48% (142203/437882)
Branch Coverage 33.45% (61715/184511)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (18/24) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.27% (20951/36581)
Line Coverage 40.35% (203801/505131)
Region Coverage 37.18% (164338/442011)
Branch Coverage 37.71% (69796/185073)

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 84.00% (42/50) 🎉
Increment coverage report
Complete coverage report

@dataroaring
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.35% (1798/2266)
Line Coverage 64.55% (32215/49910)
Region Coverage 65.41% (16126/24653)
Branch Coverage 55.94% (8592/15360)

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 58.00% (29/50) 🎉
Increment coverage report
Complete coverage report

@doris-robot
Copy link

TPC-H: Total hot run time: 27781 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dedf9fed9459d1e5580642e7ebd0a5be9df4569e, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17639	4468	4352	4352
q2	q3	10791	799	527	527
q4	4729	354	250	250
q5	8193	1188	1033	1033
q6	232	173	146	146
q7	815	838	675	675
q8	10896	1491	1384	1384
q9	6495	4712	4737	4712
q10	6565	1959	1654	1654
q11	487	252	246	246
q12	782	569	488	488
q13	18069	2952	2195	2195
q14	227	232	215	215
q15	934	801	820	801
q16	731	714	671	671
q17	707	860	427	427
q18	6593	5373	5252	5252
q19	1134	995	599	599
q20	492	515	395	395
q21	4517	1917	1468	1468
q22	368	324	291	291
Total cold run time: 101396 ms
Total hot run time: 27781 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4649	4463	4583	4463
q2	q3	3927	4345	3799	3799
q4	880	1177	770	770
q5	4061	4539	4386	4386
q6	187	179	138	138
q7	1828	1629	1541	1541
q8	2480	2756	2555	2555
q9	7439	7411	7365	7365
q10	3858	3931	3566	3566
q11	513	421	413	413
q12	470	584	455	455
q13	2630	3315	2311	2311
q14	294	288	283	283
q15	862	812	788	788
q16	711	789	711	711
q17	1156	1448	1377	1377
q18	7205	6895	6659	6659
q19	890	859	866	859
q20	2147	2175	2023	2023
q21	4049	3475	3449	3449
q22	511	430	374	374
Total cold run time: 50747 ms
Total hot run time: 48285 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 153312 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dedf9fed9459d1e5580642e7ebd0a5be9df4569e, data reload: false

query5	4340	632	516	516
query6	329	222	210	210
query7	4210	465	266	266
query8	330	246	243	243
query9	8710	2761	2749	2749
query10	489	385	333	333
query11	7258	5853	5642	5642
query12	185	131	128	128
query13	1272	476	353	353
query14	5849	3801	3604	3604
query14_1	2855	2834	2817	2817
query15	204	197	173	173
query16	968	474	452	452
query17	1079	690	613	613
query18	2461	431	344	344
query19	202	202	174	174
query20	135	129	126	126
query21	220	147	119	119
query22	4907	4782	4842	4782
query23	16744	16099	15642	15642
query23_1	15870	15797	16006	15797
query24	7408	1656	1256	1256
query24_1	1279	1276	1286	1276
query25	597	521	490	490
query26	2530	270	161	161
query27	2849	469	308	308
query28	4499	1902	1918	1902
query29	856	586	493	493
query30	323	252	214	214
query31	1373	1288	1218	1218
query32	87	74	77	74
query33	518	337	288	288
query34	977	922	564	564
query35	658	700	595	595
query36	1068	1128	1003	1003
query37	132	98	88	88
query38	2941	2933	2855	2855
query39	886	872	856	856
query39_1	835	834	817	817
query40	249	156	140	140
query41	70	64	66	64
query42	311	311	302	302
query43	243	247	229	229
query44	
query45	205	191	184	184
query46	878	978	623	623
query47	2155	2137	2047	2047
query48	349	325	244	244
query49	653	489	393	393
query50	683	279	217	217
query51	4259	4176	4006	4006
query52	288	292	282	282
query53	295	339	289	289
query54	320	286	276	276
query55	98	86	90	86
query56	342	351	323	323
query57	1394	1327	1287	1287
query58	307	337	273	273
query59	1323	1413	1262	1262
query60	350	339	327	327
query61	147	143	140	140
query62	614	581	540	540
query63	319	278	270	270
query64	4971	1279	1021	1021
query65	
query66	1457	472	372	372
query67	16319	16367	16375	16367
query68	
query69	384	307	280	280
query70	961	929	965	929
query71	336	307	319	307
query72	2829	2632	2417	2417
query73	538	547	329	329
query74	9952	9921	9803	9803
query75	2851	2781	2476	2476
query76	2283	1041	679	679
query77	355	373	309	309
query78	11235	11377	10647	10647
query79	1147	804	608	608
query80	1324	614	555	555
query81	572	275	257	257
query82	1026	148	117	117
query83	326	260	250	250
query84	241	117	98	98
query85	892	475	427	427
query86	423	306	287	287
query87	3138	3105	3002	3002
query88	3538	2679	2682	2679
query89	428	364	353	353
query90	2032	174	173	173
query91	162	157	134	134
query92	78	77	70	70
query93	955	859	527	527
query94	650	321	287	287
query95	601	408	316	316
query96	652	511	230	230
query97	2479	2554	2437	2437
query98	226	216	211	211
query99	1001	1004	926	926
Total cold run time: 235783 ms
Total hot run time: 153312 ms

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 25.00% (6/24) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.56% (19650/37387)
Line Coverage 36.18% (183456/507031)
Region Coverage 32.52% (142442/438059)
Branch Coverage 33.47% (61753/184476)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (24/24) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.54% (26191/36609)
Line Coverage 54.31% (274537/505492)
Region Coverage 51.59% (228123/442207)
Branch Coverage 52.94% (97963/185052)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants