Skip to content

[feature](lineage) Support lineage SPI framework for data lineage collection#61004

Open
seawinde wants to merge 5 commits intoapache:masterfrom
seawinde:lineage_opensource
Open

[feature](lineage) Support lineage SPI framework for data lineage collection#61004
seawinde wants to merge 5 commits intoapache:masterfrom
seawinde:lineage_opensource

Conversation

@seawinde
Copy link
Member

@seawinde seawinde commented Mar 3, 2026

What problem does this PR solve?

What problem does this PR solve?

This PR introduces a data lineage collection framework for Apache Doris, enabling automatic extraction and reporting of table-level and column-level lineage from DML operations (INSERT INTO SELECT, INSERT OVERWRITE, CREATE TABLE AS SELECT).

The framework is designed as an SPI (Service Provider Interface) to allow pluggable lineage consumers, aligned with the existing fe-authentication SPI architecture using fe-extension-spi and fe-extension-loader.

Architecture Design

┌─────────────────────────────────────────────────────────────────────┐
│                         SQL Execution                               │
│  InsertIntoTableCommand / InsertOverwriteTableCommand / CTAS        │
└──────────────────────────┬──────────────────────────────────────────┘
                           │ PlannerHook callback
                           ▼
                    ┌──────────────┐
                    │ LineageUtils │  Extracts lineage from Nereids plan
                    └──────┬───────┘
                           │ submitLineageEvent(LineageInfo)
                           ▼
              ┌────────────────────────┐
              │ LineageEventProcessor  │  Async event queue + worker thread
              │                        │
              │  ┌──────────────────┐  │
              │  │  Dual Discovery  │  │
              │  │  1. ServiceLoader│  │  Built-in (classpath)
              │  │  2. DirPluginMgr │  │  External ($plugin_dir/lineage/)
              │  └──────────────────┘  │
              │                        │
              │  ClassLoadingPolicy    │  Parent-first: org.apache.doris.nereids.lineage.*
              │  PluginContext         │  Configuration injection (plugin.path, plugin.name)
              └───────────┬────────────┘
                          │ plugin.exec(LineageInfo)
                          ▼
              ┌────────────────────────┐
              │   LineagePlugin (SPI)  │  Extends fe-extension-spi Plugin
              │   LineagePluginFactory │  Extends fe-extension-spi PluginFactory
              └────────────────────────┘
                          │
              ┌───────────┴────────────┐
              │  External Plugins      │  Discovered from $plugin_dir/lineage/
              │  (e.g. DataWorks)      │  via DirectoryPluginRuntimeManager
              └────────────────────────┘

Key Components

1. Lineage Extraction (LineageInfoExtractor)

  • Traverses the Nereids logical/physical plan tree to extract source-target relationships
  • Supports table-level lineage: which source tables contribute to each target table
  • Supports column-level lineage: which source columns map to each target column, including through expressions (functions, casts, aggregations)
  • Handles complex query patterns: JOINs, subqueries, UNIONs, window functions, CTEs

2. SPI Interfaces (aligned with fe-authentication)

Interface Base Class Description
LineagePlugin extends Plugin (fe-extension-spi) Plugin contract: eventFilter() + exec(LineageInfo)
LineagePluginFactory extends PluginFactory (fe-extension-spi) Factory contract: name() + create()

3. Event Processing (LineageEventProcessor)

  • Dual discovery (aligned with AuthenticationPluginManager):
    • ServiceLoader.load() for built-in classpath plugins
    • DirectoryPluginRuntimeManager.loadAll() for external directory plugins
  • ClassLoadingPolicy: parent-first for org.apache.doris.nereids.lineage.* to ensure SPI interface identity
  • Async processing: bounded BlockingQueue (configurable via lineage_event_queue_size) + daemon worker thread
  • Fault tolerance: plugin exceptions are caught and logged, never crash the worker

4. Integration Points

  • NereidsPlannerPlannerHook callback after plan optimization
  • InsertIntoTableCommand / InsertOverwriteTableCommand / CreateTableCommand → trigger lineage extraction on successful execution
  • ConnectContext → carries lineage metadata (query ID, user, database, timing)
  • Env.javalineageEventProcessor.start() at FE startup

Configuration

Config Default Description
activate_lineage_plugin {} (empty) Active plugin names, e.g. dataworks
lineage_event_queue_size 50000 Max pending lineage events
plugin_dir $DORIS_HOME/plugins External plugin root directory

How to Use

  1. Implement LineagePlugin + LineagePluginFactory in an external module
  2. Register via META-INF/services/org.apache.doris.nereids.lineage.LineagePluginFactory
  3. Deploy jar + plugin.conf to $plugin_dir/lineage/{plugin_name}/
  4. Set activate_lineage_plugin = {plugin_name} in fe.conf

Testing

  • LineageEventProcessorTest: 11 tests covering plugin lifecycle, event dispatch, error handling, SPI factory contract
  • LineageInfoExtractorTest: 25+ tests covering INSERT INTO, CTAS, INSERT OVERWRITE, JOINs, subqueries, expressions, multi-catalog
  • LineageUtilsSkipTest: Tests for lineage skip conditions

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented Mar 3, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Copy link
Member Author

seawinde commented Mar 3, 2026

run buildall

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 56.51% (408/722) 🎉
Increment coverage report
Complete coverage report

@seawinde seawinde force-pushed the lineage_opensource branch from 7463610 to 36125d9 Compare March 4, 2026 02:04
@seawinde
Copy link
Member Author

seawinde commented Mar 4, 2026

run buildall

@seawinde seawinde changed the title [feature](lineage) Support lineage SPI framework for dataworks lineage collection [feature](lineage) Support lineage SPI framework for data lineage collection Mar 4, 2026
@seawinde seawinde force-pushed the lineage_opensource branch from 36125d9 to b958837 Compare March 4, 2026 08:01
@seawinde
Copy link
Member Author

seawinde commented Mar 4, 2026

run buildall

@seawinde seawinde force-pushed the lineage_opensource branch from b958837 to 954df99 Compare March 4, 2026 11:29
@seawinde
Copy link
Member Author

seawinde commented Mar 4, 2026

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 28741 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 954df999a81a4753fd73c884173f9853996141a9, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17668	4497	4367	4367
q2	q3	10649	767	520	520
q4	4674	350	259	259
q5	7549	1208	1039	1039
q6	176	175	152	152
q7	783	839	684	684
q8	9319	1452	1319	1319
q9	4966	4786	4594	4594
q10	6849	1854	1648	1648
q11	466	243	255	243
q12	738	580	469	469
q13	17768	4239	3428	3428
q14	238	244	213	213
q15	955	798	794	794
q16	769	735	686	686
q17	724	861	423	423
q18	5914	5344	5171	5171
q19	1140	977	585	585
q20	501	487	399	399
q21	4746	1993	1483	1483
q22	383	339	265	265
Total cold run time: 96975 ms
Total hot run time: 28741 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4703	4577	4673	4577
q2	q3	1804	2216	1746	1746
q4	883	1225	782	782
q5	4158	4393	4330	4330
q6	181	180	136	136
q7	1787	1633	1546	1546
q8	2454	2704	2576	2576
q9	7566	7438	7231	7231
q10	2722	2855	2420	2420
q11	509	436	412	412
q12	503	590	454	454
q13	4119	4491	3652	3652
q14	292	290	270	270
q15	868	824	793	793
q16	786	781	711	711
q17	1163	1559	1276	1276
q18	7093	6718	6614	6614
q19	870	831	844	831
q20	2070	2160	2026	2026
q21	3998	3490	3579	3490
q22	463	428	376	376
Total cold run time: 48992 ms
Total hot run time: 46249 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184195 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 954df999a81a4753fd73c884173f9853996141a9, data reload: false

query5	4344	645	542	542
query6	329	225	197	197
query7	4213	466	276	276
query8	341	259	234	234
query9	8705	2759	2775	2759
query10	490	373	361	361
query11	16857	17491	17411	17411
query12	193	134	138	134
query13	1326	499	377	377
query14	7584	3456	3065	3065
query14_1	2976	3013	2945	2945
query15	203	202	181	181
query16	1103	469	468	468
query17	1603	779	622	622
query18	2567	451	349	349
query19	207	212	191	191
query20	135	126	127	126
query21	273	139	121	121
query22	5593	5454	4689	4689
query23	17166	16781	16654	16654
query23_1	16791	16588	16694	16588
query24	7192	1596	1229	1229
query24_1	1200	1243	1240	1240
query25	539	467	470	467
query26	1237	259	152	152
query27	2772	479	296	296
query28	4439	1883	1918	1883
query29	805	551	461	461
query30	310	237	212	212
query31	874	737	649	649
query32	88	71	70	70
query33	525	326	282	282
query34	943	922	554	554
query35	646	673	588	588
query36	1091	1090	949	949
query37	139	102	79	79
query38	2952	2884	2949	2884
query39	876	884	848	848
query39_1	820	838	822	822
query40	230	154	137	137
query41	65	61	58	58
query42	106	104	102	102
query43	372	377	352	352
query44	
query45	198	198	185	185
query46	885	976	635	635
query47	2135	2132	2036	2036
query48	337	309	239	239
query49	638	506	392	392
query50	681	289	218	218
query51	4069	4081	4024	4024
query52	106	107	99	99
query53	297	338	288	288
query54	296	273	257	257
query55	93	84	83	83
query56	325	303	320	303
query57	1416	1344	1297	1297
query58	286	279	269	269
query59	2595	2710	2510	2510
query60	342	336	317	317
query61	147	144	149	144
query62	632	590	547	547
query63	312	279	285	279
query64	4928	1284	973	973
query65	
query66	1453	447	356	356
query67	16337	16513	16518	16513
query68	
query69	410	310	297	297
query70	1017	955	945	945
query71	332	307	300	300
query72	2703	2633	2419	2419
query73	542	558	326	326
query74	9947	9912	9766	9766
query75	2853	2749	2489	2489
query76	2285	1070	666	666
query77	356	374	323	323
query78	11109	11324	10607	10607
query79	1128	801	600	600
query80	739	641	576	576
query81	499	286	265	265
query82	1320	154	120	120
query83	336	270	252	252
query84	251	122	107	107
query85	959	571	503	503
query86	385	317	304	304
query87	3125	3115	2967	2967
query88	3570	2717	2701	2701
query89	432	374	368	368
query90	1844	179	178	178
query91	185	171	151	151
query92	82	81	79	79
query93	913	856	507	507
query94	490	371	282	282
query95	591	398	318	318
query96	632	511	237	237
query97	2472	2506	2389	2389
query98	231	223	218	218
query99	1044	971	939	939
Total cold run time: 253843 ms
Total hot run time: 184195 ms

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 11.77% (85/722) 🎉
Increment coverage report
Complete coverage report

@seawinde
Copy link
Member Author

seawinde commented Mar 5, 2026

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 28891 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a65dcde3f3722b2bc1978be8fbafd6f1b6a8b62f, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17647	4393	4279	4279
q2	q3	10642	768	514	514
q4	4676	357	254	254
q5	7538	1209	1033	1033
q6	177	174	147	147
q7	780	846	672	672
q8	9984	1464	1304	1304
q9	5128	4735	4681	4681
q10	6834	1863	1654	1654
q11	465	252	265	252
q12	736	577	467	467
q13	17767	4189	3424	3424
q14	227	233	206	206
q15	944	786	799	786
q16	774	714	680	680
q17	700	870	425	425
q18	5992	5339	5338	5338
q19	1267	977	610	610
q20	511	490	399	399
q21	4780	2037	1509	1509
q22	406	299	257	257
Total cold run time: 97975 ms
Total hot run time: 28891 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4627	4535	4541	4535
q2	q3	1852	2258	1738	1738
q4	872	1172	776	776
q5	4040	4446	4365	4365
q6	195	175	140	140
q7	1735	1676	1524	1524
q8	2474	2699	2502	2502
q9	7587	7319	7325	7319
q10	2664	2902	2384	2384
q11	500	424	416	416
q12	489	581	454	454
q13	4055	4419	3741	3741
q14	279	293	269	269
q15	840	850	797	797
q16	740	784	778	778
q17	1182	1722	1337	1337
q18	7072	6907	6891	6891
q19	859	829	858	829
q20	2062	2176	2050	2050
q21	3956	3460	3385	3385
q22	472	433	378	378
Total cold run time: 48552 ms
Total hot run time: 46608 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 183870 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a65dcde3f3722b2bc1978be8fbafd6f1b6a8b62f, data reload: false

query5	4820	641	511	511
query6	326	226	206	206
query7	4210	465	273	273
query8	334	237	231	231
query9	8720	2735	2721	2721
query10	515	378	348	348
query11	16973	17530	17352	17352
query12	188	146	129	129
query13	1411	468	385	385
query14	7082	3300	3001	3001
query14_1	2936	3005	2923	2923
query15	224	220	195	195
query16	1097	479	370	370
query17	1088	790	658	658
query18	2790	454	347	347
query19	255	207	180	180
query20	144	136	146	136
query21	223	146	132	132
query22	5036	4989	4782	4782
query23	17150	16770	16570	16570
query23_1	16639	16738	16708	16708
query24	7134	1611	1233	1233
query24_1	1238	1236	1219	1219
query25	533	448	419	419
query26	1235	257	155	155
query27	2773	494	287	287
query28	4434	1863	1849	1849
query29	799	594	462	462
query30	307	251	208	208
query31	893	737	646	646
query32	80	72	71	71
query33	526	335	280	280
query34	934	908	554	554
query35	637	666	592	592
query36	1083	1143	958	958
query37	128	90	81	81
query38	2966	2927	2885	2885
query39	883	863	838	838
query39_1	819	845	818	818
query40	232	146	138	138
query41	61	60	58	58
query42	104	101	101	101
query43	365	374	358	358
query44	
query45	194	192	186	186
query46	875	970	601	601
query47	2108	2101	2042	2042
query48	312	313	246	246
query49	624	468	377	377
query50	679	282	218	218
query51	4073	4143	4064	4064
query52	105	110	97	97
query53	294	336	280	280
query54	298	270	280	270
query55	102	84	80	80
query56	328	322	318	318
query57	1364	1327	1272	1272
query58	287	284	281	281
query59	2586	2717	2579	2579
query60	330	337	336	336
query61	148	147	142	142
query62	628	585	549	549
query63	309	273	280	273
query64	4867	1285	1015	1015
query65	
query66	1407	462	362	362
query67	16308	16601	16394	16394
query68	
query69	397	316	287	287
query70	1014	973	957	957
query71	346	314	309	309
query72	2999	2895	2407	2407
query73	539	545	320	320
query74	9956	9913	9818	9818
query75	2818	2749	2435	2435
query76	2300	1035	673	673
query77	349	381	300	300
query78	11205	11384	10654	10654
query79	2615	806	631	631
query80	1770	623	560	560
query81	560	291	246	246
query82	999	147	115	115
query83	357	261	250	250
query84	257	116	88	88
query85	879	471	450	450
query86	424	318	323	318
query87	3097	3105	3012	3012
query88	3567	2637	2606	2606
query89	420	368	356	356
query90	2017	170	171	170
query91	163	153	133	133
query92	76	75	68	68
query93	1135	822	510	510
query94	653	329	295	295
query95	584	397	315	315
query96	634	505	228	228
query97	2455	2460	2416	2416
query98	244	228	216	216
query99	1007	1019	922	922
Total cold run time: 256521 ms
Total hot run time: 183870 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 56.63% (410/724) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 11.74% (85/724) 🎉
Increment coverage report
Complete coverage report

@morningman
Copy link
Contributor

/review

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary: feature Support lineage SPI framework for data lineage collection

This PR introduces a data lineage collection framework with SPI plugin architecture, lineage extraction from Nereids plans, and async event processing. The overall design is sound and aligns well with the existing fe-extension-spi/fe-extension-loader architecture. The test coverage for the extractor is thorough (25+ test cases).

Below are the issues found, organized by the critical checkpoints from our review guidelines.


Critical Checkpoint Conclusions

1. Goal and Correctness: The PR accomplishes its stated goal of providing a pluggable lineage SPI framework. The lineage extraction logic correctly handles JOINs, CTEs, UNIONs, window functions, and conditional expressions. However, there is a design gap where Config.activate_lineage_plugin is not enforced at the framework level (see Issue #1).

2. Modification Scope: The PR is focused and well-scoped. Changes to existing files are minimal and non-intrusive (PlannerHook signature change is safe since no existing implementations override the analyze hooks).

3. Concurrency: The Worker thread has a subtle race where it reads the plugin list BEFORE polling the queue, meaning events can be dispatched to a stale plugin list (see Issue #2). The worker thread also has no graceful shutdown mechanism.

4. Lifecycle Management: No shutdown/close mechanism for the worker thread or loaded plugins. The LineagePlugin extends Plugin which extends Closeable, but close() is never called on plugins.

5. Configuration Items: lineage_event_queue_size is marked mutable=true but the queue capacity is baked in at construction time, making runtime changes ineffective (see Issue #3).

6. Incompatible Changes: The PlannerHook interface changed beforeAnalyze(CascadesContext) to beforeAnalyze(NereidsPlanner). Verified safe: no existing implementations override the analyze hooks.

7. Parallel Code Paths: N/A for this feature.

8. Test Coverage: Unit tests are comprehensive for extraction and processor lifecycle. However, there are no regression tests under regression-test/ (see Issue #5). The LineageUtilsSkipTest is a good addition.

9. Observability: Debug logging is adequate. No metrics are added for the lineage path, which is acceptable for an initial implementation.

10. Transaction/Persistence: N/A - no EditLog or transaction changes.

11. Performance: isLineagePluginConfigured() is called on every DML execution path and iterates all plugins calling eventFilter(). This should use a cached flag rather than iterating plugins on every query (see Issue #4).

List<LineagePlugin> plugins = new ArrayList<>();
for (Map.Entry<String, LineagePluginFactory> entry : factories.entrySet()) {
String pluginName = entry.getKey();
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #1 (Medium): Config.activate_lineage_plugin is not enforced at framework level

All discovered plugin factories are instantiated unconditionally here. The activate_lineage_plugin config is documented as the mechanism to select which plugins are active, but it is never checked by the framework - filtering is entirely delegated to each plugin's eventFilter() implementation.

This means:

  1. Plugins are loaded and initialized even when not listed in activate_lineage_plugin
  2. A buggy plugin that returns true from eventFilter() regardless of config will receive events even when not configured
  3. There is no built-in way to disable a misbehaving plugin without removing its JAR

Suggestion: Either filter factories by activate_lineage_plugin before instantiation, or provide a default eventFilter() implementation in the framework that checks the config:

// In discoverPlugins(), before creating plugin:
Set<String> activeNames = new HashSet<>(Arrays.asList(Config.activate_lineage_plugin));
if (!activeNames.isEmpty() && !activeNames.contains(pluginName)) {
    LOG.info("Skip lineage plugin not in activate_lineage_plugin: {}", pluginName);
    continue;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed add Config.activate_lineage_plugin filter

while (true) {
List<LineagePlugin> currentPlugins = lineagePlugins.get();

try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #2 (Medium): Worker reads stale plugin list and has no graceful shutdown

Two issues here:

  1. Stale plugin list: currentPlugins is read at line 298 BEFORE eventQueue.poll() at line 301, which blocks for up to 5 seconds. If refreshPlugins() is called during that wait, the event will be dispatched to the old plugin list. The fix is to read lineagePlugins.get() AFTER successfully polling an event:
while (true) {
    try {
        lineageInfo = eventQueue.poll(EVENT_POLL_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        if (lineageInfo == null) {
            continue;
        }
    } catch (InterruptedException e) {
        LOG.warn("...", e);
        continue;
    }
    List<LineagePlugin> currentPlugins = lineagePlugins.get(); // Move here
    for (LineagePlugin lineagePlugin : currentPlugins) {
        ...
    }
}
  1. No graceful shutdown: The while(true) loop has no exit condition. When FE shuts down, the daemon thread is killed abruptly. Consider adding a volatile boolean running flag and a stop() method that sets it to false and interrupts the worker thread, so plugins can be closed cleanly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 Agreed, will fix. Moving lineagePlugins.get() after the successful poll() is a zero-cost improvement for correctness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 The setDaemon(true) is intentional — it follows the same pattern as AuditEventProcessor in Doris FE, which also uses daemon + while(true) without an explicit stop(). Since there's no unified graceful shutdown framework in FE today, adding one here alone would be inconsistent. Happy to revisit if FE introduces a global shutdown lifecycle later.


@ConfField(mutable = true, description = {"血缘事件队列最大长度,超过长度事件会被舍弃",
"Max size of lineage event queue, events will be discarded when exceeded"})
public static int lineage_event_queue_size = 50000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #3 (Low): lineage_event_queue_size marked mutable=true but is effectively immutable

The LinkedBlockingDeque in LineageEventProcessor is constructed with this config value at field initialization time (line 70-71 of LineageEventProcessor.java). LinkedBlockingDeque has no setCapacity() method, so changing this config at runtime via SET GLOBAL has no effect on the actual queue capacity.

To make it truly mutable, follow the AuditEventProcessor pattern: use an unbounded LinkedBlockingDeque and check Config.lineage_event_queue_size at enqueue time in submitLineageEvent():

if (eventQueue.size() >= Config.lineage_event_queue_size) {
    LOG.warn("...");
    return false;
}
eventQueue.add(lineageInfo);

Alternatively, remove mutable = true if runtime changes are not intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Removed mutable = true from lineage_event_queue_size. The LinkedBlockingDeque capacity is fixed at construction time with no setCapacity() API, so runtime changes via SET GLOBAL would silently have no effect. Queue size is a low-frequency config — restarting FE to apply changes is acceptable.

private static boolean isLineagePluginConfigured() {
LineageEventProcessor processor = Env.getCurrentEnv().getLineageEventProcessor();
return processor != null && processor.hasActivePlugins();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #4 (Medium): isLineagePluginConfigured() iterates all plugins on every DML execution

hasActivePlugins() is called on every INSERT INTO SELECT, INSERT OVERWRITE, and CTAS execution. It iterates through all loaded plugins and calls eventFilter() on each one. For a system processing thousands of queries per second, this adds unnecessary overhead to the hot path.

Consider caching the result with a short TTL (e.g., 5-10 seconds) or using a simple volatile boolean flag that is updated when plugins are refreshed or config changes:

private static boolean isLineagePluginConfigured() {
    LineageEventProcessor processor = Env.getCurrentEnv().getLineageEventProcessor();
    return processor != null && processor.hasActivePlugins(); // This iterates all plugins
}

Alternatively, since activate_lineage_plugin defaults to empty, a quick check of Config.activate_lineage_plugin.length > 0 before calling hasActivePlugins() would be a cheap fast-path exit for the common case of no lineage plugins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overhead is negligible in practice hasActivePlugins()
is a volatile read + iteration over typically 0-2 plugins.

boolean eventFilter();

/**
* Processes a lineage event.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #5 (Low): SPI contract should document thread-safety requirement

eventFilter() and exec() are called from the single Worker thread, but eventFilter() is also called from arbitrary query threads (via hasActivePlugins() in isLineagePluginConfigured()). The SPI contract should document that:

  1. eventFilter() must be thread-safe (called from multiple threads concurrently)
  2. exec() is called from a single worker thread (but concurrent with eventFilter() calls)

This helps plugin implementors avoid subtle concurrency bugs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have added

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 11.33% (82/724) 🎉
Increment coverage report
Complete coverage report

1 similar comment
@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 11.33% (82/724) 🎉
Increment coverage report
Complete coverage report

@seawinde
Copy link
Member Author

seawinde commented Mar 5, 2026

run buildall

1 similar comment
@seawinde
Copy link
Member Author

seawinde commented Mar 6, 2026

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 27508 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4f60a7c0a30fd11e8dd1664f3cd02da22b89b58c, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17630	4604	4267	4267
q2	q3	10647	784	510	510
q4	4676	360	258	258
q5	7559	1214	1033	1033
q6	184	175	147	147
q7	802	840	657	657
q8	9509	1473	1327	1327
q9	4857	4751	4671	4671
q10	6341	1888	1691	1691
q11	485	261	245	245
q12	755	571	458	458
q13	18058	2946	2169	2169
q14	240	230	216	216
q15	948	791	797	791
q16	755	713	688	688
q17	688	847	417	417
q18	5905	5310	5211	5211
q19	1116	1001	625	625
q20	492	487	388	388
q21	4697	2112	1479	1479
q22	381	311	260	260
Total cold run time: 96725 ms
Total hot run time: 27508 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4666	4606	4522	4522
q2	q3	3962	4299	3868	3868
q4	876	1206	797	797
q5	4052	4340	4318	4318
q6	185	176	136	136
q7	1734	1656	1497	1497
q8	2474	2707	2566	2566
q9	7558	7313	7326	7313
q10	3728	4002	3596	3596
q11	521	446	450	446
q12	509	591	443	443
q13	2716	3442	2376	2376
q14	286	294	271	271
q15	860	830	818	818
q16	740	764	735	735
q17	1165	1459	1374	1374
q18	7115	6857	6588	6588
q19	871	881	964	881
q20	2094	2165	1992	1992
q21	3937	3543	3288	3288
q22	476	430	405	405
Total cold run time: 50525 ms
Total hot run time: 48230 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 153126 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4f60a7c0a30fd11e8dd1664f3cd02da22b89b58c, data reload: false

query5	4320	646	516	516
query6	335	228	214	214
query7	4215	470	264	264
query8	336	238	215	215
query9	8748	2740	2732	2732
query10	534	372	338	338
query11	7319	5789	5536	5536
query12	188	131	127	127
query13	1275	448	371	371
query14	5702	3792	3531	3531
query14_1	2815	2775	2770	2770
query15	202	206	170	170
query16	977	464	458	458
query17	1027	692	594	594
query18	2419	432	338	338
query19	210	208	176	176
query20	135	132	125	125
query21	221	147	130	130
query22	4896	5180	4769	4769
query23	16342	15941	15863	15863
query23_1	15967	15964	15803	15803
query24	8906	1670	1278	1278
query24_1	1297	1295	1300	1295
query25	595	528	470	470
query26	1258	275	166	166
query27	2761	490	288	288
query28	4549	1899	1875	1875
query29	842	610	490	490
query30	317	249	216	216
query31	1362	1283	1242	1242
query32	86	76	74	74
query33	515	345	285	285
query34	941	915	563	563
query35	627	690	616	616
query36	1118	1139	959	959
query37	130	98	85	85
query38	2935	2931	2855	2855
query39	870	851	836	836
query39_1	824	831	828	828
query40	235	159	140	140
query41	67	63	64	63
query42	304	310	297	297
query43	249	251	217	217
query44	
query45	197	195	184	184
query46	884	977	607	607
query47	2176	2191	2070	2070
query48	326	320	230	230
query49	645	466	437	437
query50	672	288	213	213
query51	4116	4079	4026	4026
query52	286	291	277	277
query53	293	331	279	279
query54	295	270	256	256
query55	99	87	81	81
query56	310	322	301	301
query57	1372	1371	1308	1308
query58	285	273	278	273
query59	1339	1447	1316	1316
query60	336	335	324	324
query61	145	143	148	143
query62	635	601	514	514
query63	316	298	274	274
query64	5033	1274	1061	1061
query65	
query66	1459	454	356	356
query67	16466	16439	16222	16222
query68	
query69	402	300	280	280
query70	1000	968	966	966
query71	345	312	295	295
query72	2748	2643	2450	2450
query73	531	538	320	320
query74	9931	9923	9833	9833
query75	2837	2767	2459	2459
query76	2278	1015	677	677
query77	350	375	300	300
query78	11176	11270	10622	10622
query79	2983	805	594	594
query80	1745	621	572	572
query81	561	275	249	249
query82	976	149	120	120
query83	334	256	241	241
query84	256	125	95	95
query85	950	491	456	456
query86	404	330	292	292
query87	3100	3111	2983	2983
query88	3530	2647	2618	2618
query89	428	365	342	342
query90	1972	183	165	165
query91	160	159	144	144
query92	75	73	66	66
query93	1223	805	519	519
query94	652	334	310	310
query95	580	395	316	316
query96	648	511	227	227
query97	2512	2536	2416	2416
query98	244	228	221	221
query99	982	988	919	919
Total cold run time: 236811 ms
Total hot run time: 153126 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 56.46% (411/728) 🎉
Increment coverage report
Complete coverage report

@seawinde seawinde force-pushed the lineage_opensource branch from 034013e to 426b8b3 Compare March 6, 2026 08:13
@seawinde
Copy link
Member Author

seawinde commented Mar 6, 2026

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 6, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR approved by anyone and no changes requested.

@doris-robot
Copy link

TPC-H: Total hot run time: 27663 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 426b8b3c9cf8e2554564cb0ef698fff4050291d8, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17665	4514	4285	4285
q2	q3	10654	793	516	516
q4	4676	367	255	255
q5	7554	1211	1015	1015
q6	194	176	147	147
q7	766	849	656	656
q8	9296	1476	1326	1326
q9	4820	4739	4717	4717
q10	6251	1901	1637	1637
q11	485	255	243	243
q12	709	569	465	465
q13	18033	2907	2192	2192
q14	228	228	220	220
q15	911	796	805	796
q16	743	729	680	680
q17	719	887	410	410
q18	6021	5353	5217	5217
q19	1104	970	611	611
q20	515	498	405	405
q21	4445	1926	1589	1589
q22	343	309	281	281
Total cold run time: 96132 ms
Total hot run time: 27663 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4694	4582	4485	4485
q2	q3	3880	4318	3823	3823
q4	912	1171	771	771
q5	4039	4385	4318	4318
q6	183	182	143	143
q7	1742	1621	1528	1528
q8	2522	2743	2636	2636
q9	7561	7716	7504	7504
q10	3784	3978	3617	3617
q11	507	430	433	430
q12	473	582	428	428
q13	2732	3169	2426	2426
q14	278	299	303	299
q15	877	846	831	831
q16	730	756	716	716
q17	1143	1382	1369	1369
q18	7331	6676	6621	6621
q19	1076	925	909	909
q20	2067	2175	2060	2060
q21	4013	3407	3320	3320
q22	472	416	384	384
Total cold run time: 51016 ms
Total hot run time: 48618 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 152588 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 426b8b3c9cf8e2554564cb0ef698fff4050291d8, data reload: false

query5	4381	643	514	514
query6	313	217	205	205
query7	4236	473	265	265
query8	335	241	230	230
query9	8706	2756	2756	2756
query10	543	401	335	335
query11	7416	5917	5606	5606
query12	187	136	137	136
query13	1293	453	369	369
query14	5605	3882	3563	3563
query14_1	2829	2827	2803	2803
query15	207	203	174	174
query16	1028	479	381	381
query17	887	721	620	620
query18	2445	454	357	357
query19	209	211	189	189
query20	142	128	124	124
query21	222	145	124	124
query22	4905	5018	4714	4714
query23	16599	15895	15824	15824
query23_1	15631	15696	15528	15528
query24	7931	1717	1250	1250
query24_1	1327	1305	1288	1288
query25	548	483	395	395
query26	1244	254	145	145
query27	2789	471	281	281
query28	4580	1864	1843	1843
query29	849	549	465	465
query30	305	243	209	209
query31	1381	1295	1231	1231
query32	80	72	71	71
query33	504	324	270	270
query34	965	920	568	568
query35	628	678	596	596
query36	1079	1142	986	986
query37	134	102	89	89
query38	2953	2896	2855	2855
query39	899	856	835	835
query39_1	834	808	832	808
query40	229	155	134	134
query41	60	63	56	56
query42	308	304	303	303
query43	240	247	225	225
query44	
query45	192	185	182	182
query46	874	986	598	598
query47	2112	2152	2045	2045
query48	309	329	238	238
query49	626	457	374	374
query50	674	280	216	216
query51	4079	4129	4024	4024
query52	287	290	279	279
query53	290	332	277	277
query54	290	272	260	260
query55	95	87	83	83
query56	307	319	297	297
query57	1395	1331	1272	1272
query58	287	279	267	267
query59	1344	1478	1277	1277
query60	338	336	319	319
query61	146	152	153	152
query62	607	587	527	527
query63	311	270	278	270
query64	5069	1274	1011	1011
query65	
query66	1448	467	341	341
query67	16434	16432	16255	16255
query68	
query69	405	307	276	276
query70	964	981	963	963
query71	339	312	303	303
query72	2781	2686	2547	2547
query73	548	558	328	328
query74	9992	9919	9757	9757
query75	2874	2757	2514	2514
query76	2312	1040	673	673
query77	359	385	331	331
query78	11176	11394	10658	10658
query79	1192	801	611	611
query80	844	656	571	571
query81	518	272	242	242
query82	1335	153	120	120
query83	375	303	250	250
query84	294	116	91	91
query85	954	468	442	442
query86	386	335	301	301
query87	3154	3085	2969	2969
query88	3528	2675	2664	2664
query89	420	368	340	340
query90	1820	179	171	171
query91	162	158	131	131
query92	79	70	69	69
query93	916	830	510	510
query94	503	319	292	292
query95	588	333	311	311
query96	634	525	226	226
query97	2508	2475	2399	2399
query98	231	217	220	217
query99	1039	1000	932	932
Total cold run time: 233563 ms
Total hot run time: 152588 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 56.18% (409/728) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 11.68% (85/728) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants