This guide helps AI assistants (LLMs / Agents) make safe, consistent, and verifiable changes to the Apache SeaTunnel codebase. It mirrors practices from mature Apache projects and adapts them to SeaTunnel’s build, testing, architecture, and documentation conventions.
Agents MUST run verification commands locally before suggesting or finalizing changes.
# Format code (mandatory)
./mvnw spotless:apply
# Quick verification (mandatory)
./mvnw -q -DskipTests verify
# Unit tests (strongly recommended)
./mvnw test
Failure to meet these requirements will likely result in PR rejection.
SeaTunnel follows a strict commit message format to maintain a clean and searchable history.
Format:
[Type][Module] Description
Types:
- Feature – New features
- Fix – Bug fixes
- Improve – Improvements to existing behavior
- Docs – Documentation-only changes
- Test – Test cases or test framework changes
- Chore – Build, dependency, or maintenance tasks
Modules:
- Connector-V2 – seatunnel-connectors-v2
- Zeta – seatunnel-engine (Zeta engine)
- Core – seatunnel-core
- API – seatunnel-api
- Transform-V2 – seatunnel-transforms-v2
- Format – seatunnel-formats
- Translation – seatunnel-translation
- E2E – seatunnel-e2e
Examples:
[Fix][Connector-V2] Fix MySQL source split enumeration bug
[Fix][Zeta] Fix checkpoint timeout under heavy backpressure
[Feature][Transform-V2] Add LLM transform plugin
[Improve][Core] Optimize jar package loading speed
[Docs] Update quick start guide
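The commit format can be checked mechanically before pushing. The following is a minimal sketch, not part of SeaTunnel's tooling: the class name `CommitSubjectCheck` is hypothetical, the type list is copied from this guide, and the module tag is assumed to be optional because the `[Docs]` example has none.

```java
import java.util.regex.Pattern;

/** Sketch of a commit-subject check; hypothetical helper, not SeaTunnel tooling. */
final class CommitSubjectCheck {
    // Types come from the list in this guide. The module tag is optional
    // (assumption) because "[Docs] Update quick start guide" has none.
    private static final Pattern SUBJECT = Pattern.compile(
            "^\\[(Feature|Fix|Improve|Docs|Test|Chore)\\](\\[[A-Za-z0-9-]+\\])? \\S.*$");

    /** Returns true when the subject line follows [Type][Module] Description. */
    static boolean isValid(String subject) {
        return subject != null && SUBJECT.matcher(subject).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[Fix][Zeta] Fix checkpoint timeout under heavy backpressure"));
        System.out.println(isValid("[Docs] Update quick start guide"));
        System.out.println(isValid("fix checkpoint timeout"));
    }
}
```

The pattern accepts any module tag rather than only the listed ones, since the module list may not be exhaustive.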
seatunnel/
├── seatunnel-api/ # Core API definitions
├── seatunnel-connectors-v2/ # Source & Sink connectors (main contribution area)
├── seatunnel-transforms-v2/ # Transform plugins (including LLM)
├── seatunnel-engine/ # Zeta engine & Web UI
├── seatunnel-core/ # Job submission & CLI entry points
├── seatunnel-translation/ # Flink & Spark adapters
├── seatunnel-formats/ # Data formats (JSON, Avro, etc.)
├── seatunnel-e2e/ # End-to-End integration tests
├── docs/ # Documentation (en & zh)
└── config/ # Default configurations
- Formatting: Google Java Format (AOSP style), enforced by Spotless
- Imports:
  - No wildcard imports
  - Use shaded dependencies: org.apache.seatunnel.shade.*
- Nullability: Avoid implicit null assumptions
- Visibility: Keep APIs minimal; prefer package-private when possible
- Comments: Add comments to important methods: public APIs, lifecycle hooks (initialization, start/stop, checkpoint), and complex or performance-critical logic. Example:
/**
 * Enumerates source splits for parallel reading.
 * Called once during job initialization.
 *
 * @param context Split enumeration context
 * @return Collection of discovered splits
 */
@Override
public List<SourceSplit> enumerateSplits(SplitEnumerationContext context) {
    // Implementation
}
All new files MUST include the ASF license header:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
Agents MUST treat backward compatibility as a hard constraint.
- DO NOT remove or rename existing config options
- DO NOT change default values casually
- DO NOT break public APIs or SPI contracts
Any incompatible change MUST:
- Be explicitly documented
- Be documented in docs/en/introduction/concepts/incompatible-changes.md
- Include migration guidance
- Be clearly explained in the PR description
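One common way to avoid an incompatible rename is to keep honoring the old option name as a fallback. The sketch below illustrates the idea with a plain `Map`; it does not use SeaTunnel's `Option` API, and the class name `CompatibleLookup` and the keys `batch_size` and `batch.size` are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/** Sketch only: plain-Map lookup, not SeaTunnel's Option API. */
final class CompatibleLookup {
    /**
     * Returns the value of primaryKey, or of the first legacy key that is
     * present, so configs written against an old option name keep working.
     */
    static Optional<String> resolve(
            Map<String, String> config, String primaryKey, String... legacyKeys) {
        String value = config.get(primaryKey);
        if (value != null) {
            return Optional.of(value);
        }
        for (String legacyKey : legacyKeys) {
            value = config.get(legacyKey);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "batch_size" and "batch.size" are hypothetical option names.
        Map<String, String> oldStyleConfig = Collections.singletonMap("batch.size", "500");
        System.out.println(resolve(oldStyleConfig, "batch_size", "batch.size").orElse("1000"));
    }
}
```

The new name takes precedence when both are set, and the default applies only when neither is present, so existing job configs behave as before.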
- DO NOT introduce new dependencies unless absolutely necessary
- Prefer existing shaded dependencies under org.apache.seatunnel.shade.*
- Any new dependency MUST:
  - Be justified in the PR description
  - Be evaluated for shading, size, and conflict risks
- Implement SeaTunnelSource or SeaTunnelSink
- Define configs using Option
- Support parallelism via SourceSplitEnumerator
SourceSplitEnumerator - Avoid connector-specific logic leaking into engine or core
- Client: Submits job config
- Master: Schedules & coordinates
- Worker: Executes tasks (Source → Transform → Sink)
Respect task boundaries and lifecycle semantics.
- All user-facing configs MUST be defined using Option
- Each option MUST include:
  - name
  - type
  - default value (if applicable)
  - clear description
- Option names are stable contracts and must not be renamed lightly
- Exceptions MUST include sufficient context (table, task, config key)
- Avoid swallowing exceptions
- Use proper log levels:
  - INFO – lifecycle events
  - WARN – recoverable issues
  - ERROR – task-failing errors
- NEVER log sensitive information (passwords, tokens, credentials)
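The error-handling and logging rules above can be sketched as follows. This is an illustration under stated assumptions, not SeaTunnel code: the class `SafeErrorContext`, its methods, and the list of sensitive key fragments are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch only: helper names are hypothetical, not SeaTunnel utilities. */
final class SafeErrorContext {
    /** Replaces values whose key looks sensitive so they never reach logs. */
    static Map<String, String> mask(Map<String, String> config) {
        Map<String, String> masked = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            // Assumed heuristic: match on common fragments of sensitive key names.
            boolean sensitive = key.contains("password")
                    || key.contains("token")
                    || key.contains("secret")
                    || key.contains("credential");
            masked.put(entry.getKey(), sensitive ? "******" : entry.getValue());
        }
        return masked;
    }

    /** Builds an exception whose message names the table, task, and config key. */
    static IllegalStateException writeFailure(
            String table, int taskIndex, String configKey, Throwable cause) {
        String message = String.format(
                "Failed to write to table '%s' (task %d, config key '%s')",
                table, taskIndex, configKey);
        // Keep the cause attached instead of swallowing it.
        return new IllegalStateException(message, cause);
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("url", "jdbc:mysql://localhost:3306/test");
        config.put("password", "s3cret");
        System.out.println(mask(config));
        System.out.println(
                writeFailure("orders", 2, "batch_size", new RuntimeException("timeout"))
                        .getMessage());
    }
}
```

Masking by key name is a coarse heuristic; a real connector would more likely mark sensitive options explicitly.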
- Any user-visible change MUST update:
  - docs/en
  - docs/zh
- Config names, defaults, and examples MUST match the code exactly
- Documentation is part of the feature, not an afterthought
- Located under src/test/java
- Validate behavior, not implementation details
- Prefer deterministic and minimal tests
Command:
./mvnw test
- Located in seatunnel-e2e
- Uses Testcontainers
- Extend TestSuiteBase
Command:
./mvnw -DskipUT -DskipIT=false verify
Agents MUST consider performance implications:
- Avoid unnecessary object creation in hot paths
- Be cautious with large in-memory buffers
- Consider parallelism and resource usage
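As one illustration of the buffer and allocation points above, a write path can cap its in-memory batch and reuse a single list instead of allocating one per batch. This is a sketch only: `BoundedBatchBuffer` is a hypothetical class, not part of SeaTunnel, and a real sink would also flush on checkpoint and close.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: a size-capped batch buffer, not a SeaTunnel class. */
final class BoundedBatchBuffer<T> {
    private final int maxSize;
    private final Consumer<List<T>> flusher;
    // Allocated once and reused across batches to avoid per-batch garbage.
    private final List<T> buffer;

    BoundedBatchBuffer(int maxSize, Consumer<List<T>> flusher) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.flusher = flusher;
        this.buffer = new ArrayList<>(maxSize);
    }

    /** Buffers one row and flushes as soon as the cap is reached. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    /** Hands the buffered rows to the flusher, then clears the list for reuse. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // The flusher must not retain the list: it is cleared right after this call.
        flusher.accept(buffer);
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BoundedBatchBuffer<String> rows =
                new BoundedBatchBuffer<>(2, batch -> batchSizes.add(batch.size()));
        rows.add("a");
        rows.add("b");
        rows.add("c");
        rows.flush();
        System.out.println(batchSizes);
    }
}
```

The cap bounds memory per task, so total usage scales with parallelism times `maxSize` rather than with input size.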
- Keep changes minimal and focused
- Avoid unrelated refactors or formatting-only changes
- One PR should solve one problem
./mvnw clean install -DskipTests -Dskip.spotless=true
sh bin/install-plugin.sh $current_version
sh bin/seatunnel.sh --config config/v2.batch.config.template -e local