SeiPlant is a deep learning framework for predicting histone modification patterns in plant genomes. Built upon the Sei architecture, this model enables high-resolution inference of chromatin states directly from raw DNA sequences across diverse plant species.
Figure 1. Workflow of the SeiPlant framework for cross-species prediction of chromatin features in plants.
- Cross-species modeling for plant epigenomics
- Multi-task prediction of histone marks (e.g., H3K4me3, H3K27ac)
- Tested on representative monocots and dicots (e.g., Oryza sativa, Zea mays, Arabidopsis thaliana)
- Supports both species-specific and generalization settings
- One-click sequence-to-signal pipeline outputting BigWig and BedGraph
### Python environment setup with Conda
conda create -n SeiPlant python=3.8
conda activate SeiPlant
git clone https://github.com/Lv-BioInfo/SeiPlant.git
cd SeiPlant
pip install -r requirements.txt
# Install PyTorch (adjust the version according to your system environment)
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2+cu118 \
-f https://download.pytorch.org/whl/torch_stable.html

Note: In our experiments, we used PyTorch 2.1.2 with CUDA 11.8.
This specific version was chosen because the GLIBC version on our server was too old to support the latest PyTorch releases.
Please install the latest PyTorch version compatible with your own system environment (see the official PyTorch installation guide).
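
Before picking a PyTorch build, it can help to see what the current machine offers. The following is a minimal sketch, not part of SeiPlant: the function name `environment_report` is hypothetical, and it only reports the Python version, the C library version, and, if PyTorch is already installed, its version and CUDA status.

```python
import importlib.util
import platform


def environment_report():
    """Collect the facts that matter when choosing a PyTorch build."""
    libc_name, libc_version = platform.libc_ver()
    report = {
        "python": platform.python_version(),
        # On Linux this is typically "glibc <version>"; it may be empty elsewhere.
        "libc": f"{libc_name} {libc_version}".strip(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the report also works without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a GPU machine, the installed wheel probably does not match the local driver or CUDA setup.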
The SeiPlant project requires some files to be manually downloaded from Zenodo and placed into the correct folders. Below is the directory structure with notes on which files you need to provide:
- models/
  - model_architectures/
    - model.py: Model architecture definition
  - model.pth: [Download from Zenodo]
  - tag: [Download from Zenodo]
- scripts/
  - fasta/
    - species.fa: [Download from Zenodo]
    - species.size: [Download from Zenodo]
  - make_bedgraph.py: Convert bigWig to bedGraph
  - make_prediction_bed.py: Run predictions in bed format
  - prediction.py: Inference script
  - train.py: Training script
  - evaluate.py: Evaluation script
- utils/
  - Utility functions for data processing & model training
Note
You can download the sample reference genomes and trained model parameters from
👉 Zenodo (DOI: 10.5281/zenodo.15421964)
and place them in the /scripts/fasta/ and /models/ folders, respectively.
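
To confirm that the downloaded files ended up in the right folders, a small check like the one below may be useful. It is an illustrative sketch only: `find_missing_files` is a hypothetical helper, and the entries in `REQUIRED_FILES` are the placeholder names from the layout above, which need to be replaced with the actual file names downloaded from Zenodo.

```python
from pathlib import Path

# Placeholder names taken from the directory layout; replace them with the
# real file names you downloaded from Zenodo.
REQUIRED_FILES = [
    "models/model.pth",
    "models/tag",
    "scripts/fasta/species.fa",
    "scripts/fasta/species.size",
]


def find_missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required files are in place.")
```

Run it from the repository root; an empty result means every listed file was found.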
Provide a reference genome in FASTA format for the species of interest. To tile the genome:
- Apply a sliding window approach (default: 1,024 bp window, 128 bp step size)
- Filter windows to retain only sequences with standard nucleotides (A/T/C/G)
- Save:
  - BED file for genomic coordinates
  - FASTA file for model input sequences
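
The tiling steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, soft-masked (lowercase) bases are upper-cased before filtering, a trailing partial window is dropped, and the FASTA header format `chrom:start-end` is made up for the example.

```python
import re

# A window is kept only if every base is A, C, G or T.
VALID_SEQUENCE = re.compile(r"[ACGT]+")


def tile_sequence(chrom, sequence, window_size=1024, step_size=128):
    """Yield (chrom, start, end, window) for windows made only of A/C/G/T."""
    sequence = sequence.upper()  # assumption: treat soft-masked bases as valid
    last_start = len(sequence) - window_size
    for start in range(0, last_start + 1, step_size):
        window = sequence[start:start + window_size]
        if VALID_SEQUENCE.fullmatch(window):
            yield chrom, start, start + window_size, window


def write_tiles(tiles, bed_path, fasta_path):
    """Write BED coordinates and matching FASTA records; return the count."""
    count = 0
    with open(bed_path, "w") as bed, open(fasta_path, "w") as fasta:
        for chrom, start, end, window in tiles:
            bed.write(f"{chrom}\t{start}\t{end}\n")
            fasta.write(f">{chrom}:{start}-{end}\n{window}\n")
            count += 1
    return count
```

Coordinates are 0-based and half-open, as in the BED format. In the repository itself, this step is handled by `make_prediction_bed.py`.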
Example usage:
python make_prediction_bed.py \
--fasta fasta/arabidopsis_thaliana.fa \
--size fasta/arabidopsis_thaliana.size \
--species arabidopsis_thaliana \
--output_path ./bed/ \
--window_size 1024 \
--step_size 128

Feed the .fasta file into the pretrained SeiPlant model to obtain chromatin feature predictions.
- Predicts probability scores for multiple histone modifications:
  - H3K4ME3, H3K27AC, H3K4ME1, H3K9AC, H3K36ME3
- Output:
  - .npy file containing multi-label prediction scores aligned with each genomic window
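
As a rough illustration of how such an output might be inspected, the sketch below loads a `.npy` score matrix and pairs its rows with the windows of a BED file. The helper names are hypothetical, and two things are assumptions rather than documented facts: that the array has shape (number of windows, number of marks), and that the columns follow the order of `MARKS`. The real number and order of columns are defined by the model tag file.

```python
import numpy as np

# Assumed column order; check the model tag file for the real one.
MARKS = ["H3K4ME3", "H3K27AC", "H3K4ME1", "H3K9AC", "H3K36ME3"]


def read_bed(bed_path):
    """Read (chrom, start, end) tuples from the first three BED columns."""
    windows = []
    with open(bed_path) as handle:
        for line in handle:
            if line.strip():
                chrom, start, end = line.split()[:3]
                windows.append((chrom, int(start), int(end)))
    return windows


def load_predictions(npy_path, bed_path, marks=MARKS):
    """Return (windows, scores) after checking that their shapes agree."""
    scores = np.load(npy_path)
    windows = read_bed(bed_path)
    expected = (len(windows), len(marks))
    if scores.shape != expected:
        raise ValueError(f"expected shape {expected}, got {scores.shape}")
    return windows, scores
```

A shape mismatch usually means the BED file and the predictions come from different tiling runs, or that the mark list does not match the model.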
Post-process model predictions into standard genome browser formats:
- Align scores to central genomic coordinates (e.g., start+448, start+576)
- Filter weak signals (< 0.01) and normalize (Min–Max scaling to 0.1–1.0)
- Export per-mark BedGraph files
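
The post-processing steps above can be sketched for a single mark as follows. This is a hedged illustration, not the code in the repository: the function names are hypothetical, the Min–Max scaling is assumed to be computed over the scores that survive the filter, and the fallback for identical scores is an arbitrary choice. The offset of 448 follows from centering a 128 bp bin inside a 1,024 bp window, (1024 - 128) / 2 = 448.

```python
import numpy as np


def to_bedgraph_rows(windows, scores, threshold=0.01, low=0.1, high=1.0,
                     window_size=1024, step_size=128):
    """Turn per-window scores for one mark into scaled BedGraph rows."""
    scores = np.asarray(scores, dtype=float)
    keep = scores >= threshold  # drop weak signals (< threshold)
    if not keep.any():
        return []
    kept = scores[keep]
    span = kept.max() - kept.min()
    if span > 0:
        # Min-Max scaling of the retained scores into [low, high].
        scaled = low + (kept - kept.min()) / span * (high - low)
    else:
        scaled = np.full_like(kept, high)  # assumption: identical scores map to high
    offset = (window_size - step_size) // 2  # 448 with the default sizes
    kept_windows = [w for w, flag in zip(windows, keep) if flag]
    rows = []
    for (chrom, start, _end), value in zip(kept_windows, scaled):
        rows.append((chrom, start + offset, start + offset + step_size, float(value)))
    return rows


def write_bedgraph(rows, path):
    """Write rows as tab-separated chrom, start, end, value lines."""
    with open(path, "w") as handle:
        for chrom, start, end, value in rows:
            handle.write(f"{chrom}\t{start}\t{end}\t{value:.4f}\n")
```

Because the step size equals the width of the central bin, consecutive windows produce adjacent, non-overlapping BedGraph intervals.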
Example usage:
python prediction.py --model_path ../models/Brassicaceae_20250312_203749_1024_nip_feature7.model \
--model_tag_file ../models/histone_modification_tag.txt \
--species arabidopsis_thaliana \
--fa_path ./bed/arabidopsis_thaliana_1024_128.fa \
--output_dir ./bedgraph \
--bed_file ./bed/arabidopsis_thaliana_1024_128_filtered.bed \
--seq_len 1024 \
--batch_size 256

- Prepare your BedGraph file (e.g., H3K4ME3.bedgraph).
- Make sure you have the chromosome sizes file (e.g., chrom.sizes).
- Install UCSC tools (provides bedGraphToBigWig).
- Convert to BigWig format:

bedGraphToBigWig H3K4ME3.bedgraph chrom.sizes H3K4ME3.bw

Note
bedGraphToBigWig is part of the UCSC utilities.
📌 You can download it from UCSC Genome Browser utilities.
Make sure the chrom.sizes file matches the reference genome you are using.
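
bedGraphToBigWig expects its input to be sorted by chromosome and then by start position, and it rejects intervals that fall outside the chromosome sizes. The sketch below shows one way to prepare a BedGraph file accordingly before conversion. The helper names are hypothetical and the policy of silently dropping out-of-range rows is an assumption; you may prefer to raise an error instead.

```python
def read_chrom_sizes(path):
    """Read a two-column chrom.sizes file into a {name: length} dict."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            if line.strip():
                name, length = line.split()[:2]
                sizes[name] = int(length)
    return sizes


def prepare_bedgraph(in_path, sizes, out_path):
    """Sort BedGraph rows and drop rows that do not fit the chromosome sizes."""
    rows = []
    with open(in_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank, malformed, and track/browser header lines.
            if len(fields) < 4 or fields[0] in ("track", "browser"):
                continue
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            if chrom in sizes and 0 <= start < end <= sizes[chrom]:
                rows.append((chrom, start, end, fields[3]))
    # Case-sensitive sort by chromosome name, then numeric sort by start.
    rows.sort(key=lambda row: (row[0], row[1]))
    with open(out_path, "w") as handle:
        for chrom, start, end, value in rows:
            handle.write(f"{chrom}\t{start}\t{end}\t{value}\n")
    return len(rows)
```

The file written to `out_path` can then be passed to bedGraphToBigWig in place of the original BedGraph file.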
We provide a complete from-scratch training guide used in this study, including data preparation, scoring criteria, and training procedures.
For details, please refer to: train_from_scratch
For details of the ablation experiments, see the files in the experiments/ablation directory:
ablation
For details of the comparative-methods experiments, see the files in the experiments/comparative_methods directory:
comparative_methods
For details of the plotting methods used in our study, see the files in the docs/plotting directory:
plotting
If you use SeiPlant in your work, please cite:
Lv T, Han Q, Li Y, Liang C, Ruan Z, Chao H, Chen M, Chen D. Cross-species prediction of histone modifications in plants via deep learning. Genome Biology (2026). https://doi.org/10.1186/s13059-025-03929-4
We also welcome citation of related studies:
A sequence-based global map of regulatory activity for deciphering human genetics
Chen KM, Wong AK, Troyanskaya OG, Zhou J
Nature Genetics. 2022; 54:940–949. doi: https://doi.org/10.1038/s41588-022-01102-2
Deep learning on chromatin profiles reveals the cis-regulatory sequence code of the rice genome
Zhou X, Ruan Z, Zhang C, Kaufmann K, Chen D
Journal of Genetics and Genomics. 2024; S1673852724003564. doi: https://doi.org/10.1016/j.jgg.2024.12.007
Questions and suggestions about SeiPlant are welcome. Please report them via the issues page, or contact Dijun Chen (dijunchen@nju.edu.cn).
