Optimized pipelines for Spatial Transcriptomics (ST) data analysis using Seurat & Giotto, designed for reproducible benchmarking and biological insight.
MSc Thesis Project | Bioinformatics & Biostatistics, UB-UOC | 2025
This repository provides a comprehensive and reproducible suite of R scripts for analyzing Spatial Transcriptomics (ST) data. It compares the performance and results of Seurat and Giotto, and introduces a novel Optimized Hybrid Pipeline that leverages the strengths of both.
Developed as part of a Master's Thesis, this project is structured as a robust resource for the wider bioinformatics community, focusing on lung cancer research using 10x Genomics Visium HD data.
- Standalone Pipelines: Fully documented, independent scripts for Seurat and Giotto using harmonized parameters for fair comparison.
- Hybrid Pipeline: A novel workflow that combines Seurat’s efficiency for initial analysis with Giotto’s granularity for detailed clustering, governed by an automated decision step.
- Rigorous Benchmarking: Measures key performance metrics including memory usage (RAM), execution time, and clustering resolution.
- In-depth Downstream Analysis: Includes modules for marker gene detection, cell-type signature scoring, simulated deconvolution, and advanced spatial visualization.
- Reproducible by Design: All code is open and documented, and the software versions used for the thesis are listed in this README.
- The provided scripts and benchmarks are configured for 16 µm resolution binning. This resolution was chosen to balance spatial granularity with accessibility, allowing the full workflow (including Giotto) to run on standard workstations (32-64 GB RAM).
- However, the framework is resolution-agnostic. To analyze finer resolutions (8 µm or 2 µm), provide the corresponding input matrix. Caution: for a fixed tissue area, the number of bins grows with the inverse square of the bin size (about 4x at 8 µm and 64x at 2 µm, relative to 16 µm), so memory requirements rise steeply at high resolution.
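As a rough guide to how quickly the number of bins grows at finer resolutions, the following sketch computes the bin count relative to the 16 µm default. The function name `bin_count_factor` is hypothetical and not part of the repository. Actual memory usage also depends on matrix sparsity and on the data structures each package builds, so treat the factors as an approximation.

```r
# Relative number of bins when moving from a reference bin size to another one.
# For a fixed tissue area, the bin count grows with the inverse square of the
# bin edge length. (Hypothetical helper, for illustration only.)
bin_count_factor <- function(bin_um, reference_um = 16) {
  (reference_um / bin_um)^2
}

sizes_um <- c(16, 8, 2)
scaling <- data.frame(
  bin_um = sizes_um,
  bins_relative_to_16um = bin_count_factor(sizes_um)
)
print(scaling)
```

A dataset that fits comfortably in memory at 16 µm can therefore need far more at 2 µm, which is worth checking before starting a Giotto run.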
📁 scripts/
    📜 01_seurat_pipeline.R
    📜 02_giotto_pipeline.R
    📜 03_compare_pipelines.R
    📜 4_Optimized_pipeline.R
    📜 5_Integrated_analysis.R
    📜 6_native_visualizations.R
    📜 7_final_report_generation.R
    📜 8_deconvolution_analysis.R
    📜 9_balanced_deconvolution.R
📁 data/
    📄 Tumor.csv
    📄 Fibroblasts.csv
    📄 ... (and other signature files)
    📁 visium_dataset/
        *(Raw spatial dataset - to be downloaded separately)*
📁 results/ # (Git-ignored) Where all outputs are saved.
📄 .gitignore
📄 README.md
#### 1. Clone the Repository
git clone https://github.com/Nuiter/seurigiotto-benchmark-framework.git
cd seurigiotto-benchmark-framework
This repository reflects the core code and environment used for the MSc thesis defence (June 2025).
Key versions:
| Component | Version |
|---|---|
| R | 4.4.2 |
| Seurat | 5.3.0 |
| Giotto | 4.2.1 |
| OS | Linux |
- R version: 4.4.2 or higher recommended.
- Key R Packages:
Seurat, Giotto, ggplot2, dplyr, patchwork, pryr, arrow, data.table, scales, RColorBrewer, viridis, pheatmap, tidyr.
You can install them in your R session with:
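install.packages(c("Seurat", "ggplot2", "dplyr", "patchwork", "pryr", "arrow", "data.table", "scales", "RColorBrewer", "viridis", "pheatmap", "tidyr"))
(Note: Seurat is available from CRAN. Giotto is not on CRAN, so `install.packages()` cannot install it; it is distributed through the Dries Lab GitHub repository (drieslab/Giotto). Please follow its official installation instructions. To match the thesis environment, use the versions listed in the table above.)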
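Before running the pipelines, it can help to confirm that the installed versions meet the ones listed in the version table. The sketch below is one way to do that with base R only. The helper `check_versions` is hypothetical and not part of the repository.

```r
# Minimum versions taken from the "Key versions" table in this README.
required <- c(Seurat = "5.3.0", Giotto = "4.2.1")

# Hypothetical helper: report installed versions without loading the packages.
check_versions <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_  # package not installed
    }
  }, character(1))
  ok <- mapply(function(have, want) {
    !is.na(have) && utils::compareVersion(have, want) >= 0
  }, installed, required)
  data.frame(
    package = names(required),
    required = unname(required),
    installed = unname(installed),
    ok = unname(ok)
  )
}

message("R >= 4.4.2: ", getRversion() >= "4.4.2")
print(check_versions(required))
```

A row with `ok = FALSE` means the package is either missing or older than the version used for the thesis.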
The analysis scripts are configured to use public data from 10x Genomics (Visium HD Human Lung Cancer).
- The cell signature CSV files are included in the /data directory.
- The main spatial dataset must be downloaded separately. Place the dataset folder (e.g., visium_dataset) inside the /data directory.
- Link: Visium HD Human Lung Cancer Dataset
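To illustrate how a signature file could be turned into a per-bin score, here is a minimal sketch in base R. It assumes each signature CSV has a column named `gene`; that column name, the toy expression values, and the helper `score_signature` are assumptions for illustration. The score is a plain per-bin mean and is not necessarily the scoring method the repository scripts use.

```r
# Toy expression matrix: genes in rows, spatial bins in columns (made-up values).
expr <- matrix(
  c(5, 0, 1,
    3, 1, 0,
    0, 4, 2,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("EPCAM", "KRT8", "COL1A1", "ACTB"),
                  c("bin1", "bin2", "bin3"))
)

# Stand-in for a file such as data/Tumor.csv; the "gene" column is an assumption.
sig_file <- tempfile(fileext = ".csv")
write.csv(data.frame(gene = c("EPCAM", "KRT8", "NOT_IN_DATA")),
          sig_file, row.names = FALSE)

# Hypothetical helper: mean expression of the signature genes in each bin.
score_signature <- function(expr, signature_csv, gene_col = "gene") {
  genes <- unique(read.csv(signature_csv, stringsAsFactors = FALSE)[[gene_col]])
  present <- intersect(genes, rownames(expr))  # drop genes absent from the data
  if (length(present) == 0) {
    stop("No signature genes found in the expression matrix")
  }
  colMeans(expr[present, , drop = FALSE])
}

scores <- score_signature(expr, sig_file)
print(scores)
```

Genes missing from the expression matrix are silently dropped here; a real analysis would probably want to report how many signature genes were matched.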
The scripts are designed to be run sequentially from the command line.
The core benchmarking workflow consists of scripts 1, 2, and 3.
The others perform more advanced, integrated analyses.
# === Core Benchmarking Workflow ===
# 1. Run the standalone Seurat pipeline
Rscript scripts/01_seurat_pipeline.R
# 2. Run the standalone Giotto pipeline
Rscript scripts/02_giotto_pipeline.R
# 3. Run the comparative analysis
Rscript scripts/03_compare_pipelines.R
# === Advanced & Integrated Analysis ===
# These scripts build upon the initial results for deeper insights.
Rscript scripts/4_Optimized_pipeline.R
Rscript scripts/5_Integrated_analysis.R
Rscript scripts/6_native_visualizations.R
Rscript scripts/7_final_report_generation.R
Rscript scripts/8_deconvolution_analysis.R
Rscript scripts/9_balanced_deconvolution.R
All outputs (figures, tables, logs) will be saved in the /results directory in organized subfolders.
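The execution time and memory metrics used for benchmarking can be collected in several ways. The sketch below shows one base R approach using `system.time()` and `gc()`. The helper `benchmark_step` is hypothetical and is not taken from the repository scripts, which may rely on other tools such as pryr. Note that `gc()` only reports memory managed by R, so memory allocated by external libraries is not counted.

```r
# Hypothetical helper: run a function and record elapsed time and peak R memory.
benchmark_step <- function(label, fn) {
  invisible(gc(reset = TRUE))  # reset the "max used" counters before the run
  elapsed <- system.time(result <- fn())[["elapsed"]]
  mem <- gc()
  # The Mb column for "max used" sits right after the "max used" column.
  max_col <- which(colnames(mem) == "max used") + 1
  list(
    result = result,
    metrics = data.frame(
      step = label,
      elapsed_sec = elapsed,
      peak_mem_mb = sum(mem[, max_col])
    )
  )
}

# Toy workload standing in for a pipeline stage.
out <- benchmark_step("toy_crossprod", function() {
  m <- matrix(runif(1e6), nrow = 1000)
  crossprod(m)
})
print(out$metrics)
```

Wrapping each pipeline stage in the same helper gives one metrics row per stage, and the rows can be combined with `rbind()` into a comparison table.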
If you use this workflow or find the code helpful in your research, please cite this repository.
Author: Ángel I. Pérez Santiago
Project: MSc in Bioinformatics & Biostatistics (University of Barcelona / UOC, 2025)
GitHub: https://github.com/nuiter
This project is licensed under the MIT License. See the LICENSE file for full details.
- Dataset: Visium HD CytAssist Gene Expression of Human Lung Cancer (Fixed Frozen) by 10x Genomics.
- Supervision: Dr. Alfonso Saera Vila (MSc Thesis Advisor).
- Core Frameworks: The Satija Lab for Seurat and the Dries Lab for Giotto.
For questions, collaborations, or feedback, please open an issue in this repository.