Dataset: Selevsek et al., 2015 — DIA/SWATH-MS time-course protein abundance under osmotic stress
Methods: ANOVA · Repeated Measures ANOVA · LMM · LM · EMMeans · Pairwise Comparisons
Understanding how cells modulate their proteomic composition in response to environmental challenges is a central question in systems biology. Saccharomyces cerevisiae is a well-established model for studying dynamic responses to osmotic perturbation, which involve coordinated regulation across the proteome. Selevsek et al. (2015) generated a high-resolution temporal dataset using SWATH-MS, capturing protein abundance at six time points (from T000 onward) following NaCl-induced stress.
Building on this resource, the present study applies statistical methods tailored for longitudinal data. The goal is to test whether protein expression varies significantly across timepoints and whether replicate-level variability contributes meaningfully to overall variation, thereby identifying proteins responsive to osmotic stress.
Through a comparative framework involving fixed-effects ANOVA, repeated measures ANOVA, linear mixed-effects models (LMMs), and fixed-effects linear models (LMs), the analysis shows how progressively more flexible methods handle the central challenge of longitudinal proteomics, namely correlated measurements taken from the same biological replicate, and how variance structure (ICC) and model fit (AIC) can guide model selection.
This repository contains a fully reproducible time-series proteomics analysis pipeline implemented in R. The workflow analyzes protein abundance changes in Saccharomyces cerevisiae exposed to osmotic stress over six time points, using univariate statistical models and model-based pairwise comparisons.
The analysis follows the structure of a statistical proteomics report and includes:
- Data preprocessing & transformation
- Filtering and random protein selection
- One-way ANOVA
- Repeated Measures ANOVA
- Linear Mixed-Effects Models (LMM)
- Fallback Linear Models (LM)
- Model selection using ICC
- Nested LM comparison
- Pairwise comparisons (Tukey & EMMeans)
- Volcano plot for significant contrasts
- Extraction of significant proteins
All intermediate files (summary tables, model results, p-value tables, EMMeans outputs, etc.) are saved automatically as CSV files.
```r
install.packages(c(
  "tidyverse", "lme4", "lmerTest", "ez", "performance", "cluster",
  "ggplot2", "ggVennDiagram", "emmeans"
))
```

Files:
- Selevsek2015_DIA_Spectronaut_annotation.csv
- Selevsek2015.csv
- TIME_SERIES_DATA_ANALYSIS.Rmd
Data preprocessing & transformation:
- Read metadata & protein abundance matrix
- Pivot to long format
- Merge annotation info
- Remove missing values and low-variance proteins
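The preprocessing steps above can be sketched in base R as follows. This is a minimal illustration on toy data, not the pipeline's code: it assumes the abundance matrix has one row per protein (`ProteinName`) and one column per run, and that the annotation file supplies `Run`, `Condition` and `BioReplicate` columns (MSstats-style names; check them against the actual CSV headers). The log2 transform, the placeholder timepoint labels and the `min_var` cutoff are likewise assumptions.

```r
# Toy stand-ins for the two input CSVs; real column names may differ.
set.seed(1)
runs <- paste0("run", 1:18)
tp <- c("T000", "T015", "T030", "T060", "T090", "T120")  # placeholder labels
wide <- data.frame(ProteinName = paste0("P", 1:5),
                   matrix(2^rnorm(5 * 18, mean = 20), nrow = 5,
                          dimnames = list(NULL, runs)))
wide[1, "run3"] <- NA        # one missing value to be dropped
wide[5, runs] <- 2^20        # a constant (zero-variance) protein
annot <- data.frame(Run = runs,
                    Condition = rep(tp, each = 3),
                    BioReplicate = rep(1:3, times = 6))

# Pivot to long format: one row per protein x run.
long <- data.frame(
  ProteinName = rep(wide$ProteinName, times = length(runs)),
  Run         = rep(runs, each = nrow(wide)),
  Intensity   = unlist(wide[runs], use.names = FALSE)
)

# Merge annotation, log2-transform, drop missing values.
long <- merge(long, annot, by = "Run")
long$log2Int <- log2(long$Intensity)
long <- long[!is.na(long$log2Int), ]

# Drop low-variance proteins (hypothetical cutoff).
min_var <- 0.01
v <- tapply(long$log2Int, long$ProteinName, var)
long <- long[long$ProteinName %in% names(v)[v > min_var], ]
```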
Random protein selection:
- Random selection of 150 proteins for efficient modeling
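Seeding the random draw makes the 150-protein subset reproducible from run to run. A minimal sketch; the seed value and the `proteins` ID vector are hypothetical stand-ins, since the README does not state which seed the pipeline uses.

```r
set.seed(2015)                              # hypothetical seed
proteins <- sprintf("PROT%04d", 1:2000)     # stand-in for the filtered protein IDs
ids <- unique(proteins)
selected <- sample(ids, size = min(150, length(ids)))

# The long table would then be restricted to these proteins, e.g.
# long <- long[long$ProteinName %in% selected, ]
```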
One-way ANOVA:
- Per-protein ANOVA
- P-value distribution summary
- Full ANOVA table exported
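A per-protein one-way ANOVA reduces to fitting `aov(abundance ~ time)` within each protein and keeping the p-value of the `time` term. The sketch below assumes a long table with `protein`, `time`, `bio_rep` and `abundance` columns (hypothetical names) and uses simulated data in which only five proteins change over time; the Benjamini-Hochberg step at the end is an illustrative addition.

```r
set.seed(3)
tp <- c("T000", "T015", "T030", "T060", "T090", "T120")  # placeholder labels
long <- expand.grid(bio_rep = factor(1:3), time = factor(tp, levels = tp),
                    protein = paste0("P", 1:20))
long$abundance <- rnorm(nrow(long), mean = 20)
# Give P1-P5 a strong time trend; the other 15 proteins are pure noise.
trend <- long$protein %in% paste0("P", 1:5)
long$abundance[trend] <- long$abundance[trend] + 2 * as.integer(long$time[trend])

# One aov() per protein; keep the p-value of the `time` term (first row).
anova_p <- sapply(split(long, long$protein), function(d) {
  fit <- aov(abundance ~ time, data = d)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

summary(anova_p)                                               # p-value distribution
table(cut(anova_p, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE))
mean(p.adjust(anova_p, method = "BH") < 0.05)                  # share significant after BH
```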
Repeated Measures ANOVA:
- Biological replicate treated as the subject, with timepoint as the within-subject factor
- Extraction of p-values, F-values, and model tables
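For a single protein, the repeated-measures design can be expressed in base R with an `Error()` term that nests timepoint within biological replicate. The package list above includes ez, whose `ezANOVA()` also reports Mauchly's sphericity test and Greenhouse-Geisser corrections; this sketch swaps in base `aov()` instead, so no sphericity correction is applied. Column names and data are hypothetical.

```r
set.seed(4)
tp <- c("T000", "T015", "T030", "T060", "T090", "T120")  # placeholder labels
d <- expand.grid(bio_rep = factor(1:3), time = factor(tp, levels = tp))
rep_shift <- c(-0.5, 0, 0.5)[as.integer(d$bio_rep)]      # replicate-level offsets
d$abundance <- 20 + 0.8 * as.integer(d$time) + rep_shift + rnorm(nrow(d), sd = 0.3)

# Replicate is the subject; time is the within-subject factor.
fit <- aov(abundance ~ time + Error(bio_rep / time), data = d)
tabs <- summary(fit)   # one ANOVA table per error stratum

# Pull F and p for `time` from whichever stratum contains it
# (row names in summary tables are padded, hence trimws()).
time_row <- NULL
for (s in tabs) {
  tab <- s[[1]]
  hit <- trimws(rownames(tab)) == "time"
  if (any(hit)) time_row <- tab[hit, c("F value", "Pr(>F)")]
}
time_row
```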
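Linear Mixed-Effects Models (LMM):
- LMM with a random intercept for each biological replicate
- Check for singular fits
- Fallback to LM when the replicate variance is negligible or not estimable (singular fit)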
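A minimal sketch of the LMM step with its LM fallback, assuming lmerTest is used for Satterthwaite-based p-values (the README lists the package but does not say how p-values are obtained) and that a singular fit is what triggers the fallback. The second toy protein is constructed so that all replicates share the same mean, which drives the estimated replicate variance to zero and should yield a singular fit. Column names and data are hypothetical.

```r
library(lmerTest)   # attaches lme4; anova() on lmer fits gains Satterthwaite p-values

tp <- c("T000", "T015", "T030", "T060", "T090", "T120")  # placeholder labels
make_protein <- function(rep_offsets, seed) {
  set.seed(seed)
  d <- expand.grid(bio_rep = factor(1:3), time = factor(tp, levels = tp))
  d$abundance <- 20 + 0.5 * as.integer(d$time) +
    rep_offsets[as.integer(d$bio_rep)] + rnorm(nrow(d), sd = 0.3)
  d
}

fit_protein <- function(d) {
  lmm <- suppressMessages(lmer(abundance ~ time + (1 | bio_rep), data = d))
  if (isSingular(lmm)) {
    # Replicate variance estimated at (or near) zero: refit without it.
    fit <- lm(abundance ~ time, data = d)
    list(model = "LM", p = anova(fit)["time", "Pr(>F)"])
  } else {
    list(model = "LMM", p = anova(lmm)["time", "Pr(>F)"])
  }
}

# Protein with clear replicate-to-replicate shifts.
res1 <- fit_protein(make_protein(c(-1, 0, 1), seed = 5))

# Protein whose replicate means are forced to be identical.
d2 <- make_protein(c(0, 0, 0), seed = 6)
d2$abundance <- d2$abundance - ave(d2$abundance, d2$bio_rep) + mean(d2$abundance)
res2 <- fit_protein(d2)
```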
Model selection using ICC:
- Calculate ICC and determine the best-fitting model
- Select LMM when ICC ≥ 0.01, otherwise LM
- Save p-values, AIC, ICC, and the chosen model
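The ICC used here is the share of total variance (replicate plus residual) attributable to replicates, var_rep / (var_rep + var_res), read from the fitted variance components. The sketch applies the README's ICC ≥ 0.01 rule and also records AIC. Refitting the LMM by maximum likelihood before comparing AICs is an assumption, needed because a REML log-likelihood is not comparable with that of `lm()`. The performance package listed above provides `icc()` as a packaged alternative. Data and column names are hypothetical.

```r
library(lme4)

set.seed(6)
tp <- c("T000", "T015", "T030", "T060", "T090", "T120")  # placeholder labels
d <- expand.grid(bio_rep = factor(1:3), time = factor(tp, levels = tp))
d$abundance <- 20 + 0.5 * as.integer(d$time) +
  c(-0.6, 0.1, 0.5)[as.integer(d$bio_rep)] + rnorm(nrow(d), sd = 0.3)

# Variance components from the REML fit.
lmm <- lmer(abundance ~ time + (1 | bio_rep), data = d)
vc <- as.data.frame(VarCorr(lmm))
var_rep <- vc$vcov[vc$grp == "bio_rep"]
var_res <- vc$vcov[vc$grp == "Residual"]
icc <- var_rep / (var_rep + var_res)

# AIC: refit the LMM by ML so its likelihood is comparable with lm().
lmm_ml <- update(lmm, REML = FALSE)
lm_fit <- lm(abundance ~ time, data = d)
aic <- c(LMM = AIC(lmm_ml), LM = AIC(lm_fit))

# Selection rule stated in the README: LMM if ICC >= 0.01, otherwise LM.
chosen <- if (icc >= 0.01) "LMM" else "LM"
data.frame(icc = icc, aic_lmm = aic[["LMM"]], aic_lm = aic[["LM"]], chosen = chosen)
```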
Pairwise comparisons (Tukey & EMMeans):
- Tukey HSD for ANOVA
- EMMeans for LMM/LM with FDR correction
- Top proteins with the most significant contrasts
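Both kinds of pairwise comparison can be sketched for one protein: `TukeyHSD()` on the one-way ANOVA fit, and emmeans contrasts on the LM (an lmer fit is handled the same way) with FDR adjustment. The simulated data, column names and 0.05 cutoff are illustrative assumptions; `n_sig` is the kind of per-protein count that could be used to rank the top proteins.

```r
library(emmeans)

set.seed(7)
tp <- c("T000", "T015", "T030", "T060", "T090", "T120")  # placeholder labels
d <- expand.grid(bio_rep = factor(1:3), time = factor(tp, levels = tp))
# Toy protein that rises only at T030.
d$abundance <- 20 + ifelse(d$time == "T030", 2, 0) + rnorm(nrow(d), sd = 0.3)

# Tukey HSD on the one-way ANOVA fit (base R): 15 pairwise differences.
tuk <- TukeyHSD(aov(abundance ~ time, data = d))$time

# Model-based contrasts on the LM with FDR (Benjamini-Hochberg) adjustment.
fit <- lm(abundance ~ time, data = d)
con <- as.data.frame(pairs(emmeans(fit, ~ time), adjust = "fdr"))

# Number of significant contrasts for this protein; collected across
# proteins, such counts could rank the top proteins.
n_sig <- sum(con$p.value < 0.05)
head(con[order(con$p.value), c("contrast", "estimate", "p.value")], 5)
```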
Volcano plot:
- Contrast: T030 vs T000
- log2FC calculated
- FDR-adjusted p-values
- Red = significant proteins
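A sketch of the volcano computation for T030 vs T000, assuming the abundances are already log2-transformed so that log2FC is a difference of group means. The per-protein p-values here come from a Welch t-test, swapped in for simplicity; the pipeline presumably takes them from its EMMeans contrasts. The BH adjustment, the FDR < 0.05 cutoff and the simulated data are assumptions, and the plot is written to a temporary PDF.

```r
set.seed(8)
n_prot <- 100
d <- expand.grid(bio_rep = 1:3, time = c("T000", "T030"),
                 protein = sprintf("P%03d", 1:n_prot), stringsAsFactors = FALSE)
d$log2_abund <- rnorm(nrow(d), mean = 20, sd = 0.3)
up <- d$protein %in% sprintf("P%03d", 1:10) & d$time == "T030"
d$log2_abund[up] <- d$log2_abund[up] + 1.5      # 10 truly changing proteins

volcano <- do.call(rbind, lapply(split(d, d$protein), function(x) {
  a <- x$log2_abund[x$time == "T030"]
  b <- x$log2_abund[x$time == "T000"]
  # Data are already log2, so the fold change is a difference of means.
  data.frame(protein = x$protein[1], log2FC = mean(a) - mean(b),
             p = t.test(a, b)$p.value)
}))
volcano$fdr <- p.adjust(volcano$p, method = "BH")
volcano$sig <- volcano$fdr < 0.05               # hypothetical cutoff

pdf(tempfile(fileext = ".pdf"))
plot(volcano$log2FC, -log10(volcano$fdr), pch = 19,
     col = ifelse(volcano$sig, "red", "grey50"),
     xlab = "log2FC (T030 vs T000)", ylab = "-log10(FDR)")
invisible(dev.off())
```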
The pipeline generates:
- ANOVA p-values and full tables
- RM ANOVA p-values and F-values
- LMM vs LM model selection
- ICC values for repeated measures
- Nested LM comparison results
- Tukey pairwise tables
- EMMeans contrast tables
- Top proteins by significant timepoint changes
- Venn diagram of significant proteins
- Model usage heatmap
- Volcano plot (T030 vs T000)
- This study presents a statistical evaluation of time-dependent proteomic changes in S. cerevisiae under osmotic stress, using a repeated-measures design and a layered modeling framework.
- The initial one-way ANOVA detected significant timepoint effects in 52% of proteins, rejecting for those proteins the null hypothesis that abundance remains constant over time. However, one-way ANOVA treats all observations as independent and therefore ignores the correlation among measurements from the same biological replicate.
- Repeated measures ANOVA improved on this by modeling within-replicate correlation, but it did not detect strong replicate effects. Its assumption of compound symmetry and the limited power afforded by only three biological replicates likely reduced its sensitivity.
- Linear mixed-effects models (LMMs) offered the most robust analysis, capturing both fixed time effects and random replicate-level variation.
- Among the 47 proteins modeled with LMMs, 89% showed significant temporal changes, with moderate ICC values indicating replicate-specific contributions. For proteins where replicate effects were negligible or non-estimable, fixed-effects linear models (LMs) served as a fallback and identified significant timepoint effects in 66% of the remaining cases.
- Overall, 71% of proteins showed significant time-dependent expression under at least one model.
- This underscores the dynamic nature of the proteome in response to stress and supports the use of a model selection strategy guided by ICC and AIC.
- Future work should consider increasing the number of biological replicates and applying more flexible or hierarchical Bayesian models to better quantify subject-level variance and improve the reliability of inference.