
thustorage/GCR


GPU Checkpoint/Restore Made Fast and Lightweight

This is the artifact repository for evaluating the FAST'26 paper "GPU Checkpoint/Restore Made Fast and Lightweight".


Evaluating the Artifact

Run the main evaluation script:

bash run.sh

A summary of the results is written to results.txt.

Detailed results are stored in eval/log/, with one subdirectory per experiment type.
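For a quick look at the outputs, the following Python sketch prints results.txt and counts the files in each experiment subdirectory of eval/log/. It assumes the script is run from the directory where run.sh was invoked and that results.txt lands there. The experiment subdirectory names are not listed in this README, so the script simply reports whatever directories it finds.

```python
from pathlib import Path


def summarize_logs(log_root: Path) -> dict[str, int]:
    """Count regular files in each experiment subdirectory under log_root.

    Assumes one subdirectory per experiment type; the names are whatever
    run.sh produced. Returns an empty dict if log_root does not exist.
    """
    counts: dict[str, int] = {}
    if not log_root.is_dir():
        return counts
    for sub in sorted(p for p in log_root.iterdir() if p.is_dir()):
        # rglob("*") walks nested directories; only regular files are counted.
        counts[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return counts


if __name__ == "__main__":
    root = Path(".")  # assumption: run where run.sh was invoked
    summary = root / "results.txt"
    if summary.is_file():
        print(summary.read_text())
    for name, n in summarize_logs(root / "eval" / "log").items():
        print(f"{name}: {n} log file(s)")
```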

Artifact Overview

This artifact is an evaluation framework for GPU checkpoint/restore systems, bundled with the GCR implementation. It includes:

  • Source code:

    • GCR/ - GCR implementation
  • Analysis scripts (in eval/analysis/):

    • Scripts for generating the results summary and analyzing results
  • Configuration and utilities:

    • run.sh - Main evaluation script
    • run_analysis.sh - Result analysis script
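Before a long run, it can help to confirm that a checkout contains the pieces listed above. This is a minimal Python sketch; the path list is copied by hand from this overview and may drift if the repository layout changes.

```python
from pathlib import Path

# Paths named in the artifact overview (hand-copied; update if the layout changes).
EXPECTED_DIRS = ["GCR", "eval/analysis"]
EXPECTED_FILES = ["run.sh", "run_analysis.sh"]


def missing_paths(repo_root: Path) -> list[str]:
    """Return the expected artifact paths that are absent under repo_root."""
    missing = [d for d in EXPECTED_DIRS if not (repo_root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (repo_root / f).is_file()]
    return missing


if __name__ == "__main__":
    gone = missing_paths(Path("."))  # assumption: run from the repository root
    print("layout OK" if not gone else f"missing: {', '.join(gone)}")
```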

Environment Setup

  • Hardware:

    • 2× NVIDIA A100-40GB GPUs with NVLink and PCIe 4.0 (tested configuration)
  • Software:

    • CUDA toolkit 12.6
    • PyTorch 2.7.1
    • vLLM 0.9.1
    • DeepSpeed 0.17.5
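A rough way to compare a local environment against the tested versions is sketched below in Python. It reads installed package versions through importlib.metadata and ignores local suffixes such as +cu126 when comparing. The CUDA line relies on torch.version.cuda, which reports the CUDA version PyTorch was built against rather than the installed toolkit, so it is only a proxy; GPU model and NVLink topology are not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Tested software versions from the README (PyPI distribution names).
TESTED = {"torch": "2.7.1", "vllm": "0.9.1", "deepspeed": "0.17.5"}
TESTED_CUDA = "12.6"  # CUDA toolkit version; not a Python package


def installed_versions(names):
    """Look up installed distribution versions; None if not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, expected):
    """List packages whose installed version differs from the tested one.

    Local version suffixes such as '+cu126' are stripped before comparing.
    """
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (tested {want})")
        elif have.split("+")[0] != want:
            problems.append(f"{name}: {have} (tested {want})")
    return problems


if __name__ == "__main__":
    for line in mismatches(installed_versions(TESTED), TESTED):
        print(line)
    try:
        import torch

        # Build-time CUDA version of PyTorch, used here only as a proxy.
        print(f"torch built with CUDA {torch.version.cuda} (tested toolkit {TESTED_CUDA})")
        print(f"visible GPUs: {torch.cuda.device_count()}")
    except ImportError:
        pass
```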

Note on framework versions: PhOS only supports Transformers 4.30.0, so to ensure a fair comparison we use that same version when evaluating workloads that PhOS supports. For workloads that PhOS does not support, we use a newer Transformers version (4.53.3) to cover a broader range of workloads.
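The version rule in the note above reduces to a single branch. The Python sketch below encodes it; the workload names in the usage block are hypothetical placeholders, since the actual list of PhOS-supported workloads is defined by the artifact, not here.

```python
PHOS_TRANSFORMERS = "4.30.0"     # the only Transformers version PhOS supports
DEFAULT_TRANSFORMERS = "4.53.3"  # newer version for workloads PhOS cannot run


def transformers_version(supported_by_phos: bool) -> str:
    """Pick the Transformers version for a workload.

    PhOS-supported workloads run on PhOS's version so every system in the
    comparison shares the same Transformers; the rest use the newer one.
    """
    return PHOS_TRANSFORMERS if supported_by_phos else DEFAULT_TRANSFORMERS


if __name__ == "__main__":
    # Hypothetical workload names, for illustration only.
    for workload, phos_ok in [("workload_a", True), ("workload_b", False)]:
        print(f"{workload}: transformers=={transformers_version(phos_ok)}")
```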

Dependencies

  • ServerlessLLM is installed following the instructions in the ServerlessLLM repository (commit id: 76d472f)
  • Kernel PTX is generated using Neutrino
  • Python library dependencies are listed in environment.yml
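To confirm that a ServerlessLLM checkout matches the tested commit, a Python sketch along these lines compares the output of git rev-parse HEAD with the abbreviated hash 76d472f. It assumes ServerlessLLM was installed from a local git clone; the clone location ServerlessLLM/ is a hypothetical path.

```python
import subprocess
from pathlib import Path

TESTED_SLLM_COMMIT = "76d472f"  # abbreviated hash from the README


def at_tested_commit(head: str, tested: str = TESTED_SLLM_COMMIT) -> bool:
    """True if a full commit hash starts with the tested abbreviated hash."""
    return head.strip().lower().startswith(tested)


def head_commit(repo: Path) -> str:
    """Return the full HEAD hash of a git checkout (raises if not a repo)."""
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    repo = Path("ServerlessLLM")  # hypothetical clone location
    try:
        head = head_commit(repo)
    except (subprocess.CalledProcessError, FileNotFoundError) as err:
        print(f"could not read {repo}: {err}")
    else:
        status = "OK" if at_tested_commit(head) else "differs from tested"
        print(f"ServerlessLLM HEAD {head[:7]}: {status}")
```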

Citation

@inproceedings{GCR,
  author    = {Shaoxun Zeng and Tingxu Ren and Jiwu Shu and Youyou Lu},
  title     = {GPU Checkpoint/Restore Made Fast and Lightweight},
  booktitle = {24th USENIX Conference on File and Storage Technologies (FAST'26)},
  year      = {2026},
  address   = {Santa Clara, CA},
  month     = feb,
  publisher = {USENIX Association},
  url       = {https://www.usenix.org/conference/fast26/presentation/zeng}
}
