⭐ If DiCo is helpful to your projects, please help star this repo. Thanks! 🤗
- 2025.9.19: We release code, models and training logs of DiCo.
- 2025.9.18: DiCo is accepted by NeurIPS 2025 as a spotlight paper! 🎉
- 2025.5.18: This repo is created.
Our DiCo models consistently require fewer GFLOPs than their Transformer counterparts while achieving superior generative performance.
| Model (iters) | Resolution | CFG | FID↓ | IS↑ | Params | FLOPs | Ckpt | Log |
|---|---|---|---|---|---|---|---|---|
| DiCo-S-400k | 256x256 | 1.0 | 49.97 | 31.38 | 33.1M | 4.25G | ckpt | log |
| DiCo-B-400k | 256x256 | 1.0 | 27.20 | 56.52 | 130.0M | 16.88G | ckpt | log |
| DiCo-L-400k | 256x256 | 1.0 | 13.66 | 91.37 | 463.9M | 60.24G | ckpt | log |
| DiCo-XL-400k | 256x256 | 1.0 | 11.67 | 100.42 | 701.2M | 87.30G | ckpt | log |
| DiCo-XL-3750k | 256x256 | 1.4 | 2.05 | 282.17 | 701.2M | 87.30G | ckpt | log |
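The CFG column above is the classifier-free guidance scale; a scale of 1.0 means guidance is effectively disabled, while 1.4 is the setting behind the FID=2.05 result. As a minimal NumPy sketch of the standard classifier-free guidance combination (shown for illustration only; the repo's actual sampler lives in `sample_ddp.py` and may differ in details):

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance: push the unconditional noise
    prediction toward the conditional one by `scale`.
    scale=1.0 reduces exactly to the conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy noise predictions in the shape of 256x256 VAE latents ([B, 4, 32, 32])
eps_u = np.zeros((1, 4, 32, 32))
eps_c = np.ones((1, 4, 32, 32))

assert np.allclose(cfg_combine(eps_u, eps_c, 1.0), eps_c)  # cfg=1.0: no guidance
guided = cfg_combine(eps_u, eps_c, 1.4)                    # cfg=1.4 as in the table
assert np.allclose(guided, 1.4)
```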
Similar to fast-DiT, we use a VAE to extract ImageNet features before training:

```shell
torchrun --nnodes=1 --nproc_per_node=1 --master_port=1234 extract_features.py \
  --model DiT-XL/2 \
  --data-path /path/to/imagenet/train \
  --features-path /path/to/store/features
```

To launch DiCo-XL (256x256) training with 8 GPUs on 1 node:
```shell
export WANDB_API_KEY='YOUR_WANDB_API_KEY'
accelerate launch \
  --multi_gpu \
  --num_processes=8 \
  --main_process_port=1234 \
  --mixed_precision=no \
  train_accelerate.py \
  --feature-path=/path/to/store/features \
  --image-size=256 \
  --model-domain=dico \
  --model=DiCo-XL \
  --results-dir=/path/to/store/exp/results \
  --exp-name=DiCo-XL-256
```

To launch DiCo-XL (256x256) training with 32 GPUs on 4 nodes:
```shell
export WANDB_API_KEY='YOUR_WANDB_API_KEY'
accelerate launch \
  --multi_gpu \
  --num_processes=32 \
  --num_machines=4 \
  --main_process_ip=... \
  --main_process_port=1234 \
  --machine_rank=... \
  --mixed_precision=no \
  train_accelerate.py \
  --feature-path=/path/to/store/features \
  --image-size=256 \
  --model-domain=dico \
  --model=DiCo-XL \
  --results-dir=/path/to/store/exp/results \
  --exp-name=DiCo-XL-256
```

To sample 50K images from our pre-trained DiCo-XL model (400K iters, w/o CFG, FID=11.67) over 8 GPUs, run:
```shell
torchrun --nnodes=1 --nproc_per_node=8 --master-port=1234 \
  sample_ddp.py \
  --ckpt=/path/to/DiCo-XL-400K-256x256.pt \
  --model=DiCo-XL \
  --model-domain=dico \
  --cfg-scale=1.0 \
  --global-seed=1234
```

To sample 50K images from our pre-trained DiCo-XL model (3750K iters, w/ CFG=1.4, FID=2.05) over 8 GPUs, run:
```shell
torchrun --nnodes=1 --nproc_per_node=8 --master-port=1234 \
  sample_ddp.py \
  --ckpt=/path/to/DiCo-XL-3750K-256x256.pt \
  --model=DiCo-XL \
  --model-domain=dico \
  --cfg-scale=1.4 \
  --global-seed=1234
```

These scripts generate a folder of samples as well as a `.npz` file that can be used directly with ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics.
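The ADM evaluation suite expects the samples as a single uint8 array of shape `[N, H, W, 3]` inside the `.npz`. A minimal sketch for sanity-checking such a file before running the evaluator (the key name `arr_0` is `np.savez`'s default positional key and is an assumption here; check the actual key produced by `sample_ddp.py`):

```python
import numpy as np

# Build a tiny stand-in .npz the way the sampling script typically does:
# one uint8 array of shape [N, H, W, 3] holding the generated images.
samples = np.random.randint(0, 256, size=(8, 256, 256, 3), dtype=np.uint8)
np.savez("samples_demo.npz", samples)  # stored under the default key "arr_0"

# Sanity-check dtype and layout before handing the file to the evaluator.
with np.load("samples_demo.npz") as data:
    arr = data["arr_0"]
    assert arr.dtype == np.uint8
    assert arr.ndim == 4 and arr.shape[-1] == 3

print(arr.shape)  # (8, 256, 256, 3)
```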
The provided code and pre-trained weights are licensed under the Apache 2.0 license.
This code is based on DiT, fast-DiT, and U-DiT. We thank the authors for their awesome work.
If you have any questions, please feel free to reach out to me at shallowdream555@gmail.com.
If you find our work useful for your research, please consider citing our paper:
```bibtex
@inproceedings{ai2025dico,
  title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
  author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=UnslcaZSnb}
}
```

