Nicolas Sereyjol-Garros · Ellington Kirby · Victor Besnier · Nermin Samet
Valeo.ai, Paris, France
Accepted at ICRA 2026
LiDAR scene synthesis is an emerging solution to the scarcity of 3D data for robotic tasks such as autonomous driving. Recent approaches employ diffusion or flow matching models to generate realistic scenes, but 3D data remains limited compared to RGB datasets with millions of samples. We introduce R3DPA, the first LiDAR scene generation method to unlock image-pretrained priors for LiDAR point clouds, and leverage self-supervised 3D representations for state-of-the-art results. Specifically, we (i) align intermediate features of our generative model with self-supervised 3D features, which substantially improves generation quality; (ii) transfer knowledge from large-scale image-pretrained generative models to LiDAR generation, mitigating the limited size of LiDAR datasets; and (iii) enable point cloud control at inference for object inpainting and scene mixing with solely an unconditional model. On the KITTI-360 benchmark, R3DPA achieves state-of-the-art performance.
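Contribution (iii) requires only an unconditional model; one common way to obtain such inference-time control is RePaint-style masked sampling, where the known region of a reference scene is re-imposed at every integration step. The sketch below is illustrative only: the function, the linear-interpolation noise schedule, and all names are assumptions, not the repository's actual API.

```python
import torch

def inpaint_sample(model, x_known, mask, steps=50):
    """Mask-guided Euler sampling with an unconditional flow-matching model.

    x_known: (B, C, H, W) latent of the reference scene.
    mask:    (B, 1, H, W) binary mask; 1 = keep reference, 0 = generate freely.
    model:   velocity field v(x, t), assuming x_t = (1 - t) * noise + t * data.
    All names here are hypothetical, for illustration only.
    """
    x = torch.randn_like(x_known)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * model(x, t)  # Euler step of the unconditional ODE
        # Re-impose the known region, noised to the current time level,
        # so kept and generated content stay statistically consistent.
        t_next = (i + 1) * dt
        known_t = (1.0 - t_next) * torch.randn_like(x_known) + t_next * x_known
        x = torch.where(mask.bool(), known_t, x)
    return x
```

Scene mixing can be obtained in the same spirit, with a mask that keeps regions from two different reference scenes.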
LiDAR point cloud generation from range images commonly follows a two-stage approach: a VAE is trained independently and then frozen, and the generative model is trained on its latent space. In contrast, our method leverages priors from a backbone pretrained on large-scale image datasets. The alignment step trains the VAE from scratch while initializing and freezing the generative model with pretrained weights. This stage ensures that the latent space of the newly trained VAE remains compatible with the knowledge of the pretrained generative model. We then jointly optimize the VAE encoder and the generative model under the supervision of 3D representations. Range VAE denotes a model trained on range images.
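The 3D-representation supervision described above can be pictured as a REPA-style loss that pulls projected intermediate features of the generative model toward frozen self-supervised 3D features. A minimal sketch, with hypothetical shapes and a hypothetical projection head (the actual loss and feature layout may differ):

```python
import torch
import torch.nn.functional as F

def alignment_loss(gen_feats, ssl_feats, proj):
    """Negative cosine similarity between projected generator features
    and frozen self-supervised 3D features.

    gen_feats: (B, N, C_gen) intermediate tokens of the generative model.
    ssl_feats: (B, N, C_ssl) precomputed 3D features (e.g. ScaLR), frozen.
    proj:      small trainable MLP mapping C_gen -> C_ssl (hypothetical).
    """
    z = F.normalize(proj(gen_feats), dim=-1)
    target = F.normalize(ssl_feats.detach(), dim=-1)  # no gradient to targets
    return -(z * target).sum(dim=-1).mean()
```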
If you find our work useful, please consider citing:
```bibtex
@inproceedings{sereyjol2026r3dpa,
  title={Leveraging 3D Representation Alignment and RGB Pretrained Priors for LiDAR Scene Generation},
  author={Nicolas Sereyjol-Garros and Ellington Kirby and Victor Besnier and Nermin Samet},
  year={2026},
  booktitle={ICRA},
}
```

To set up our environment, please run:
```bash
git clone https://github.com/valeoai/R3DPA.git
cd R3DPA
conda env create -f environment.yml
conda activate r3dpa
```

Put the KITTI-360 dataset under the `dataset` folder.
Download ScaLR.

Install the WaffleIron package:

```bash
cd ..
git clone https://github.com/valeoai/WaffleIron
cd WaffleIron
pip install -e ./
cd ../R3DPA
```

Precompute features:
```bash
python feature_extraction/preprocess_scalr_fearures.py \
    --dataset-path dataset \
    --model-path pretrained_weights/scalr/WI_768-DINOv2_ViT_L_14-NS_KI_PD
```

Follow the steps described in REPA-E, or download our pretrained weights.
- End-to-end training from scratch:

  ```bash
  bash scripts/train_e2e.sh
  ```

- VAE alignment: set the path of the SiT checkpoint pretrained on RGB images in `scripts/train_vae_align.sh`, then run:

  ```bash
  bash scripts/train_vae_align.sh
  ```

- End-to-end tuning from pretrained weights:

  ```bash
  bash scripts/tuning_e2e.sh
  ```

To generate samples and save them in a `.npz` file for evaluation, run the following script after making sure the parameters match your model path:

```bash
bash scripts/sample.sh
```
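For a quick sanity check of the generated file, the arrays can be inspected directly with NumPy. The path and key handling below are assumptions; list `data.files` to see what `scripts/sample.sh` actually stores.

```python
import numpy as np

# Hypothetical output path; adjust to wherever sample.sh writes its .npz.
data = np.load("samples/generated.npz")
print(data.files)            # names of the stored arrays
first = data[data.files[0]]  # e.g. a batch of range images or point clouds
print(first.shape, first.dtype)
```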
Download statistics from the release or recompute them with the following command:

```bash
python -m eval.extract_logits_dataset \
    --save_path log/activations \
    --dataset_path dataset/KITTI-360
```

Install the following packages and run the evaluation script:
```bash
apt-get install libsparsehash-dev
pip install git+https://github.com/mit-han-lab/[email protected]
python evaluate.py --config-path configs/eval/ablations/r3dpa.yaml
```

| Method | FRID ×10⁰ | FLD ×10⁻¹ | FSVD ×10⁰ | FPVD ×10⁰ | JSD ×10⁻² | MMD ×10⁻⁵ |
|---|---|---|---|---|---|---|
| UltraLiDAR | – | – | 73.59 | 65.83 | 74.72 | 123.30 |
| LiDM | 47.33 | 10.19 | 16.01 | 17.36 | 19.17 | 11.32 |
| LiDM w/ APE | 42.09 | 9.76 | 13.68 | 13.86 | 11.69 | 9.95 |
| R2DM | 15.54 | <u>7.89</u> | <u>12.67</u> | <u>13.21</u> | <u>5.78</u> | <u>8.50</u> |
| R2Flow | <u>8.87</u> | 8.36 | 20.80 | 20.27 | 5.97 | **7.84** |
| R3DPA (ours) | **8.46** | **6.34** | **9.83** | **11.00** | **5.67** | 8.72 |
- **Bold** = best result; <u>underlined</u> = second-best result.
- FRID and FLD measure generation quality at the range-image level.
- FSVD and FPVD measure quality in the point-cloud space.
- JSD and MMD evaluate similarity in the bird's-eye view.
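The F-prefixed metrics above are Fréchet-style distances between Gaussian fits of feature activations from real and generated scenes; each metric uses a different feature extractor. A minimal sketch of the shared computation, assuming the means and covariances have already been extracted:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    mu*:    (D,) feature means; sigma*: (D, D) feature covariances,
    e.g. computed from extractor activations on real vs. generated data.
    """
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```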
This codebase is largely built upon REPA-E, ScaLR, and WaffleIron. We sincerely thank the authors for making their work publicly available.

