LDenninger/CamC2V

CamC2V: Context-aware Controllable Video Generation

Luis Denninger¹ · Sina Mokhtarzadeh Azar¹ · Jürgen Gall¹,²

¹University of Bonn, Germany · ²Lamarr Institute for Machine Learning and Artificial Intelligence

Paper PDF

📈 Results

All results are reported after 50K training steps, using 25 DDIM steps and a guidance scale of 7.5, the setting reported by the baseline. Our model performs best at a guidance scale of 3.5.

| Method | FVD (VideoGPT) | FVD (StyleGAN) | MSE | TransErr | RotErr | CamMC |
|---|---|---|---|---|---|---|
| MotionCtrl | 78.30 | 64.47 | 3654.54 | 2.89 | 2.04 | 4.34 |
| CameraCtrl | 71.22 | 58.05 | 3130.63 | 2.54 | 1.84 | 3.85 |
| CamI2V | 71.01 | 57.90 | 2692.84 | 1.79 | 1.16 | 2.58 |
| Ours | 53.90 | 45.36 | 2579.96 | 1.53 | 1.09 | 2.29 |
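
To put these numbers in perspective, the relative reduction of each metric with respect to the strongest baseline (CamI2V) can be computed directly from the reported values. The snippet below is only an illustrative sketch: the helper `relative_reduction` and the variable names are not part of this repository.

```python
# Values copied from the results table (lower is better for every metric).
METRICS = ["FVD (VideoGPT)", "FVD (StyleGAN)", "MSE", "TransErr", "RotErr", "CamMC"]
CAMI2V = [71.01, 57.90, 2692.84, 1.79, 1.16, 2.58]
OURS = [53.90, 45.36, 2579.96, 1.53, 1.09, 2.29]


def relative_reduction(baseline: float, ours: float) -> float:
    """Return how much lower `ours` is than `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Map each metric name to its relative reduction over CamI2V.
reductions = {
    name: relative_reduction(base, ours)
    for name, base, ours in zip(METRICS, CAMI2V, OURS)
}

if __name__ == "__main__":
    for name, value in reductions.items():
        print(f"{name:<15} {value:5.1f}% lower than CamI2V")
```

With the values above, the reduction is roughly 24% for FVD (VideoGPT) and roughly 11% for CamMC.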

🔧 Installation

Create a Python environment and install PyTorch, for example:

conda create -n camcontexti2v python=3.10
conda activate camcontexti2v
conda install -y pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y xformers -c xformers

Install all other requirements using:

pip install -r requirements.txt

Checkpoints

Finally, download all required checkpoints and place them as follows:

| Model | Location |
|---|---|
| CamContextI2V | ./ckpts/256_camcontexti2v.pt |
| DynamiCrafter | ./ckpts/dynamicrafter/model.ckpt |
| CamI2V | ./ckpts/256_cami2v.pt |
| CameraCtrl | ./ckpts/256_cameractrl.pt |
| MotionCtrl | ./ckpts/256_motionctrl.pt |
| I3D (VideoGPT) | ./ckpts/videogpt/i3d_pretrained_400.pt |
| I3D (StyleGAN) | ./ckpts/stylegan/i3d_torchscript.pt |
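
Before training or evaluating, it can help to verify that every checkpoint is where the table expects it. The following is a minimal sketch, not a script shipped with this repository: `EXPECTED_CHECKPOINTS` and `missing_checkpoints` are hypothetical names, and it assumes that the two I3D checkpoints live under the same ./ckpts directory as the others.

```python
from pathlib import Path

# Checkpoint locations from the table, relative to the repository root.
EXPECTED_CHECKPOINTS = {
    "CamContextI2V": "ckpts/256_camcontexti2v.pt",
    "DynamiCrafter": "ckpts/dynamicrafter/model.ckpt",
    "CamI2V": "ckpts/256_cami2v.pt",
    "CameraCtrl": "ckpts/256_cameractrl.pt",
    "MotionCtrl": "ckpts/256_motionctrl.pt",
    # Assumption: the I3D weights sit under the same ./ckpts directory.
    "I3D (VideoGPT)": "ckpts/videogpt/i3d_pretrained_400.pt",
    "I3D (StyleGAN)": "ckpts/stylegan/i3d_torchscript.pt",
}


def missing_checkpoints(repo_root: str = ".") -> list[str]:
    """Return the names of all models whose checkpoint file is absent."""
    root = Path(repo_root)
    return [
        name
        for name, rel_path in EXPECTED_CHECKPOINTS.items()
        if not (root / rel_path).is_file()
    ]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Run it from the repository root; only the checkpoints of the models you actually intend to use need to be present.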

Evaluation

For the evaluation pipeline you additionally need to install the following requirements.

FVD

Simply clone and install the following repository:

git clone git@github.com:LDenninger/FVD.git evaluation
pip install -e evaluation/FVD

Glomap

To install GLOMAP, first install COLMAP following its provided instructions, then follow the installation guide in the GLOMAP repository.

📥 Data

This project uses the RealEstate10K dataset, whose videos need to be downloaded from YouTube. To get the metadata for the videos, first download:

wget https://storage.cloud.google.com/realestate10k-public-files/RealEstate10K.tar.gz

How you obtain and unpack the dataset is up to you, but we recommend following the guide proposed here.

Additionally, you will need the video captions generated by CameraCtrl.

The final dataset should have the following structure:

 ─┬─ RealEstate10K
  ├─┬─ valid_meta           # Directories holding txt files containing all metadata
  │ ├─── train
  │ └─── test
  ├─┬─ video_clips          # Directories holding the video clips
  │ ├─── train
  │ └─── test
  ├─── test_captions.json   # Test captions
  ├─── train_captions.json  # Train captions
  ├─── train_valid_list.txt # File containing all train video names
  └─── test_valid_list.txt  # File containing all test video names
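
The layout above can be checked programmatically before starting a run. The snippet below is a small sketch under the assumption that the tree is complete; `EXPECTED_ENTRIES` and `check_layout` are hypothetical names and not part of this repository.

```python
import sys
from pathlib import Path

# Entries taken from the directory tree; True marks a directory, False a file.
EXPECTED_ENTRIES = {
    "valid_meta/train": True,
    "valid_meta/test": True,
    "video_clips/train": True,
    "video_clips/test": True,
    "test_captions.json": False,
    "train_captions.json": False,
    "train_valid_list.txt": False,
    "test_valid_list.txt": False,
}


def check_layout(dataset_root: str) -> list[str]:
    """Return the expected entries that are missing or of the wrong type."""
    root = Path(dataset_root)
    problems = []
    for rel_path, is_dir in EXPECTED_ENTRIES.items():
        path = root / rel_path
        ok = path.is_dir() if is_dir else path.is_file()
        if not ok:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    # Usage: python check_layout.py /path/to/RealEstate10K
    root_arg = sys.argv[1] if len(sys.argv) > 1 else "RealEstate10K"
    problems = check_layout(root_arg)
    print("Layout OK" if not problems else f"Missing or invalid: {problems}")
```

An empty list means every directory and file named in the tree exists; the check does not inspect the contents of the files.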

🚀 Getting Started

This project defines the directory and machine setup in CamContextI2V/utils/meta.py. Before running anything, please adjust this file to your setup.

Training your model:

python CamContextI2V/01_train.py -r <run name> -c <config file> -m <machine to run on>

For details on the command-line arguments, run python CamContextI2V/01_train.py -h.

Running inference:

python CamContextI2V/02_generate_videos.py <run name>

For details on the command-line arguments, run python CamContextI2V/02_generate_videos.py -h.

Evaluation:

python CamContextI2V/03_evaluation.py -p <video path> -o <output path> --max-videos-in-mem <Images in RAM> [--fvd/--extended/--glomap]
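
The three steps above (training, inference, evaluation) can be chained from a small driver script. The sketch below only assembles the command lines from the flags shown in this README; the helper names and all example values (run name, config path, machine name, output paths) are hypothetical, and it assumes that exactly one of --fvd, --extended, or --glomap is passed per evaluation call.

```python
import shlex

METRIC_FLAGS = ("fvd", "extended", "glomap")


def train_command(run_name: str, config_file: str, machine: str) -> list[str]:
    """Build the training command with the -r, -c and -m flags."""
    return ["python", "CamContextI2V/01_train.py",
            "-r", run_name, "-c", config_file, "-m", machine]


def generate_command(run_name: str) -> list[str]:
    """Build the inference command for a finished run."""
    return ["python", "CamContextI2V/02_generate_videos.py", run_name]


def evaluation_command(video_path: str, output_path: str,
                       max_videos_in_mem: int, metric: str = "fvd") -> list[str]:
    """Build the evaluation command for one metric flag."""
    if metric not in METRIC_FLAGS:
        raise ValueError(f"metric must be one of {METRIC_FLAGS}, got {metric!r}")
    return ["python", "CamContextI2V/03_evaluation.py",
            "-p", video_path, "-o", output_path,
            "--max-videos-in-mem", str(max_videos_in_mem), f"--{metric}"]


if __name__ == "__main__":
    # All values below are placeholders; replace them with your own setup.
    commands = [
        train_command("my_run", "configs/example.yaml", "local"),
        generate_command("my_run"),
        evaluation_command("results/my_run/videos", "results/my_run/eval", 16),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

To actually execute a step, each list can be handed to subprocess.run(cmd, check=True) from the repository root.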

Visualization: To start an interactive Gradio visualization, run:

python CamContextI2V/04_visualize.py

🙏 Acknowledgements

We thank the authors of CamI2V for their implementation of the camera pose conditioning and the authors of DynamiCrafter for the implementation of the base model.

📄 Citation

@article{denninger2025camcontexti2v,
  title={CamContextI2V: Context-aware Controllable Video Generation},
  author={Denninger, Luis and Mokhtarzadeh Azar, Sina and Gall, Juergen},
  journal={},
  year={2025}
}

About

[3DV2026] Official repository for "CamC2V: Context-aware Controllable Video Generation"
