|
|
--- |
|
|
pipeline_tag: image-to-video |
|
|
--- |
|
|
|
|
|
# InfCam: Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation |
|
|
|
|
|
This repository contains the InfCam model presented in the paper [Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation](https://huggingface.co/papers/2512.17040). |
|
|
|
|
|
Project Page: [https://emjay73.github.io/InfCam/](https://emjay73.github.io/InfCam/) |
|
|
Code: [https://github.com/emjay73/InfCam](https://github.com/emjay73/InfCam) |
|
|
|
|
|
<div align="center"> |
|
|
[Min-Jung Kim<sup>*</sup>](https://emjay73.github.io/), [Jeongho Kim<sup>*</sup>](https://scholar.google.co.kr/citations?user=4SCCBFwAAAAJ&hl=ko), [Hoiyeong Jin](https://scholar.google.co.kr/citations?hl=ko&user=Jp-zhtUAAAAJ), [Junha Hyung](https://junhahyung.github.io/), [Jaegul Choo](https://sites.google.com/site/jaegulchoo)
|
|
<br> |
|
|
*Equal Contribution |
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/emjay73/InfCam/resolve/main/assets/GSAI_preview_image.png" width="20%" alt="GSAI Preview"> |
|
|
</p> |
|
|
</div> |
|
|
|
|
|
## Teaser Video |
|
|
https://github.com/user-attachments/assets/1c52baf4-b5ff-417e-a6c6-c8570e667bd8 |
|
|
|
|
|
## Introduction |
|
|
|
|
|
InfCam is a depth-free, camera-controlled video-to-video generation framework with high pose fidelity. It aims to provide creators with cinematic camera control capabilities in post-production. |
|
|
**TL;DR:** Given a video and a target camera trajectory, InfCam generates a video that faithfully follows the specified camera path without a depth prior.
|
|
|
|
|
## 🔥 Updates
|
|
- [ ] Release training code |
|
|
- [x] Release inference code and model weights (2025.12.19) |
|
|
|
|
|
## ⚙️ Code
|
|
### Inference |
|
|
Step 1: Set up the environment |
|
|
|
|
|
```shell
|
|
conda create -n infcam python=3.12 |
|
|
conda activate infcam |
|
|
|
|
|
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121 |
|
|
pip install cupy-cuda12x |
|
|
pip install transformers==4.46.2 |
|
|
pip install sentencepiece |
|
|
pip install controlnet-aux==0.0.7 |
|
|
pip install imageio |
|
|
pip install "imageio[ffmpeg]"
|
|
pip install safetensors |
|
|
pip install einops |
|
|
pip install protobuf |
|
|
pip install modelscope |
|
|
pip install ftfy |
|
|
pip install lpips |
|
|
pip install lightning |
|
|
pip install pandas |
|
|
pip install matplotlib |
|
|
pip install wandb |
|
|
pip install ffmpeg-python |
|
|
pip install numpy |
|
|
pip install opencv-python |
|
|
``` |
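
After installation, a quick sanity check can confirm that PyTorch sees the GPU and that the key dependencies import cleanly. This is a minimal sketch, not part of the repository:

```python
# sanity_check.py -- minimal environment check (illustrative, not part of the repo)
import torch
import transformers
import einops    # imported only to verify installation
import imageio   # imported only to verify installation
import cv2       # provided by opencv-python

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
print(f"transformers {transformers.__version__} (expected 4.46.2)")
```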
|
|
|
|
|
Step 2: Download the pretrained checkpoints |
|
|
1. Download the pre-trained Wan2.1 model |
|
|
|
|
|
```shell |
|
|
python download_wan2.1.py |
|
|
``` |
|
|
|
|
|
2. Download the pre-trained UniDepth model |
|
|
|
|
|
Download the pre-trained weights from [huggingface](https://huggingface.co/lpiccinelli/unidepth-v2-vitl14) and place them in ```models/unidepth-v2-vitl14```.
|
|
```shell |
|
|
cd models |
|
|
git clone https://huggingface.co/lpiccinelli/unidepth-v2-vitl14 |
|
|
``` |
|
|
|
|
|
3. Download the pre-trained InfCam checkpoint |
|
|
|
|
|
Download the pre-trained InfCam weights from [huggingface](https://huggingface.co/emjay73/InfCam/tree/main) and place them in ```models/InfCam```.
|
|
|
|
|
```shell |
|
|
cd models   # run from the repository root (skip if you are still inside models/)
|
|
git clone https://huggingface.co/emjay73/InfCam |
|
|
``` |
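
As an alternative to ```git clone``` (which needs git-lfs to fetch the large checkpoint files), both repositories can be downloaded with ```huggingface_hub```, which is already installed as a ```transformers``` dependency. A sketch:

```python
# download_checkpoints.py -- git-clone alternative via huggingface_hub (illustrative sketch)
from huggingface_hub import snapshot_download

# UniDepthV2 weights -> models/unidepth-v2-vitl14
snapshot_download(repo_id="lpiccinelli/unidepth-v2-vitl14",
                  local_dir="models/unidepth-v2-vitl14")

# InfCam checkpoint -> models/InfCam
snapshot_download(repo_id="emjay73/InfCam", local_dir="models/InfCam")
```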
|
|
Step 3: Test the example videos |
|
|
|
|
|
```shell |
|
|
bash run_inference.sh |
|
|
``` |
|
|
or |
|
|
```shell |
|
|
SEED=0  # ${SEED} is referenced below but never set in this snippet; pick any fixed seed

for CAM in {1..10}; do
|
|
CUDA_VISIBLE_DEVICES=0 python inference_infcam.py \ |
|
|
--cam_type ${CAM} \ |
|
|
--ckpt_path "models/InfCam/step35000.ckpt" \ |
|
|
--camera_extrinsics_path "./sample_data/cameras/camera_extrinsics_10types.json" \ |
|
|
--output_dir "./results/sample_data" \ |
|
|
--dataset_path "./sample_data" \ |
|
|
--metadata_file_name "metadata.csv" \ |
|
|
--num_frames 81 --width 832 --height 480 \ |
|
|
--num_inference_steps 20 \ |
|
|
--zoom_factor 1.0 \ |
|
|
--k_from_unidepth \ |
|
|
--seed ${SEED} |
|
|
done |
|
|
``` |
|
|
This inference code requires 48 GB of GPU memory for UniDepth and 28 GB for the InfCam pipeline.
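
A small pre-flight check (an illustrative sketch) can verify that the visible GPU is large enough before launching a long run:

```python
# check_vram.py -- warn if GPU 0 is smaller than the stated peak requirement (illustrative)
import torch

REQUIRED_GIB = 48  # peak requirement quoted above (the UniDepth stage)

assert torch.cuda.is_available(), "no CUDA device visible"
total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0: {total_gib:.1f} GiB total memory")
if total_gib < REQUIRED_GIB:
    print(f"Warning: under {REQUIRED_GIB} GiB; the UniDepth stage may run out of memory.")
```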
|
|
|
|
|
Step 4: Test your own videos |
|
|
|
|
|
If you want to test your own videos, prepare your test data following the structure of the ```sample_data``` folder: N mp4 videos, each with at least 81 frames, and a ```metadata.csv``` file that stores their paths and corresponding captions. You can refer to the [caption branch](https://github.com/emjay73/InfCam/tree/caption) for metadata.csv extraction; a sketch of building such a file by hand follows below.
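
The sketch below builds such a file from a folder of mp4s, skipping clips shorter than 81 frames. The ```file_name``` and ```text``` column names are assumptions for illustration; check ```sample_data/metadata.csv``` for the exact schema:

```python
# make_metadata.py -- build metadata.csv from a folder of mp4s (illustrative sketch;
# the "file_name" / "text" column names are assumptions, check sample_data/metadata.csv)
import glob
import os

import cv2
import pandas as pd

rows = []
for path in sorted(glob.glob("./my_data/*.mp4")):
    cap = cv2.VideoCapture(path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    if n_frames < 81:  # InfCam samples 81 frames per clip
        print(f"skipping {path}: only {n_frames} frames")
        continue
    rows.append({"file_name": os.path.basename(path),
                 "text": "a caption describing this clip"})

pd.DataFrame(rows).to_csv("./my_data/metadata.csv", index=False)
```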
|
|
|
|
|
|
|
|
We provide several preset camera types, as shown in the table below; a toy sketch of what such a trajectory looks like follows the table.
|
|
These follow the ReCamMaster presets, except that the starting point of each trajectory differs from the camera pose of the initial frame.
|
|
|
|
|
| cam_type | Trajectory | |
|
|
|-------------------|-----------------------------| |
|
|
| 1 | Pan Right | |
|
|
| 2 | Pan Left | |
|
|
| 3 | Tilt Up | |
|
|
| 4 | Tilt Down | |
|
|
| 5 | Zoom In | |
|
|
| 6 | Zoom Out | |
|
|
| 7 | Translate Up (with rotation) | |
|
|
| 8 | Translate Down (with rotation) | |
|
|
| 9 | Arc Left (with rotation) | |
|
|
| 10 | Arc Right (with rotation) | |
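
For intuition, a camera trajectory is just a sequence of per-frame extrinsic matrices. The toy sketch below builds a pan-right-style path with numpy; it is illustrative only, and the actual JSON schema of ```camera_extrinsics_10types.json``` may differ:

```python
# toy_trajectory.py -- what a per-frame extrinsics trajectory looks like (illustrative only;
# the real schema of camera_extrinsics_10types.json may differ)
import numpy as np

def pan_right_extrinsics(num_frames=81, max_yaw_deg=15.0):
    """Return a (num_frames, 4, 4) stack of extrinsics for a pure rightward pan."""
    mats = []
    for yaw in np.linspace(0.0, np.radians(max_yaw_deg), num_frames):
        c, s = np.cos(yaw), np.sin(yaw)
        E = np.eye(4)
        E[:3, :3] = np.array([[  c, 0.0,   s],
                              [0.0, 1.0, 0.0],
                              [ -s, 0.0,   c]])  # rotation about the camera's up axis
        mats.append(E)
    return np.stack(mats)

print(pan_right_extrinsics().shape)  # (81, 4, 4)
```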
|
|
|
|
|
## 🤗 Thank You Note
|
|
Our work is based on the following repositories.\ |
|
|
Thank you for your outstanding contributions! |
|
|
|
|
|
[ReCamMaster](https://jianhongbai.github.io/ReCamMaster/): Re-captures in-the-wild videos with novel camera trajectories and releases a multi-camera synchronized video dataset rendered with Unreal Engine 5.
|
|
|
|
|
[WAN2.1](https://github.com/Wan-Video/Wan2.1): A comprehensive and open suite of video foundation models. |
|
|
|
|
|
[UniDepthV2](https://github.com/lpiccinelli-eth/UniDepth): Monocular metric depth estimation. |
|
|
|
|
|
## 📝 Citation
|
|
|
|
|
Please leave us a star 🌟 and cite our paper if you find our work helpful.
|
|
```bibtex |
|
|
@article{kim2025infinite, |
|
|
title={Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation}, |
|
|
author={Kim, Min-Jung and Kim, Jeongho and Jin, Hoiyeong and Hyung, Junha and Choo, Jaegul}, |
|
|
journal={arXiv preprint arXiv:2512.17040}, |
|
|
year={2025} |
|
|
} |
|
|
``` |