[ICML 2026] CameraNoise: Enabling Faithful Camera Control in Video Diffusion through Geometry-Flow-Guided Noise Warping
Haoyu Zhao, Jiaxi Gu, Haoran Chen, Qingping Zheng, Yeying Jin, Hongyi Yang, Junqi Cheng, Yuang Zhang, Zenghui Lu, Huan Yu, Jie Jiang, Peng Shu, Zuxuan Wu, Yu-Gang Jiang
Fudan University, Tencent.
CameraNoise-I2V Model
This repository hosts the CameraNoise-I2V model weights for image-to-video generation with faithful camera-motion control.
Given a reference image and a reference video, CameraNoise estimates the camera motion from the reference video, converts it into temporally coherent CameraNoise, and uses it to guide Wan2.1-I2V generation. This allows the generated video to follow the reference camera trajectory while preserving visual quality and temporal consistency.
Model Files
The model weights are organized by resolution:
CameraNoise-I2V/
    1024x576/
        cameranoise_i2v_wan2.1_1024x576_lora.safetensors
    i2v_demo_results/
        demo1
        demo2
        demo3
        ...
        demo10
Installation
Please use the official CameraNoise GitHub repository for inference:
git clone https://github.com/gulucaptain/CameraNoise
cd CameraNoise
pip install -r requirements.txt
The following checkpoints are required:
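- VGGT checkpoint (VGGT-1B in the inference command below)
- QwenVL checkpoint (Qwen2-VL-7B-Instruct in the inference command below)
- Wan2.1-I2V-14B-720P checkpoint
- CameraNoise-I2V LoRA checkpoint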
Prepare Inputs
Each demo lives in its own folder under outputs/. Put the reference image and the reference video in that folder's inputs/ subfolder:
outputs/demo1/
    inputs/
        example.jpg    # reference image
        example.mp4    # reference video for camera motion
The script will automatically generate the image caption, camera motion conditions, CameraNoise, and final video.
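The folder layout above can also be created from a script. The following is a minimal Python sketch using only the standard library; `prepare_demo_dir` is a hypothetical helper, not part of the CameraNoise repository, and it assumes the inference script accepts the input files under their original names.

```python
import shutil
from pathlib import Path


def prepare_demo_dir(demo_dir, image_path, video_path):
    """Copy one reference image and one reference video into <demo_dir>/inputs/."""
    image_path, video_path = Path(image_path), Path(video_path)
    for src in (image_path, video_path):
        if not src.is_file():
            raise FileNotFoundError(f"missing input file: {src}")

    inputs_dir = Path(demo_dir) / "inputs"
    inputs_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: file names are kept as-is. Check the repository if the
    # script expects particular names or extensions.
    shutil.copy2(image_path, inputs_dir / image_path.name)
    shutil.copy2(video_path, inputs_dir / video_path.name)
    return inputs_dir
```

For example, `prepare_demo_dir("outputs/demo1", "example.jpg", "example.mp4")` would produce the structure shown above.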
Inference
576 x 1024 Model (height x width; weights in the 1024x576/ folder)
python cameranoise_i2v.py \
--demo-dir outputs/demo1 \
--vggt-ckpt /path/to/VGGT-1B \
--cameranoise-config cameranoise_warping/configs/default.yaml \
--qwenvl-model-path /path/to/Qwen2-VL-7B-Instruct \
--model-root /path/to/Wan2.1-I2V-14B-720P \
--lora-path /path/to/CameraNoise-I2V/1024x576/cameranoise_i2v_wan2.1_1024x576_lora.safetensors \
--height 576 \
--width 1024 \
--frames 49 \
--cfg 3.5 \
--device cuda \
--output-type single
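To run several demo folders with the same settings, the command above can be assembled programmatically. This is a minimal Python sketch, not part of the repository: `build_inference_command` is a hypothetical helper, and it only reuses the flags and default values shown in the command above.

```python
def build_inference_command(demo_dir, vggt_ckpt, qwenvl_model_path, model_root,
                            lora_path, height=576, width=1024, frames=49,
                            cfg=3.5, device="cuda"):
    """Return the argument list for one cameranoise_i2v.py run.

    Flags and defaults are copied from the command above; nothing else
    about the script's interface is assumed.
    """
    return [
        "python", "cameranoise_i2v.py",
        "--demo-dir", str(demo_dir),
        "--vggt-ckpt", str(vggt_ckpt),
        "--cameranoise-config", "cameranoise_warping/configs/default.yaml",
        "--qwenvl-model-path", str(qwenvl_model_path),
        "--model-root", str(model_root),
        "--lora-path", str(lora_path),
        "--height", str(height),
        "--width", str(width),
        "--frames", str(frames),
        "--cfg", str(cfg),
        "--device", device,
        "--output-type", "single",
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch one run, and `shlex.join(cmd)` prints it as a copy-pastable shell line.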
CameraNoise Resolution
The spatial size of CameraNoise is automatically inferred from the output video resolution:
cameranoise_downscale_size = [height // 8, width // 8]
Recommended settings:
576 x 1024 -> [72, 128]
768 x 768 -> [96, 96]
You can also manually specify the CameraNoise size:
--cameranoise-downscale-size 72,128
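As a worked example of the rule above, the sketch below computes the CameraNoise size for a given resolution and formats it for the `--cameranoise-downscale-size` flag. The function names and the `factor` parameter are illustrative and do not come from the repository.

```python
def cameranoise_downscale_size(height, width, factor=8):
    """Apply the stated rule: [height // 8, width // 8]."""
    return [height // factor, width // factor]


def to_flag_value(size):
    """Format a [h, w] pair as the comma-separated flag value, e.g. "72,128"."""
    return ",".join(str(v) for v in size)


print(cameranoise_downscale_size(576, 1024))                 # [72, 128]
print(cameranoise_downscale_size(768, 768))                  # [96, 96]
print(to_flag_value(cameranoise_downscale_size(576, 1024)))  # 72,128
```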
Outputs
After inference, the generated files will be saved under the demo folder:
outputs/demo1/
conditions/
caption.txt
noises/
*_noises.npy
*_visualization.mp4
samples/
demo1.mp4
demo1_compare.mp4
manifest.json
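After a run, a quick check that the expected files exist can catch a failed stage early. The following is a minimal sketch using only the standard library; `missing_outputs` is a hypothetical helper, and the patterns are taken from the file names listed above. It searches recursively, so it does not assume which subfolder each file sits in.

```python
from pathlib import Path

# Glob patterns derived from the file names in the listing above.
# "**/" matches at any depth below the demo folder.
EXPECTED_PATTERNS = [
    "**/caption.txt",
    "**/*_noises.npy",
    "**/*_visualization.mp4",
    "**/*_compare.mp4",
    "**/manifest.json",
]


def missing_outputs(demo_dir, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file under demo_dir."""
    demo_dir = Path(demo_dir)
    return [p for p in patterns if not any(demo_dir.glob(p))]
```

An empty list from `missing_outputs("outputs/demo1")` means every listed file type was found; it says nothing about the quality of the generated video.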
Links
- Project Page: https://gulucaptain.github.io/CameraNoise/
- GitHub: https://github.com/gulucaptain/CameraNoise
- Hugging Face Model: https://huggingface.co/gulucaptain/CameraNoise-I2V
Citation
@inproceedings{zhao2026cameranoise,
title = {CameraNoise: Enabling Faithful Camera Control in Video Diffusion through Geometry-Flow-Guided Noise Warping},
author = {Zhao, Haoyu and Gu, Jiaxi and Chen, Haoran and Zheng, Qingping and Jin, Yeying and Yang, Hongyi and Cheng, Junqi and Zhang, Yuang and Lu, Zenghui and Yu, Huan and Jiang, Jie and Shu, Peng and Wu, Zuxuan and Jiang, Yu-Gang},
booktitle = {Proceedings of the Forty-third International Conference on Machine Learning},
year = {2026}
}
Disclaimer
This model is released for research purposes. Please refer to the GitHub repository for the complete codebase, detailed installation instructions, and inference scripts.