	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Wan-AI/Wan2.2-TI2V-5B
	- Wan-AI/Wan2.2-I2V-A14B
	pipeline_tag: video-to-video
	tags:
	- Diffusion
	- Video
	- V2V
	- IN2V
	---

	<table align="center">
	<tr>
	<td align="center" width="22%">
	<img src="docs/static/logos/lab_logo.png" alt="Lab logo" width="100%" />
	</td>
	<td align="center" width="56%">
	<h2 style="font-size:36px; margin:0;">ϕ-Noise:<br>Training-Free Temporal Video Conditioning via Phase-Based Noise Manipulation</h2>
	<a href="https://arxiv.org/abs/2605.24509">
	<img src="https://img.shields.io/badge/arXiv-paper-b31b1b?style=flat-square&logo=arxiv&logoColor=white" alt="arXiv" />
	</a>
	<a href="https://ofir1080.github.io/phi-noise/">
	<img src="https://img.shields.io/badge/Web-page-1f77b4?style=flat-square&logo=github&logoColor=white" alt="Web page" />
	</a>
	<a href="https://arxiv.org/pdf/2605.24509">
	<img src="https://img.shields.io/badge/PDF-download-0066cc?style=flat-square&logo=adobeacrobatreader&logoColor=white" alt="PDF" />
	</a>
	</td>
	<td align="center" width="22%">
	<img src="docs/static/logos/uni_logo.png" alt="University logo" width="100%" />
	</td>
	</tr>
	</table>

	### An official implementation of the paper. ###

	Φ-Noise enables motion and structure conditioning for diffusion-based video generation. By exploiting low-frequency components along either the spatial or the temporal dimensions, it facilitates precise motion transfer and supports three key applications:
	- Image-to-Video (I2V) Motion Transfer
	- Text-to-Video (T2V) Motion Transfer + Structural Conditioning
	- Cut n' Drag (interactive user control over object trajectories and spatial placement)

	| I2V Motion Transfer | T2V Motion Transfer | Cut n' Drag |
	| :---: | :---: | :---: |
	| <img src="docs/static/media/results/i2v.gif" alt="I2V Motion Transfer" width="90%"> | <img src="docs/static/media/results/t2v.gif" alt="T2V Motion Transfer" width="90%"> | <img src="docs/static/media/results/cnd.gif" alt="Cut n' Drag" width="100%"> |


	### Contents ###
	- `phi_noise_utils.py`: core frequency-mixing utilities.
	- `video_processing_utils.py`: video utilities for preprocessing and for adjusting frame sizes and clip lengths.
	- `Wan2.2_phi-noise/`: a fork of the [official Wan2.2 GitHub repository](https://github.com/Wan-Video/Wan2.2) with small adjustments to integrate our method. \
	Note: you must clone it from the repository root directory (`git clone git@github.com:ofir1080/Wan2.2_phi-noise.git`).


	### Highlights ###
	- Φ-Noise is a training-free temporal conditioning method based on phase/magnitude mixing in the frequency domain.
	- The core code (`freq_mix_temporal` and `freq_mix_spatial` in [phi_noise_utils.py](phi_noise_utils.py#L1-L220)) can be easily integrated with any diffusion-based video model.
	- We provide an example integration for the Wan2.2 model in [Wan2.2_phi-noise/generate.py](Wan2.2_phi-noise/generate.py#L1-L520).
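To make "phase/magnitude mixing in the frequency domain" concrete, here is a minimal NumPy sketch under our own assumptions: the latent is laid out as `(C, T, H, W)`, the reference is an array of the same shape, and a frequency counts as "low" when its normalized frequency along every mixed axis is below a hypothetical `cutoff`. Inside that band the result keeps the noise magnitude but takes the reference phase; outside it the noise is left untouched. This is not the implementation of `freq_mix_temporal`/`freq_mix_spatial` and does not reproduce their `alpha`/`gamma` parameters; `low_freq_mask` and `phase_mix` are hypothetical names. In PyTorch the counterparts would be `torch.fft.fftn`/`torch.fft.ifftn` (with `dim=` instead of `axes=`).

```python
import numpy as np


def low_freq_mask(shape, axes, cutoff):
    """Hypothetical helper: True where |normalized frequency| < cutoff on every axis in `axes`."""
    mask = np.ones(shape, dtype=bool)
    for ax in axes:
        # np.fft.fftfreq gives cycles/sample in [-0.5, 0.5); only the magnitude matters here.
        freqs = np.abs(np.fft.fftfreq(shape[ax]))
        view = [1] * len(shape)
        view[ax] = -1
        mask &= (freqs < cutoff).reshape(view)  # broadcast the 1-D band along `ax`
    return mask


def phase_mix(noise, ref, axes, cutoff):
    """Sketch (assumption, not the paper's exact rule): keep |FFT(noise)| everywhere,
    but use the phase of FFT(ref) inside the low-frequency band."""
    noise_f = np.fft.fftn(noise, axes=axes)
    ref_f = np.fft.fftn(ref, axes=axes)
    mask = low_freq_mask(noise.shape, axes, cutoff)
    mixed_f = np.where(mask, np.abs(noise_f) * np.exp(1j * np.angle(ref_f)), noise_f)
    # The mask is symmetric in +/- frequency, so mixed_f stays conjugate-symmetric and the
    # inverse transform is real up to round-off; drop the negligible imaginary part.
    return np.fft.ifftn(mixed_f, axes=axes).real


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 9, 16, 16))  # hypothetical (C, T, H, W) initial noise
ref = rng.standard_normal((4, 9, 16, 16))    # stand-in for encoded reference-video latents

temporal = phase_mix(noise, ref, axes=(1,), cutoff=0.25)    # mix along T only
spatial = phase_mix(noise, ref, axes=(2, 3), cutoff=0.125)  # mix along H and W
```

Keeping the noise magnitude preserves its per-frequency energy, while the borrowed low-frequency phase carries the coarse layout (spatial) or coarse motion (temporal) of the reference.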


	### Installation ###
	Φ-Noise uses [PyTorch](https://pytorch.org/) for frequency decomposition (the `torch.fft` module). \
	For Wan2.2 installation instructions, please refer to [Wan2.2/INSTALL.md](https://github.com/Wan-Video/Wan2.2/blob/main/INSTALL.md).

	### Usage ###

	#### Φ-Noise + Wan2.2 ####

	For a new input video, first preprocess it with `video_processing_utils.py` so that its FPS, frame size, and clip length match the model's requirements. This saves the preprocessed video along with its first frame (for I2V Motion Transfer).
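As a rough illustration of what matching FPS and clip length involves, the sketch below picks which source frames to keep when resampling a clip to a target frame rate, then trims the result to a frame count of the form 4n + 1 (the `81f` in the example file names is one such length; Wan's VAE compresses time by a factor of 4, which is where the 4n + 1 constraint comes from). The name `resample_indices`, the 16 FPS default (our belief about the Wan2.2 A14B output rate), and the nearest-earlier-frame strategy are assumptions for illustration, not the interface of `video_processing_utils.py`.

```python
def resample_indices(num_src_frames, src_fps, dst_fps=16.0, max_frames=81):
    """Hypothetical helper: indices of source frames to keep so the clip plays at
    `dst_fps`, trimmed to the largest length of the form 4n + 1 that fits."""
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    duration = num_src_frames / src_fps          # clip length in seconds
    n_out = min(int(duration * dst_fps), max_frames)
    n_out = ((n_out - 1) // 4) * 4 + 1           # round down to 4n + 1
    if n_out < 1:
        raise ValueError("clip is too short")
    step = src_fps / dst_fps                     # source frames per output frame
    # For each output timestamp, take the nearest earlier source frame.
    return [min(int(i * step), num_src_frames - 1) for i in range(n_out)]


# A 10 s clip at 30 FPS is resampled to 16 FPS and capped at 81 frames.
indices = resample_indices(300, 30.0)
```

Frame resizing (e.g. to `832*480`) would be applied to the selected frames afterwards; it is left out here to keep the sketch dependency-free.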

	Run the Wan example script (multi-GPU via `torch.distributed.run`). Make sure both the workspace root and the Wan folder are on `PYTHONPATH` so that `phi_noise_utils` and `wan` import correctly. Example commands (adjust `--nproc_per_node`, `--ulysses_size`, `CUDA_VISIBLE_DEVICES`, and `--ckpt_dir` to your setup):

	T2V Motion Transfer + Structural Conditioning:
	```bash
	export PYTHONPATH=absolute-path/to/phi-noise:absolute-path/to/phi-noise/Wan2.2_phi-noise
	export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
	python -m torch.distributed.run \
	--nproc_per_node 8 --master_port 29501 Wan2.2_phi-noise/generate.py \
	--ulysses_size 8 --task t2v-A14B --size "832*480" --sample_steps 20 \
	--ckpt_dir /path/to/checkpoints --offload_model False --convert_model_dtype \
	--dit_fsdp --prompt "A yellow helicopter is flying in the beach. Camera is fixed and static. Fixed Background." \
	--pn_ref_path guidance_exmaples/preprocessed_14B-low_81f_duck.mp4 --pn_task t2v_mt \
	--pn_gamma 5 --pn_alpha 4
	```

	I2V Motion Transfer:
	```bash
	export PYTHONPATH=absolute-path/to/phi-noise:absolute-path/to/phi-noise/Wan2.2_phi-noise
	export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
	python -m torch.distributed.run \
	--nproc_per_node 8 --master_port 29501 Wan2.2_phi-noise/generate.py \
	--ulysses_size 8 --task t2v-A14B --size "832*480" --sample_steps 20 \
	--ckpt_dir /path/to/checkpoints --offload_model False --convert_model_dtype \
	--dit_fsdp --prompt "The cat is turning its head towards the camera and after a second starts waving hello its right paw. Camera is fixed and static. Fixed Background." \
	--image "guidance_exmaples/mt-it2m/cat_in_nature.jpg" \
	--pn_ref_path guidance_exmaples/mt-it2m/preprocessed_14B-low_81f_woman_turning.mp4 \
	--pn_task i2v_mt \
	--pn_gamma 3 --pn_alpha 3
	```

	Cut n' Drag:
	```bash
	export PYTHONPATH=absolute-path/to/phi-noise:absolute-path/to/phi-noise/Wan2.2_phi-noise
	export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
	python -m torch.distributed.run \
	--nproc_per_node 8 --master_port 29501 Wan2.2_phi-noise/generate.py \
	--ulysses_size 8 --task i2v-A14B --size "832*480" --sample_steps 20 \
	--ckpt_dir /path/to/checkpoints --offload_model False --convert_model_dtype --dit_fsdp \
	--prompt "A flock of birds flies gracefully across the sky above a natural landscape." \
	--image "guidance_exmaples/cut_n_drag/preprocessed_14B-low_81f_birds_ff.png" \
	--pn_ref_path guidance_exmaples/cut_n_drag/preprocessed_14B-low_81f_birds.mp4 \
	--pn_task t2v_mt \
	--pn_gamma 30 --pn_alpha 3
	```
	Tip: to run with multiple gamma or alpha values, separate them with `#`, for example `--pn_alpha 3#4#5`.
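A hedged sketch of how `#`-separated values such as `--pn_gamma 5#10 --pn_alpha 3#4` could be parsed and expanded into one run per (gamma, alpha) pair. `parse_sweep` and the Cartesian-product pairing are our assumptions; the fork's `generate.py` may iterate over or pair the values differently.

```python
from itertools import product


def parse_sweep(value):
    """Hypothetical parser: '3#4#5' -> [3.0, 4.0, 5.0]; a single value gives a one-item list."""
    parts = [p.strip() for p in str(value).split("#")]
    if any(p == "" for p in parts):
        raise ValueError(f"empty entry in sweep value {value!r}")
    return [float(p) for p in parts]


gammas = parse_sweep("5#10")
alphas = parse_sweep("3#4")
# Assumed behavior: one generation per (gamma, alpha) combination.
runs = list(product(gammas, alphas))
for gamma, alpha in runs:
    print(f"--pn_gamma {gamma:g} --pn_alpha {alpha:g}")
```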

	#### General Usage ####
	As utilities in your own code (recommended):

	```python
	from phi_noise_utils import freq_mix_temporal, freq_mix_spatial

	# noise_latents: initial diffusion noise; ref_latents: encoded reference-video latents (same shape)
	# temporal Φ-Noise (for I2V-related tasks)
	latents = freq_mix_temporal(noise_latents, ref_latents, alpha=3, gamma=30.0)  # recommended values: alpha in [3, 6], gamma = 30

	# spatial Φ-Noise (for T2V Motion Transfer + Structural Conditioning)
	mixed_latents = freq_mix_spatial(noise_latents, ref_latents, alpha=3, gamma=5.0, dims=("h", "w"))  # recommended ranges: alpha in [3, 4], gamma in [5, 10]
	```
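Because `noise_latents` and `ref_latents` are combined frequency by frequency, they presumably need matching shapes. The sketch below computes the expected per-sample latent shape `(C, T, H, W)` for a Wan2.2 run from the `--size` string (width*height) and the frame count. The VAE table reflects what we believe the official Wan2.2 configs use (16 channels with temporal stride 4 and spatial stride 8 for the A14B models; 48 channels with spatial stride 16 for TI2V-5B); `wan_latent_shape` and `VAE_SPECS` are hypothetical, so verify the values against the configs in the fork before relying on them.

```python
# Assumed VAE settings: (latent channels, temporal stride, spatial stride). Verify against Wan2.2 configs.
VAE_SPECS = {
    "A14B": (16, 4, 8),
    "TI2V-5B": (48, 4, 16),
}


def wan_latent_shape(size, frame_num, model="A14B"):
    """Hypothetical helper: latent shape (C, T, H, W) for a Wan-style 'W*H' size string."""
    channels, t_stride, s_stride = VAE_SPECS[model]
    width, height = (int(v) for v in size.split("*"))  # Wan sizes are written as width*height
    if (frame_num - 1) % t_stride != 0:
        raise ValueError(f"frame_num must be of the form {t_stride}n + 1")
    if width % s_stride or height % s_stride:
        raise ValueError("frame size must be divisible by the VAE spatial stride")
    # The first frame is encoded on its own; each following group of t_stride frames
    # becomes one latent frame.
    return (channels, (frame_num - 1) // t_stride + 1, height // s_stride, width // s_stride)


print(wan_latent_shape("832*480", 81))  # -> (16, 21, 60, 104) under the assumed A14B VAE
```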


	### Citation ###
	```
	@article{abramovich2025phinoise,
	title = {ϕ-Noise: Training-Free Temporal Video Conditioning
	via Phase-Based Noise Manipulation},
	author = {Abramovich, Ofir and Cohen, Nadav Z. and
	Rosenthal, Adi and Shamir, Ariel},
	journal = {arXiv preprint},
	year = {2025},
	}
	```

	### Acknowledgments ###
	This repository uses a fork of the [Wan2.2](https://github.com/Wan-Video/Wan2.2) codebase.

	### License ###
	This project is licensed under the Apache License 2.0.