<h1 align="center">Time-to-Move</h1>
<h2 align="center">Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising</h2>
<p align="center">
<a href="https://www.linkedin.com/in/assaf-singer/">Assaf Singer</a><sup>†</sup> ·
<a href="https://rotsteinnoam.github.io/">Noam Rotstein</a><sup>†</sup> ·
<a href="https://www.linkedin.com/in/amir-mann-a890bb276/">Amir Mann</a> ·
<a href="https://ron.cs.technion.ac.il/">Ron Kimmel</a> ·
<a href="https://orlitany.github.io/">Or Litany</a>
</p>
<p align="center"><sup>†</sup> Equal contribution</p>
<p align="center">
<a href="https://time-to-move.github.io/">
<img src="assets/logo_page.svg" alt="Project Page" width="125">
</a>
<a href="https://arxiv.org/abs/2511.08633">
<img src="assets/logo_arxiv.svg" alt="Arxiv" width="125">
</a>
<a href="https://arxiv.org/pdf/2511.08633">
<img src="assets/logo_paper.svg" alt="Paper" width="125">
</a>
</p>
<div align="center">
<img src="assets/teaser.gif" width="900" /><br/>
<span style="color: inherit; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, 'Noto Sans', sans-serif;">
<big><strong>Warped</strong></big>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<big><strong>Ours</strong></big>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<big><strong>Warped</strong></big>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<big><strong>Ours</strong></big>
</span>
</div>
<br>
## Table of Contents
- [Inference](#inference)
  - [Dual Clock Denoising](#dual-clock-denoising)
  - [Wan](#wan)
  - [CogVideoX](#cogvideox)
  - [Stable Video Diffusion](#stable-video-diffusion)
- [Generate Your Own Cut-and-Drag Examples](#generate-your-own-cut-and-drag-examples)
- [GUI guide](GUIs/README.md)
- [TODO](#todo)
- [BibTeX](#bibtex)
## Inference
**Time-to-Move (TTM)** is a plug-and-play technique that can be integrated into any image-to-video diffusion model.
We provide implementations for **Wan 2.2**, **CogVideoX**, and **Stable Video Diffusion (SVD)**.
As expected, the stronger the base model, the better the resulting videos.
Adapting TTM to new models and pipelines is straightforward and can typically be done in just a few hours.
We **recommend using Wan**, which generally produces higher-quality results and adheres more faithfully to user-provided motion signals.
For each model, you can use the [included examples](./examples/) or create your own as described in
[Generate Your Own Cut-and-Drag Examples](#generate-your-own-cut-and-drag-examples).
### Dual Clock Denoising
TTM exposes two hyperparameters that control when denoising begins in different regions of the video, i.e., the noise depth at which each region is released. In practice, we do not pass the raw timesteps `t_weak` and `t_strong`. Instead, we pass `tweak-index` and `tstrong-index`, which specify the iteration, out of the total `num_inference_steps` (50 for all models), at which each denoising phase begins; a minimal sketch of this schedule follows the list below.
Constraints: `0 ≤ tweak-index ≤ tstrong-index ≤ num_inference_steps`. For example, with `num_inference_steps=50` and `tweak-index=3`, the region outside the mask stays pinned to the motion signal for the first 3 iterations and denoises freely for the remaining 47.
* **tweak-index** — the iteration at which denoising **outside the mask** begins.
  - Too low: scene deformations, object duplication, or unintended camera motion.
  - Too high: regions outside the mask look static (e.g., non-moving backgrounds).
* **tstrong-index** — the iteration at which denoising **within the mask** begins. In our experience, the best value depends on mask size and quality.
  - Too low: the object may drift from the intended path.
  - Too high: the object may look rigid or over-constrained.
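To make the two clocks concrete, here is a minimal sketch of the schedule in a diffusers-style denoising loop. It is an illustration under stated assumptions, not the repository's implementation: `denoise_step` is a hypothetical callback standing in for one model-plus-scheduler update, and `scheduler.add_noise` is the standard diffusers API for re-noising clean latents to the current timestep.
```python
import torch

def dual_clock_denoise(latents, warped_latents, mask, scheduler, denoise_step,
                       tweak_index, tstrong_index):
    """Illustrative dual-clock loop (sketch, not the repository's code).

    latents:         pure-noise starting latents
    warped_latents:  clean latents of the crude warped (e.g., cut-and-drag) video
    mask:            1 inside the moving object, 0 elsewhere (latent resolution)
    denoise_step:    hypothetical callback running one model + scheduler update
    """
    for i, t in enumerate(scheduler.timesteps):
        if i < tstrong_index:
            # Re-noise the warped video to the current noise level.
            anchor = scheduler.add_noise(
                warped_latents, torch.randn_like(warped_latents), t
            )
            if i < tweak_index:
                # Neither clock has started: pin the whole frame to the anchor.
                latents = anchor
            else:
                # Weak clock running: outside the mask denoises freely while the
                # masked object stays pinned to its motion signal.
                latents = mask * anchor + (1 - mask) * latents
        # From tstrong_index onward, both regions denoise freely.
        latents = denoise_step(latents, t)
    return latents
```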
### Wan
To set up the environment for running Wan 2.2, follow the installation instructions in the official [Wan 2.2 repository](https://github.com/Wan-Video/Wan2.2). Our implementation builds on the [🤗 Diffusers Wan I2V pipeline](https://github.com/huggingface/diffusers/blob/345864eb852b528fd1f4b6ad087fa06e0470006b/src/diffusers/pipelines/wan/pipeline_wan_i2v.py), which we adapt for TTM using the I2V 14B backbone.
#### Run inference (using the included Wan examples):
```bash
python run_wan.py \
    --input-path "./examples/cutdrag_wan_Monkey" \
    --output-path "./outputs/wan_monkey.mp4" \
    --tweak-index 3 \
    --tstrong-index 7
```
#### Good starting points:
* Cut-and-Drag: `tweak-index=3`, `tstrong-index=7`
* Camera control: `tweak-index=2`, `tstrong-index=5`
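For example, a camera-control run with those starting values could look like the following; the input directory name is a placeholder, so substitute one of the camera-control folders shipped under [./examples/](./examples/):
```bash
# Placeholder input path: point it at an actual camera-control example directory.
python run_wan.py \
    --input-path "./examples/<camera_control_example>" \
    --output-path "./outputs/wan_camera.mp4" \
    --tweak-index 2 \
    --tstrong-index 5
```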
<br>
<a id="cogvideox"></a>
<details>
<summary><big><strong>CogVideoX</strong></big></summary><br>
To set up the environment for running CogVideoX, follow the installation instructions in the official [CogVideoX repository](https://github.com/zai-org/CogVideo).
Our implementation builds on the [🤗 Diffusers CogVideoX I2V pipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py), which we adapt for Time-to-Move (TTM) using the CogVideoX-I2V 5B backbone.
#### Run inference (on the included 49-frame CogVideoX example):
```bash
python run_cog.py \
    --input-path "./examples/cutdrag_cog_Monkey" \
    --output-path "./outputs/cog_monkey.mp4" \
    --tweak-index 4 \
    --tstrong-index 9
```
</details>
<br>
<a id="stable-video-diffusion"></a>
<details>
<summary><big><strong>Stable Video Diffusion</strong></big></summary>
<br>
To set up the environment for running SVD, follow the installation instructions in the official [SVD repository](https://github.com/Stability-AI/generative-models).
Our implementation builds on the [🤗 Diffusers SVD I2V pipeline](https://github.com/huggingface/diffusers/blob/8abc7aeb715c0149ee0a9982b2d608ce97f55215/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py#L147), which we adapt for Time-to-Move (TTM).
#### Run inference (on the included 21-frame SVD example):
```bash
python run_svd.py \
    --input-path "./examples/cutdrag_svd_Fish" \
    --output-path "./outputs/svd_fish.mp4" \
    --tweak-index 16 \
    --tstrong-index 21
```
</details>
<br>
## Generate Your Own Cut-and-Drag Examples
We provide an easy-to-use GUI for creating cut-and-drag examples that can later be used for video generation in **Time-to-Move**. We recommend reading the [GUI guide](GUIs/README.md) before using it.
<p align="center">
<img src="assets/gui.png" alt="Cut-and-Drag GUI Example" width="400">
</p>
To get started quickly, create a new environment and run:
```bash
pip install PySide6 opencv-python numpy imageio imageio-ffmpeg
python GUIs/cut_and_drag.py
```
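Once the GUI has saved an example, its output directory can be passed straight to one of the run scripts, assuming it follows the same layout as the included examples (see the [GUI guide](GUIs/README.md)). The paths below are placeholders:
```bash
# Placeholder paths: point --input-path at the directory the GUI produced.
python run_wan.py \
    --input-path "./examples/<my_cut_and_drag_example>" \
    --output-path "./outputs/my_example.mp4" \
    --tweak-index 3 \
    --tstrong-index 7
```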
<br>
## TODO 🛠️
- [x] Wan 2.2 run code
- [x] CogVideoX run code
- [x] SVD run code
- [x] Cut-and-Drag examples
- [x] Camera-control examples
- [x] Cut-and-Drag GUI
- [x] Cut-and-Drag GUI guide
- [ ] Evaluation code
## BibTeX
```bibtex
@misc{singer2025timetomovetrainingfreemotioncontrolled,
      title={Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising},
      author={Assaf Singer and Noam Rotstein and Amir Mann and Ron Kimmel and Or Litany},
      year={2025},
      eprint={2511.08633},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.08633},
}
```