---
license: other
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers
pipeline_tag: text-to-video
tags:
- Any-Step
- Text-to-Video
- Image-to-Video
- Video-to-Video
---

# AnyFlow

🖥️ GitHub    |    🤗 Hugging Face    |    📑 Paper    |    🌐 Website

-----

**AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation**

In this repository, we present **AnyFlow**, the first any-step video diffusion framework built on flow maps. **AnyFlow** offers these key features:

- ⚡ **Any-Step Generation**: Unlike traditional distilled models tied to fixed step budgets, **AnyFlow** enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added (see the sketch below).
- 🔀 **Multiple Architectures**: **AnyFlow** supports any-step distillation for both **causal** and **bidirectional** video diffusion models.
- 🎬 **Multiple Tasks**: **AnyFlow** supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.
- 📈 **Scalable Performance**: **AnyFlow** is validated from **1.3B** up to **14B** parameters.

This directory contains **AnyFlow-FAR-Wan2.1-1.3B-Diffusers** (a 1.3B causal video diffusion model) in Hugging Face Diffusers format, derived from the [**Wan2.1-T2V-1.3B-Diffusers**](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers) text-to-video backbone.
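Because one checkpoint serves every step budget, trading speed for quality at inference is just a matter of changing `num_inference_steps`. A minimal sketch of this, mirroring the Text-to-Video Quickstart below (the prompt and step counts here are illustrative, not prescribed settings):

```python
import torch
from diffusers.utils import export_to_video
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

pipeline = FARWanAnyFlowPipeline.from_pretrained(
    "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
).to('cuda', dtype=torch.bfloat16)

prompt = "A majestic elephant running swiftly across a sunlit savanna."

# One distilled model, many inference budgets: quality improves as steps increase.
for steps in (1, 2, 4, 8):
    video = pipeline(
        prompt=prompt,
        height=480, width=832, num_frames=81,
        num_inference_steps=steps,
        generator=torch.Generator('cuda').manual_seed(0)
    ).frames[0]
    export_to_video(video, f"output_{steps}steps.mp4", fps=16)
```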
## Video Demos

## 🔥 Latest News!!

* May 4, 2026: 👋 We've released the codebase and weights of AnyFlow.

## Quickstart

### Setup Environment

**1️⃣ Create Conda Environment**

```bash
conda create -n far python=3.10
conda activate far
```

**2️⃣ Install PyTorch and Dependencies**

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt --no-build-isolation
```

### Model Download

| Model | Tasks | Resolution | Download Link |
| ----- | ----- | ---------- | ------------- |
| `AnyFlow-FAR-Wan2.1-1.3B-Diffusers` | T2V, I2V, V2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers) |
| `AnyFlow-FAR-Wan2.1-14B-Diffusers` | T2V, I2V, V2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers) |
| `AnyFlow-Wan2.1-T2V-14B-Diffusers` | T2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers) |
| `AnyFlow-Wan2.1-T2V-1.3B-Diffusers` | T2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers) |

Download models using 🤗 `hf download`:

```bash
pip install "huggingface_hub[cli]"
hf download nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers --repo-type model --local-dir experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers
```

### Run Text-to-Video Generation with Diffusers

```python
import torch
from diffusers.utils import export_to_video
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

prompt = "CG game concept digital art, a majestic elephant with a vibrant tusk and sleek fur running swiftly towards a herd of its kind."

video = pipeline(
    prompt=prompt,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```

### Run Image-to-Video Generation with Diffusers

```python
import torch
from diffusers.utils import export_to_video
from PIL import Image
from torchvision import transforms
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load the conditioning image and add batch/frame dims -> [batch, frame, channel, height, width]
image_path = 'assets/example_image.jpg'
prompt = 'A towering, battle-scarred humanoid robot walking through the skeletal remains of a city ruin.'
image = Image.open(image_path).convert('RGB')
image = transforms.ToTensor()(transforms.Resize([480, 832])(image)).unsqueeze(0).unsqueeze(0)

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': image},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```

### Run Video-to-Video Generation with Diffusers

```python
import torch
from diffusers.utils import export_to_video
import decord
from torchvision import transforms
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

decord.bridge.set_bridge('torch')

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load the conditioning video
video_path = 'assets/example_video.mp4'
prompt = "A focused trail runner's powerful strides through a dense, sun-dappled forest."
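# NOTE: `select_frame_indices` and `num_cond_frames` are used below but were
# not defined in the original snippet; the stand-ins here are illustrative
# assumptions. If the FAR codebase ships its own frame-sampling utility,
# use that instead.
num_cond_frames = 17  # assumed number of conditioning frames taken from the source video

def select_frame_indices(total_frames, src_fps, target_fps):
    # evenly subsample source frames so playback approximates target_fps
    step = max(int(round(src_fps / target_fps)), 1)
    return list(range(0, total_frames, step))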
video_reader = decord.VideoReader(video_path)
frame_idxs = select_frame_indices(len(video_reader), video_reader.get_avg_fps(), target_fps=16)[:num_cond_frames]
frames = video_reader.get_batch(frame_idxs)
# [T, H, W, C] uint8 -> [1, T, C, H, W] float in [0, 1]
frames = (frames / 255.0).float().permute(0, 3, 1, 2).contiguous()
frames = transforms.Resize([480, 832])(frames).unsqueeze(0)

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': frames},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```

## License

This model is released under the NVIDIA One-Way Noncommercial License ([NSCLv1](LICENSE.md)). Under this license, NVIDIA confirms:

* Models are not for commercial use.
* NVIDIA does not claim ownership of any outputs generated using the Models or Derivative Models.

## Citation

If you find our work helpful, please cite us.

```bibtex
@article{gu2026anyflow,
  title={AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation},
  author={Gu, Yuchao and Fang, Guian and Jiang, Yuxin and Mao, Weijia and Han, Song and Cai, Han and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2605.13724},
  year={2026}
}

@article{gu2025long,
  title={Long-Context Autoregressive Video Modeling with Next-Frame Prediction},
  author={Gu, Yuchao and Mao, Weijia and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.19325},
  year={2025}
}
```

## Acknowledgements

This codebase is built on [Diffusers](https://github.com/huggingface/diffusers). We also refer to implementations from [FAR](https://github.com/showlab/FAR), [Self-Forcing](https://github.com/guandeh17/Self-Forcing), and [TiM](https://github.com/WZDTHU/TiM). We thank the authors for open-sourcing their work.