---
license: other
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers
pipeline_tag: text-to-video
tags:
- Any-Step
- Text-to-Video
- Image-to-Video
- Video-to-Video
---
# AnyFlow

<p align="center">
🖥️ <a href="https://github.com/NVlabs/AnyFlow">GitHub</a> | 🤗 <a href="https://huggingface.co/collections/nvidia/anyflow">Hugging Face</a> | 📑 <a href="https://arxiv.org/">Paper</a> | 🌐 <a href="https://nvlabs.github.io/AnyFlow">Website</a>
<br>
</p>

-----
**AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation**

In this repository, we present **AnyFlow**, the first any-step video diffusion framework built on flow maps. **AnyFlow** offers these key features:

- ⚡ **Any-Step Generation**: Unlike traditional distilled models tied to fixed step budgets, **AnyFlow** enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while improving steadily as more sampling steps are added.
- 🔀 **Multiple Architectures**: **AnyFlow** supports any-step distillation for both **causal** and **bidirectional** video diffusion models.
- 🎬 **Multiple Tasks**: **AnyFlow** supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.
- 📈 **Scalable Performance**: **AnyFlow** is validated from **1.3B** up to **14B** parameters.

This repository contains **AnyFlow-FAR-Wan2.1-1.3B-Diffusers** (a 1.3B causal video diffusion model) in Hugging Face Diffusers format, derived from the [**Wan2.1-T2V-1.3B-Diffusers**](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers) text-to-video backbone.
## Video Demos

<div align="center">
  <video width="80%" autoplay loop muted playsinline controls>
    <source src="https://nvlabs.github.io/AnyFlow/assets/videos/demo_video.m4v" type="video/mp4">
    Your browser does not support the video tag.
  </video>
</div>
## 🔥 Latest News!!

* May 4, 2026: 👋 We've released the codebase and weights of AnyFlow.
## Quickstart

### Setup Environment

**1️⃣ Create Conda Environment**

```bash
conda create -n far python=3.10
conda activate far
```

**2️⃣ Install PyTorch and Dependencies**

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt --no-build-isolation
```
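Before downloading weights, it can help to confirm that the environment sees a CUDA device. A minimal sanity check (optional, not part of the official setup):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```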
### Model Download

| Model | Tasks | Resolution | Download Link |
| ----- | ----- | ---------- | ------------- |
| `AnyFlow-FAR-Wan2.1-1.3B-Diffusers` | T2V, I2V, V2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers) |
| `AnyFlow-FAR-Wan2.1-14B-Diffusers` | T2V, I2V, V2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers) |
| `AnyFlow-Wan2.1-T2V-14B-Diffusers` | T2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers) |
| `AnyFlow-Wan2.1-T2V-1.3B-Diffusers` | T2V | 480P | 🤗 [Hugging Face](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers) |

Download models using 🤗 `hf download`:

```bash
pip install "huggingface_hub[cli]"
hf download nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers --repo-type model --local-dir experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers
```
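If you prefer to fetch the weights from Python rather than the CLI, `huggingface_hub.snapshot_download` pulls the same repository to a local directory (an equivalent alternative, not required by the pipeline):

```python
from huggingface_hub import snapshot_download

# download the full model repository to the same local directory as above
snapshot_download(
    repo_id="nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers",
    local_dir="experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers",
)
```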
### Run Text-to-Video Generation with Diffusers

```python
import torch

from diffusers.utils import export_to_video
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

prompt = "CG game concept digital art, a majestic elephant with a vibrant tusk and sleek fur running swiftly towards a herd of its kind."

video = pipeline(
    prompt=prompt,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```
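Because AnyFlow is any-step, the same checkpoint can be sampled under different step budgets, with quality improving as steps are added. A sketch reusing the pipeline and prompt above (the budget values are illustrative, not tuned recommendations):

```python
# reuse `pipeline`, `prompt`, and `export_to_video` from the snippet above
for steps in (1, 2, 4, 8):
    video = pipeline(
        prompt=prompt,
        height=480,
        width=832,
        num_frames=81,
        num_inference_steps=steps,
        generator=torch.Generator('cuda').manual_seed(0),
    ).frames[0]
    export_to_video(video, f"output_{steps}steps.mp4", fps=16)
```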
### Run Image-to-Video Generation with Diffusers

```python
import torch

from diffusers.utils import export_to_video
from PIL import Image
from torchvision import transforms

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load the conditioning image as a one-frame video tensor of shape (1, 1, 3, 480, 832)
image_path = 'assets/example_image.jpg'
prompt = 'A towering, battle-scarred humanoid robot walking through the skeletal remains of a city ruin.'

image = Image.open(image_path).convert('RGB')
image = transforms.ToTensor()(transforms.Resize([480, 832])(image)).unsqueeze(0).unsqueeze(0)

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': image},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```
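The conditioning image is passed as a one-frame video tensor laid out as `(batch, frames, channels, height, width)` with values in `[0, 1]`, matching the video conditioning in the next section. The preprocessing above can be wrapped in a small helper; `load_image_as_context` is a convenience sketch, not part of the released API:

```python
import torch
from PIL import Image
from torchvision import transforms

def load_image_as_context(path: str, height: int = 480, width: int = 832) -> torch.Tensor:
    # returns a (1, 1, 3, height, width) tensor in [0, 1], i.e. a one-frame video
    image = Image.open(path).convert('RGB')
    image = transforms.Resize([height, width])(image)
    return transforms.ToTensor()(image).unsqueeze(0).unsqueeze(0)

image = load_image_as_context('assets/example_image.jpg')
```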
### Run Video-to-Video Generation with Diffusers

```python
import torch

import decord
from diffusers.utils import export_to_video
from torchvision import transforms

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

decord.bridge.set_bridge('torch')

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# number of conditioning frames taken from the source video (illustrative choice)
num_cond_frames = 17

def select_frame_indices(num_src_frames, src_fps, target_fps):
    # minimal stand-in for the repo's frame-selection helper: resample the
    # source video to target_fps by striding over frame indices
    step = max(float(src_fps) / float(target_fps), 1.0)
    return [min(int(round(i * step)), num_src_frames - 1)
            for i in range(int(num_src_frames / step))]

# load the conditioning video as a tensor of shape (1, T, 3, 480, 832) in [0, 1]
video_path = 'assets/example_video.mp4'
prompt = "A focused trail runner's powerful strides through a dense, sun-dappled forest."

video_reader = decord.VideoReader(video_path)
frame_idxs = select_frame_indices(len(video_reader), video_reader.get_avg_fps(), target_fps=16)[:num_cond_frames]
frames = video_reader.get_batch(frame_idxs)
frames = (frames / 255.0).float().permute(0, 3, 1, 2).contiguous()
frames = transforms.Resize([480, 832])(frames).unsqueeze(0)

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': frames},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```
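On GPUs with limited memory, the standard Diffusers offloading hooks may also apply here, assuming `FARWanAnyFlowPipeline` inherits them from `DiffusionPipeline`. An untested sketch:

```python
# assumption: the pipeline supports the standard Diffusers offloading API
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.enable_model_cpu_offload()  # keep only the active submodule on the GPU
```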
## License

This model is released under the NVIDIA One-Way Noncommercial License ([NSCLv1](LICENSE.md)). Under this license, NVIDIA confirms:

* Models are not for commercial use.
* NVIDIA does not claim ownership of any outputs generated using the Models or Derivative Models.
## Citation

If you find our work helpful, please cite us.

```bibtex
@article{gu2026anyflow,
  title={AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation},
  author={Gu, Yuchao and Fang, Guian and Jiang, Yuxin and Mao, Weijia and Han, Song and Cai, Han and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2605.13724},
  year={2026}
}

@article{gu2025long,
  title={Long-Context Autoregressive Video Modeling with Next-Frame Prediction},
  author={Gu, Yuchao and Mao, Weijia and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.19325},
  year={2025}
}
```
## Acknowledgements

This codebase is built on [Diffusers](https://github.com/huggingface/diffusers). We also refer to implementations from [FAR](https://github.com/showlab/FAR), [Self-Forcing](https://github.com/guandeh17/Self-Forcing), and [TiM](https://github.com/WZDTHU/TiM). We thank the authors for open-sourcing their work.