AnyFlow
GitHub | Hugging Face | Paper | Website
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
In this repository, we present AnyFlow, the first any-step video diffusion framework built on flow maps. AnyFlow offers these key features:
- Any-Step Generation: Unlike traditional distilled models tied to fixed step budgets, AnyFlow enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added (see the sketch below).
- Multiple Architectures: AnyFlow supports any-step distillation for both causal and bidirectional video diffusion models.
- Multiple Tasks: AnyFlow supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.
- Scalable Performance: AnyFlow is validated from 1.3B up to 14B parameters.
This directory contains AnyFlow-FAR-Wan2.1-1.3B-Diffusers (a 1.3B causal video diffusion model) in Hugging Face Diffusers format, derived from the Wan2.1-T2V-1.3B-Diffusers text-to-video backbone.
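As a concrete illustration of the any-step property, here is a minimal sketch that re-runs the same generation at several step budgets. It assumes the Diffusers pipeline interface shown in the Quickstart below; the prompt and the step budgets are illustrative, not prescribed settings.

import torch
from diffusers.utils import export_to_video
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

pipeline = FARWanAnyFlowPipeline.from_pretrained(
    "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
).to('cuda', dtype=torch.bfloat16)

# One checkpoint, several inference budgets: quality improves as steps are added.
for steps in (1, 2, 4, 8):  # illustrative step budgets
    video = pipeline(
        prompt="a red panda walking through fresh snow",  # illustrative prompt
        height=480,
        width=832,
        num_frames=81,
        num_inference_steps=steps,
        generator=torch.Generator('cuda').manual_seed(0),
    ).frames[0]
    export_to_video(video, f"output_{steps}step.mp4", fps=16)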
Video Demos
Latest News!!
- May 4, 2026: We've released the codebase and weights of AnyFlow.
Quickstart
Setup Environment
1. Create Conda Environment
conda create -n far python=3.10
conda activate far
2. Install PyTorch and Dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt --no-build-isolation
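Optionally, a quick sanity check (generic PyTorch, not specific to AnyFlow) confirms that the CUDA build installed above is the one being used:

import torch

print(torch.__version__, torch.version.cuda)  # expect a cu128 build
print(torch.cuda.is_available())              # should print True on a GPU machine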
Model Download
| Model | Tasks | Resolution | Download Link |
|---|---|---|---|
| AnyFlow-FAR-Wan2.1-1.3B-Diffusers | T2V, I2V, V2V | 480P | Hugging Face |
| AnyFlow-FAR-Wan2.1-14B-Diffusers | T2V, I2V, V2V | 480P | Hugging Face |
| AnyFlow-Wan2.1-T2V-14B-Diffusers | T2V | 480P | Hugging Face |
| AnyFlow-Wan2.1-T2V-1.3B-Diffusers | T2V | 480P | Hugging Face |
Download models using hf download:
pip install "huggingface_hub[cli]"
hf download nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers --repo-type model --local-dir experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers
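The same download can also be scripted from Python with the standard huggingface_hub API; the local directory below simply mirrors the CLI command above.

from huggingface_hub import snapshot_download

# Download the full model repository into the same local directory as the CLI example.
snapshot_download(
    repo_id="nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers",
    local_dir="experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers",
)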
Run Text-to-Video Generation with Diffusers
import torch
from diffusers.utils import export_to_video
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline
model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)
prompt = "CG game concept digital art, a majestic elephant with a vibrant tusk and sleek fur running swiftly towards a herd of its kind."
video = pipeline(
prompt=prompt,
height=480,
width=832,
num_frames=81,
num_inference_steps=4,
generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
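If the weights were fetched with the download command above, the pipeline can equally be loaded from the local directory instead of the Hub id (a small variation on the snippet above; the path matches the download example):

pipeline = FARWanAnyFlowPipeline.from_pretrained(
    "experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
).to('cuda', dtype=torch.bfloat16)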
Run Image-to-Video Generation with Diffusers
import torch
from diffusers.utils import export_to_video
from PIL import Image
from torchvision import transforms
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline
model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)
# load image
image_path = 'assets/example_image.jpg'
prompt = 'A towering, battle-scarred humanoid robot walking through the skeletal remains of a city ruin.'
image = Image.open(image_path).convert('RGB')
image = transforms.ToTensor()(transforms.Resize([480, 832])(image)).unsqueeze(0).unsqueeze(0)
video = pipeline(
prompt=prompt,
context_sequence={'raw': image},
height=480,
width=832,
num_frames=81,
num_inference_steps=4,
generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
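Note: judging from the two unsqueeze calls above, the conditioning image is passed to context_sequence as a tensor of shape [batch, frames, channels, height, width] (here [1, 1, 3, 480, 832]); the Video-to-Video example below follows the same layout with multiple conditioning frames.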
Run Video-to-Video Generation with Diffusers
import torch
from diffusers.utils import export_to_video
import decord
from torchvision import transforms
from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline
decord.bridge.set_bridge('torch')
model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)
# load video
video_path = 'assets/example_video.mp4'
prompt = "A focused trail runner's powerful strides through a dense, sun-dappled forest."
video_reader = decord.VideoReader(video_path)
# Uniformly subsample frames to roughly match a 16 fps rate and keep only the
# conditioning frames (num_cond_frames is illustrative; adjust to your use case).
num_cond_frames = 17
stride = max(1, round(video_reader.get_avg_fps() / 16))
frame_idxs = list(range(0, len(video_reader), stride))[:num_cond_frames]
frames = video_reader.get_batch(frame_idxs)
frames = (frames / 255.0).float().permute(0, 3, 1, 2).contiguous()
frames = transforms.Resize([480, 832])(frames).unsqueeze(0)
video = pipeline(
prompt=prompt,
context_sequence={'raw': frames},
height=480,
width=832,
num_frames=81,
num_inference_steps=4,
generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
License
This model is released under the NVIDIA One-Way Noncommercial License (NSCLv1).
Under the NVIDIA One-Way Noncommercial License (NSCLv1), NVIDIA confirms:
- Models are not for commercial use.
- NVIDIA does not claim ownership of any outputs generated using the Models or Derivative Models.
Citation
If you find our work helpful, please cite us.
@article{gu2026anyflow,
title = {AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation},
author = {...},
journal = {...},
year = {2026}
}
@article{gu2025long,
title={Long-Context Autoregressive Video Modeling with Next-Frame Prediction},
author={Gu, Yuchao and Mao, Weijia and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2503.19325},
year={2025}
}
Acknowledgements
This codebase is built on Diffusers. We also refer to implementations from FAR, Self-Forcing, and TiM. We thank the authors for open-sourcing their work.