AnyFlow

🖥️ GitHub    |    🤗 Hugging Face    |    📑 Paper    |    🌐 Website


AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

In this repository, we present AnyFlow, the first any-step video diffusion framework built on flow maps. AnyFlow offers these key features:

  • ⚡ Any-Step Generation: Unlike traditional distilled models tied to fixed step budgets, AnyFlow enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added.

  • 🔀 Multiple Architectures: AnyFlow supports any-step distillation for both causal and bidirectional video diffusion models.

  • 🎬 Multiple Tasks: AnyFlow supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.

  • 📈 Scalable Performance: AnyFlow is validated at model scales from 1.3B to 14B parameters.

This directory contains AnyFlow-FAR-Wan2.1-1.3B-Diffusers (a 1.3B causal video diffusion model) in Hugging Face Diffusers format, derived from the Wan2.1-T2V-1.3B-Diffusers text-to-video backbone.

Video Demos

🔥 Latest News!!

  • May 4, 2026: 👋 We've released the codebase and weights of AnyFlow.

Quickstart

Setup Environment

1️⃣ Create Conda Environment

conda create -n far python=3.10
conda activate far

2️⃣ Install PyTorch and Dependencies

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt --no-build-isolation
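
To confirm the environment before downloading weights, a quick PyTorch sanity check (assumes an NVIDIA GPU with a driver compatible with the CUDA 12.8 wheels installed above):

import torch

print(torch.__version__)          # expect a +cu128 build
print(torch.cuda.is_available())  # expect True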

Model Download

Model                               Tasks            Resolution   Download Link
AnyFlow-FAR-Wan2.1-1.3B-Diffusers   T2V, I2V, V2V    480P         🤗 Hugging Face
AnyFlow-FAR-Wan2.1-14B-Diffusers    T2V, I2V, V2V    480P         🤗 Hugging Face
AnyFlow-Wan2.1-T2V-14B-Diffusers    T2V              480P         🤗 Hugging Face
AnyFlow-Wan2.1-T2V-1.3B-Diffusers   T2V              480P         🤗 Hugging Face

Download models using 🤗 hf download:

pip install "huggingface_hub[cli]"

hf download nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers --repo-type model --local-dir experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers
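
Alternatively, the same snapshot can be fetched from Python via huggingface_hub (same repository and target directory as the CLI command above):

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers",
    local_dir="experiments/pretrained_models/AnyFlow-FAR-Wan2.1-1.3B-Diffusers",
)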

Run Text-to-Video Generation with Diffusers

import torch
from diffusers.utils import export_to_video

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

prompt = "CG game concept digital art, a majestic elephant with a vibrant tusk and sleek fur running swiftly towards a herd of its kind."

video = pipeline(
    prompt=prompt,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
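
Because AnyFlow is any-step, you can trade latency for quality on the same model simply by changing num_inference_steps. A minimal sweep (output file names are illustrative):

# One distilled model, several step budgets: quality should improve with more steps.
for steps in [1, 2, 4, 8]:
    video = pipeline(
        prompt=prompt,
        height=480,
        width=832,
        num_frames=81,
        num_inference_steps=steps,
        generator=torch.Generator('cuda').manual_seed(0),
    ).frames[0]
    export_to_video(video, f"output_{steps}steps.mp4", fps=16)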

Run Image-to-Video Generation with Diffusers

import torch
from diffusers.utils import export_to_video
from PIL import Image
from torchvision import transforms

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load image
image_path = 'assets/example_image.jpg'
prompt = 'A towering, battle-scarred humanoid robot walking through the skeletal remains of a city ruin.'

image = Image.open(image_path).convert('RGB')
image = transforms.Resize([480, 832])(image)
image = transforms.ToTensor()(image).unsqueeze(0).unsqueeze(0)  # [1, 1, 3, 480, 832]

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': image},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
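
If GPU memory is tight, the standard Diffusers CPU-offload path should work, assuming FARWanAnyFlowPipeline subclasses diffusers' DiffusionPipeline (an assumption worth verifying against the pipeline source). In that case, skip the .to('cuda') call above:

# Sketch: load on CPU in bf16, then let Diffusers move submodules to the GPU on demand.
# Assumes FARWanAnyFlowPipeline inherits enable_model_cpu_offload() from DiffusionPipeline.
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipeline.enable_model_cpu_offload()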

Run Video-to-Video Generation with Diffusers

import torch
from diffusers.utils import export_to_video
import decord
from torchvision import transforms

from far.pipelines.pipeline_far_wan_anyflow import FARWanAnyFlowPipeline

decord.bridge.set_bridge('torch')

model_id = "nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers"
pipeline = FARWanAnyFlowPipeline.from_pretrained(model_id).to('cuda', dtype=torch.bfloat16)

# load video
video_path = 'assets/example_video.mp4'
prompt = "A focused trail runner's powerful strides through a dense, sun-dappled forest."

video_reader = decord.VideoReader(video_path)
num_cond_frames = 17  # number of conditioning frames (illustrative value; adjust as needed)

# Subsample the source video to roughly 16 fps and keep the leading frames.
stride = max(round(video_reader.get_avg_fps() / 16), 1)
frame_idxs = list(range(0, len(video_reader), stride))[:num_cond_frames]

frames = video_reader.get_batch(frame_idxs)  # [T, H, W, 3] uint8 tensor (torch bridge)
frames = (frames.float() / 255.0).permute(0, 3, 1, 2).contiguous()
frames = transforms.Resize([480, 832])(frames).unsqueeze(0)  # [1, T, 3, 480, 832]

video = pipeline(
    prompt=prompt,
    context_sequence={'raw': frames},
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=4,
    generator=torch.Generator('cuda').manual_seed(0)
).frames[0]
export_to_video(video, "output.mp4", fps=16)
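
Because the three tasks share one checkpoint (see Multiple Tasks above), the loaded pipeline can switch between them purely through context_sequence. A minimal sketch, assuming the image and frames tensors prepared in the two previous examples, and that leaving context_sequence unset (None) reproduces the text-to-video behavior of the first example:

# Illustrative task sweep: one model, three conditioning modes.
contexts = {
    't2v': None,              # no context: pure text-to-video
    'i2v': {'raw': image},    # one image as context, shape [1, 1, 3, H, W]
    'v2v': {'raw': frames},   # leading video frames as context, shape [1, T, 3, H, W]
}
for task, context in contexts.items():
    video = pipeline(
        prompt=prompt,
        context_sequence=context,
        height=480,
        width=832,
        num_frames=81,
        num_inference_steps=4,
        generator=torch.Generator('cuda').manual_seed(0),
    ).frames[0]
    export_to_video(video, f"output_{task}.mp4", fps=16)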

License

This model is released under the NVIDIA One-Way Noncommercial License (NSCLv1).

Under the NVIDIA One-Way Noncommercial License (NSCLv1), NVIDIA confirms:

  • Models are not for commercial use.
  • NVIDIA does not claim ownership of any outputs generated using the Models or Derivative Models.

Citation

If you find our work helpful, please cite us.

@article{gu2026anyflow,
  title   = {AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation},
  author  = {...},
  journal = {...},
  year    = {2026}
}

@article{gu2025long,
  title   = {Long-Context Autoregressive Video Modeling with Next-Frame Prediction},
  author  = {Gu, Yuchao and Mao, Weijia and Shou, Mike Zheng},
  journal = {arXiv preprint arXiv:2503.19325},
  year    = {2025}
}

Acknowledgements

This codebase is built on Diffusers. We also refer to implementations from FAR, Self-Forcing, and TiM. We thank the authors for open-sourcing their work.
