---
datasets:
- vita-video-gen/svi-benchmark
language:
- en
tags:
- video generation
pipeline_tag: image-to-video
library_name: diffusers
license: mit
project_page: https://stable-video-infinity.github.io/homepage/
papers:
- title: 'Stable Video Infinity: Infinite-Length Video Generation with Error Recycling'
  authors:
  - Wuyang Li
  - Wentao Pan
  - Po-Chien Luan
  - Yang Gao
  - Alexandre Alahi
  url: https://huggingface.co/papers/2510.09212
  conference: arXiv preprint, 2025
---

<div align="center">

<h1>Stable Video Infinity: Infinite-Length Video Generation with Error Recycling</h1>

<p align="center">   
<a href="https://huggingface.co/papers/2510.09212">     <img src="https://img.shields.io/badge/Paper-HuggingFace-red?logo=huggingface&logoColor=yellow" alt="Paper on Hugging Face"/>   </a>   
<a href="https://stable-video-infinity.github.io/homepage/">     <img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"/>   </a>   
<a href="https://github.com/vita-epfl/Stable-Video-Infinity">     <img src="https://img.shields.io/badge/SVI-GitHub-black?logo=github&logoColor=white" alt="SVI on GitHub"/>   </a>   
<a href="https://huggingface.co/datasets/vita-video-gen/svi-benchmark">     <img src="https://img.shields.io/badge/SVI_Dataset-Hugging%20Face-orange?logo=huggingface&logoColor=yellow" alt="SVI Dataset"/>   </a>   
<a href="https://huggingface.co/vita-video-gen/svi-model">     <img src="https://img.shields.io/badge/SVI_models-Hugging%20Face-FFCC00?logo=huggingface&logoColor=yellow" alt="SVI Models"/>   </a> </p> </div>

## 🎯 About This Repository
**Stable Video Infinity (SVI)** generates videos of any length with high temporal consistency, plausible scene transitions, and controllable streaming storylines in any domain.
This repository contains the model weights of the SVI family.

## 🌟 Key Highlights
- **OpenSVI**: Everything is open-sourced: training & evaluation scripts, datasets, and more.
- **Infinite Length**: No inherent limit on video duration; generate arbitrarily long stories (see the 10‑minute “Tom and Jerry” demo).
- **Versatile**: Supports diverse in-the-wild generation tasks: multi-scene short films, single‑scene animations, skeleton-/audio-conditioned generation, cartoons, and more.
- **Efficient**: Only LoRA adapters are tuned, which requires very little training data, so anyone can easily make their own SVI.

## 📦 Resources
| **Model** | **Task** | **Input** | **Output** | **Hugging Face Link** | **Comments** |
|-------|------|-------|--------|-------------------|------------------|
| **ALL** | Infinite possibility | Image + X | X video | [🤗 Folder](https://huggingface.co/vita-video-gen/svi-model/tree/main/version-1.0) | The family bucket, if you want to play with all of them! |
| **SVI-Shot** | Single-scene generation | Image + Text prompt | Long video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-shot.safetensors?download=true) | Generates a consistent long video from one text prompt. (This will never drift.) |
| **SVI-Film** | Multi-scene generation | Image + Text prompt stream | Film-style video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-film.safetensors?download=true) | Generates a creative long video from one text prompt stream (5 seconds per prompt). |
| **SVI-Film (Transition)** | Multi-scene generation | Image + Text prompt stream | Film-style video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-film-transitions.safetensors?download=true) | Generates a creative long video from one text prompt stream. (More scene transitions, due to the training data.) |
| **SVI-Tom&Jerry** | Cartoon animation | Image | Cartoon video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-tom.safetensors?download=true) | Generates creative long cartoon videos from one text prompt stream. (No drift in our 20-minute test.) |
| **SVI-Talk** | Talking head | Image + Audio | Talking video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-talk.safetensors?download=true) | Generates long videos of audio-conditioned human speaking. |
| **SVI-Dance** | Dancing animation | Image + Skeleton | Dance video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-dance.safetensors?download=true) | Generates long videos of skeleton-conditioned human dancing. |
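
The weights in the table can also be fetched programmatically. The following is a minimal sketch, not an official loader: the repository ID and file names are copied from the download links above, while the helper names (`repo_path`, `resolve_url`, `download_weights`) are hypothetical and chosen only for illustration. It assumes the `huggingface_hub` package is installed if you call `download_weights`.

```python
# Repository ID, version folder, and file names are taken from the
# download links in the table above.
REPO_ID = "vita-video-gen/svi-model"
VERSION_DIR = "version-1.0"

WEIGHT_FILES = {
    "SVI-Shot": "svi-shot.safetensors",
    "SVI-Film": "svi-film.safetensors",
    "SVI-Film (Transition)": "svi-film-transitions.safetensors",
    "SVI-Tom&Jerry": "svi-tom.safetensors",
    "SVI-Talk": "svi-talk.safetensors",
    "SVI-Dance": "svi-dance.safetensors",
}


def repo_path(model_name: str) -> str:
    """Return the path of a weight file inside the repository."""
    return f"{VERSION_DIR}/{WEIGHT_FILES[model_name]}"


def resolve_url(model_name: str) -> str:
    """Return the direct download URL for a weight file."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{repo_path(model_name)}"


def download_weights(model_name: str, local_dir=None) -> str:
    """Download one weight file and return its local path.

    The import is deferred so the URL helpers above work without
    huggingface_hub installed.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=repo_path(model_name),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(resolve_url("SVI-Shot"))
```

Calling `download_weights("SVI-Shot")` stores the file in the local Hugging Face cache unless `local_dir` is given. How the downloaded LoRA weights are then loaded for inference is described in the SVI GitHub repository and is not covered by this sketch.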

Note: If you want to play with text-to-video (T2V), you can use SVI directly with an image generated by any text-to-image (T2I) model!
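
For the SVI-Film models in the table, each prompt in the text prompt stream covers 5 seconds of video, so the length of the stream determines the length of the output. The sketch below only illustrates that arithmetic. The function names are hypothetical and this is not the SVI inference API; it assumes every prompt covers exactly 5 seconds.

```python
import math

# From the SVI-Film row in the table: 5 seconds of video per text prompt.
SECONDS_PER_PROMPT = 5


def prompts_needed(duration_s: float) -> int:
    """Number of prompts required to cover a video of the given duration."""
    return math.ceil(duration_s / SECONDS_PER_PROMPT)


def prompt_schedule(prompts):
    """Pair each prompt with its (start, end) time in seconds."""
    return [
        (i * SECONDS_PER_PROMPT, (i + 1) * SECONDS_PER_PROMPT, prompt)
        for i, prompt in enumerate(prompts)
    ]


if __name__ == "__main__":
    # A 10-minute video needs 600 / 5 = 120 prompts.
    print(prompts_needed(600))
    for start, end, prompt in prompt_schedule(["a cat wakes up", "the cat chases a mouse"]):
        print(f"{start:>4}s to {end:>4}s: {prompt}")
```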

## 📝 Citation
If you find our work helpful for your research, please consider citing our paper. Thank you so much!

```bibtex
@article{li2025stable,
      title={Stable Video Infinity: Infinite-Length Video Generation with Error Recycling}, 
      author={Wuyang Li and Wentao Pan and Po-Chien Luan and Yang Gao and Alexandre Alahi},
      journal={arXiv preprint arXiv:2510.09212},
      year={2025},
      url={https://huggingface.co/papers/2510.09212},
}
```