Update model card: pipeline_tag, library_name, paper link, and content from GitHub
#4 · by nielsr (HF Staff) · opened

README.md CHANGED
````diff
@@ -1,13 +1,16 @@
 ---
-license: apache-2.0
 language:
 - en
-
+license: apache-2.0
+pipeline_tag: any-to-any
+library_name: diffusers
 ---
+
 <div align="center">
 <h1> Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation </h1>
 
-<a href="https://arxiv.org/abs/2510.01284"><img src="https://img.shields.io/badge/arXiv%20paper-
+<a href="https://arxiv.org/abs/2510.01284"><img src="https://img.shields.io/badge/arXiv%20paper-2510.01284-b31b1b.svg"></a>
+<a href="https://github.com/character-ai/Ovi"><img src="https://img.shields.io/badge/Code-GitHub-181717.svg?logo=github"></a>
 <a href="https://aaxwaz.github.io/Ovi/"><img src="https://img.shields.io/badge/Project_page-More_visualizations-green"></a>
 <a href="https://huggingface.co/chetwinlow1/Ovi"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=orange"></a>
 <a href="https://huggingface.co/spaces/akhaliq/Ovi"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
@@ -36,6 +39,8 @@ Ovi is a veo-3 like, **video+audio generation model** that simultaneously genera
 - **🎬 Video+Audio Generation**: Generate synchronized video and audio content simultaneously
 - **📝 Flexible Input**: Supports text-only or text+image conditioning
 - **⏱️ 5-second Videos**: Generates 5-second videos at 24 FPS, area of 720×720, at various aspect ratios (9:16, 16:9, 1:1, etc)
+- **🎬 Create videos now on wavespeed.ai**: https://wavespeed.ai/models/character-ai/ovi/image-to-video & https://wavespeed.ai/models/character-ai/ovi/text-to-video
+- **🎬 Create videos now on HuggingFace**: https://huggingface.co/spaces/akhaliq/Ovi
 
 ---
 ## 📋 Todo List
@@ -46,6 +51,9 @@ Ovi is a veo-3 like, **video+audio generation model** that simultaneously genera
 - [x] Text or Text+Image as input
 - [x] Gradio application code
 - [x] Multi-GPU inference with or without the support of sequence parallel
+- [x] fp8 weights and improved memory efficiency (credits to [@rkfg](https://github.com/rkfg))
+- [ ] Improve efficiency of Sequence Parallel implementation
+- [ ] Implement Sharded inference with FSDP
 - [x] Video creation example prompts and format
 - [ ] Finetuned model with higher resolution
 - [ ] Longer video generation
@@ -129,6 +137,9 @@ OR
 # Optionally, specify --output-dir to download to a specific directory,
 # but if a custom directory is used, the inference yaml has to be updated with the custom directory
 python3 download_weights.py --output-dir <custom_dir>
+
+# Additionally, if you only have ~24Gb of GPU VRAM, download the fp8 quantized version of the model and follow the instructions in the sections below to run with fp8
+wget -O "./ckpts/Ovi/model_fp8_e4m3fn.safetensors" "https://huggingface.co/rkfg/Ovi-fp8_quantized/resolve/main/model_fp8_e4m3fn.safetensors"
 ```
 
 ## 🚀 Run Examples
@@ -156,6 +167,8 @@ slg_layer: 11 # Layer for applying SLG (Skip Layer Guidance)
 
 # Multi-GPU and Performance
 sp_size: 1 # Sequence parallelism size. Set equal to the number of GPUs used
+cpu_offload: False # CPU offload; greatly reduces peak GPU VRAM but increases end-to-end runtime by ~20 seconds
+fp8: False # Load the fp8 version of the model; slight quality degradation and no inference speed-up (matmuls still run in bf16), but can be paired with cpu_offload=True to run the model with 24Gb of GPU VRAM
 
 # Input Configuration
 text_prompt: "/path/to/csv" or "your prompt here" # Text prompt OR path to CSV/TSV file with prompts
@@ -182,7 +195,19 @@ torchrun --nnodes 1 --nproc_per_node 8 inference.py --config-file ovi/configs/inference/inference_fusion.yaml
 ```
 *Use this to run samples in parallel across multiple GPUs for faster processing.*
 
-
+### Memory & Performance Requirements
+Below are approximate GPU memory requirements for different configurations; the sequence parallel implementation will be optimized in the future.
+All end-to-end times are measured on a 121-frame, 720x720 video with 50 denoising steps. The minimum GPU VRAM required to run our model is **32Gb**; fp8 weights are currently supported, reducing peak VRAM usage to **24Gb** with slight quality degradation.
+
+| Sequence Parallel Size | FlashAttention-3 Enabled | CPU Offload | With Image Gen Model | Peak VRAM Required | End-to-End Time |
+|------------------------|--------------------------|-------------|----------------------|--------------------|-----------------|
+| 1                      | Yes                      | No          | No                   | ~80 GB             | ~83s            |
+| 1                      | No                       | No          | No                   | ~80 GB             | ~96s            |
+| 1                      | Yes                      | Yes         | No                   | ~80 GB             | ~105s           |
+| 1                      | No                       | Yes         | No                   | ~32 GB             | ~118s           |
+| **1**                  | **Yes**                  | **Yes**     | **Yes**              | **~32 GB**         | **~140s**       |
+| 4                      | Yes                      | No          | No                   | ~80 GB             | ~55s            |
+| 8                      | Yes                      | No          | No                   | ~80 GB             | ~40s            |
 ### Gradio
 We provide a simple script to run our model in a gradio UI. It uses the `ckpt_dir` in `ovi/configs/inference/inference_fusion.yaml` to initialize the model.
 ```bash
@@ -197,6 +222,12 @@ OR
 
 # To enable an additional image generation model to generate first frames for I2V; cpu_offload is automatically enabled if the image generation model is enabled
 python3 gradio_app.py --use_image_gen
+
+OR
+
+# To run the model with 24Gb of GPU VRAM
+python3 gradio_app.py --cpu_offload --fp8
+
 ```
 ---
 
@@ -209,6 +240,30 @@ We would like to thank the following projects:
 
 ---
 
+## 🤝 Collaboration
+
+We welcome all types of collaboration! Whether you have feedback, want to contribute, or have any questions, please feel free to reach out.
+
+**Contact**: [Weimin Wang](https://linkedin.com/in/weimin-wang-will) for any issues or feedback.
+
+
+## 🤝 Contributors
+
+We thank all contributors who have helped improve Ovi!
+
+<div align="center">
+<a href="https://github.com/character-ai/Ovi/graphs/contributors">
+<img src="https://contrib.rocks/image?repo=character-ai/Ovi" />
+</a>
+</div>
+
+<br>
+
+If you’ve contributed to this repository (code, documentation, issues, etc.), you’re automatically included in the [contributors list](https://github.com/character-ai/Ovi/graphs/contributors).
+
+We deeply appreciate your support in advancing open multimodal generation research!
+---
+
 ## ⭐ Citation
 
 If Ovi is helpful, please help to ⭐ the repo.
@@ -227,4 +282,4 @@ If you find this project useful for your research, please consider citing our [paper]
 primaryClass={cs.MM},
 url={https://arxiv.org/abs/2510.01284},
 }
-```
+```
````
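The diff above pins down the output geometry: 5-second clips at 24 FPS with a pixel area of about 720×720 across several aspect ratios, and the benchmark section quotes a 121-frame video. That arithmetic can be sketched as below; note the `+1` first-frame convention and the snapping of sizes to a multiple of 32 are illustrative assumptions, not details taken from the repository, and both function names are hypothetical.

```python
import math

def frame_count(seconds: float, fps: int) -> int:
    # 5 s at 24 FPS -> 120 sampled intervals plus the first frame = 121,
    # matching the 121-frame figure in the benchmark section (the +1 is an assumption).
    return int(seconds * fps) + 1

def resolution_for_aspect(aspect_w: int, aspect_h: int,
                          area: int = 720 * 720, multiple: int = 32):
    # Hold the pixel area near 720x720 while matching the requested aspect ratio;
    # snapping width/height to a multiple of 32 is an illustrative assumption.
    height = math.sqrt(area * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

print(frame_count(5, 24))            # 121
print(resolution_for_aspect(16, 9))  # (960, 544)
print(resolution_for_aspect(9, 16))  # (544, 960)
```

This is only a back-of-the-envelope check that the 9:16, 16:9, and 1:1 variants all stay near the 720×720 pixel budget; the model's actual supported resolutions are defined by the repository's configs.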