Update model card: pipeline_tag, library_name, paper link, and content from GitHub

#4
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +60 -5
README.md CHANGED
@@ -1,13 +1,16 @@
  ---
- license: apache-2.0
  language:
  - en
- pipeline_tag: image-to-video
  ---
  <div align="center">
  <h1> Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation </h1>

- <a href="https://arxiv.org/abs/2510.01284"><img src="https://img.shields.io/badge/arXiv%20paper-2509.08519-b31b1b.svg"></a>
  <a href="https://aaxwaz.github.io/Ovi/"><img src="https://img.shields.io/badge/Project_page-More_visualizations-green"></a>
  <a href="https://huggingface.co/chetwinlow1/Ovi"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=orange"></a>
  <a href="https://huggingface.co/spaces/akhaliq/Ovi"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
@@ -36,6 +39,8 @@ Ovi is a veo-3 like, **video+audio generation model** that simultaneously genera
  - **🎬 Video+Audio Generation**: Generate synchronized video and audio content simultaneously
  - **📝 Flexible Input**: Supports text-only or text+image conditioning
  - **⏱️ 5-second Videos**: Generates 5-second videos at 24 FPS with a 720×720 area, at various aspect ratios (9:16, 16:9, 1:1, etc.)

  ---
  ## 📋 Todo List
@@ -46,6 +51,9 @@ Ovi is a veo-3 like, **video+audio generation model** that simultaneously genera
  - [x] Text or Text+Image as input
  - [x] Gradio application code
  - [x] Multi-GPU inference with or without the support of sequence parallel
  - [x] Video creation example prompts and format
  - [ ] Finetuned model with higher resolution
  - [ ] Longer video generation
@@ -129,6 +137,9 @@ OR
  # Optionally specify --output-dir to download to a specific directory,
  # but if a custom directory is used, the inference YAML has to be updated with the custom directory
  python3 download_weights.py --output-dir <custom_dir>
  ```

  ## 🚀 Run Examples
@@ -156,6 +167,8 @@ slg_layer: 11 # Layer for applying SLG (Skip Layer Gu

  # Multi-GPU and Performance
  sp_size: 1 # Sequence parallelism size. Set equal to number of GPUs used

  # Input Configuration
  text_prompt: "/path/to/csv" or "your prompt here" # Text prompt OR path to CSV/TSV file with prompts
@@ -182,7 +195,19 @@ torchrun --nnodes 1 --nproc_per_node 8 inference.py --config-file ovi/configs/in
  ```
  *Use this to run samples in parallel across multiple GPUs for faster processing.*

-
  ### Gradio
  We provide a simple script to run our model in a Gradio UI. It uses the `ckpt_dir` in `ovi/configs/inference/inference_fusion.yaml` to initialize the model.
  ```bash
@@ -197,6 +222,12 @@ OR

  # To enable an additional image generation model to generate first frames for I2V; cpu_offload is automatically enabled when the image generation model is enabled
  python3 gradio_app.py --use_image_gen
  ```
  ---

@@ -209,6 +240,30 @@ We would like to thank the following projects:

  ---

  ## ⭐ Citation

  If Ovi is helpful, please help to ⭐ the repo.
@@ -227,4 +282,4 @@ If you find this project useful for your research, please consider citing our [p
  primaryClass={cs.MM},
  url={https://arxiv.org/abs/2510.01284},
  }
- ```
 
  ---
  language:
  - en
+ license: apache-2.0
+ pipeline_tag: any-to-any
+ library_name: diffusers
  ---
+
  <div align="center">
  <h1> Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation </h1>

+ <a href="https://arxiv.org/abs/2510.01284"><img src="https://img.shields.io/badge/arXiv%20paper-2510.01284-b31b1b.svg"></a>
+ <a href="https://github.com/character-ai/Ovi"><img src="https://img.shields.io/badge/Code-GitHub-181717.svg?logo=github"></a>
  <a href="https://aaxwaz.github.io/Ovi/"><img src="https://img.shields.io/badge/Project_page-More_visualizations-green"></a>
  <a href="https://huggingface.co/chetwinlow1/Ovi"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=orange"></a>
  <a href="https://huggingface.co/spaces/akhaliq/Ovi"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
 
  - **🎬 Video+Audio Generation**: Generate synchronized video and audio content simultaneously
  - **📝 Flexible Input**: Supports text-only or text+image conditioning
  - **⏱️ 5-second Videos**: Generates 5-second videos at 24 FPS with a 720×720 area, at various aspect ratios (9:16, 16:9, 1:1, etc.)
+ - **🎬 Create videos now on wavespeed.ai**: https://wavespeed.ai/models/character-ai/ovi/image-to-video & https://wavespeed.ai/models/character-ai/ovi/text-to-video
+ - **🎬 Create videos now on HuggingFace**: https://huggingface.co/spaces/akhaliq/Ovi
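As a quick sanity check on the clip-length numbers above (a hypothetical snippet, not part of the Ovi codebase), the advertised 5 seconds at 24 FPS line up with the 121-frame videos referenced later in this card if one extra initial frame is assumed:

```python
# Hypothetical helper (not from the repo): relate the advertised clip
# length to the frame count quoted elsewhere in this card.
FPS = 24      # frame rate stated in the bullet above
SECONDS = 5   # clip length stated in the bullet above

def num_frames(seconds: int = SECONDS, fps: int = FPS) -> int:
    # 5 s x 24 FPS = 120 frames; the +1 is an assumption (an extra
    # initial frame), which would match the 121-frame videos mentioned
    # in the memory/performance section of this card.
    return seconds * fps + 1

print(num_frames())  # 121
```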

  ---
  ## 📋 Todo List
 
  - [x] Text or Text+Image as input
  - [x] Gradio application code
  - [x] Multi-GPU inference with or without the support of sequence parallel
+ - [x] fp8 weights and improved memory efficiency (credits to [@rkfg](https://github.com/rkfg))
+ - [ ] Improve efficiency of Sequence Parallel implementation
+ - [ ] Implement Sharded inference with FSDP
  - [x] Video creation example prompts and format
  - [ ] Finetuned model with higher resolution
  - [ ] Longer video generation
 
  # Optionally specify --output-dir to download to a specific directory,
  # but if a custom directory is used, the inference YAML has to be updated with the custom directory
  python3 download_weights.py --output-dir <custom_dir>
+
+ # Additionally, if you only have ~24 GB of GPU VRAM, download the fp8-quantized version of the model and follow the instructions in the sections below to run with fp8
+ wget -O "./ckpts/Ovi/model_fp8_e4m3fn.safetensors" "https://huggingface.co/rkfg/Ovi-fp8_quantized/resolve/main/model_fp8_e4m3fn.safetensors"
  ```
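For scripted downloads, the wget URL above follows the standard Hugging Face `resolve` URL layout. A minimal sketch (the repo ID and filename are taken from this card; the helper itself is illustrative, not repo code):

```python
# Build the same download URL as the wget command above, using the
# Hugging Face hub's https://huggingface.co/<repo>/resolve/<revision>/<file>
# convention. Illustrative helper, not part of the Ovi repo.
REPO_ID = "rkfg/Ovi-fp8_quantized"
FILENAME = "model_fp8_e4m3fn.safetensors"

def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(resolve_url(REPO_ID, FILENAME))
# https://huggingface.co/rkfg/Ovi-fp8_quantized/resolve/main/model_fp8_e4m3fn.safetensors
```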

  ## 🚀 Run Examples
 

  # Multi-GPU and Performance
  sp_size: 1 # Sequence parallelism size. Set equal to number of GPUs used
+ cpu_offload: False # CPU offload; greatly reduces peak GPU VRAM but increases end-to-end runtime by ~20 seconds
+ fp8: False # Load the fp8 version of the model; slight quality degradation and no inference speedup (matmuls still run in bf16), but can be paired with cpu_offload: True to run with 24 GB of GPU VRAM

  # Input Configuration
  text_prompt: "/path/to/csv" or "your prompt here" # Text prompt OR path to CSV/TSV file with prompts
 
  ```
  *Use this to run samples in parallel across multiple GPUs for faster processing.*
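The `sp_size` setting above splits the denoising sequence across GPUs. A toy sketch of the even-chunking idea behind sequence parallelism (purely illustrative; the repo's actual sharding logic may differ):

```python
# Toy illustration of sequence parallelism (NOT the repo's implementation):
# the token sequence is split into sp_size contiguous chunks, one per rank.
def split_sequence(seq_len: int, sp_size: int) -> list[range]:
    # Distribute seq_len tokens as evenly as possible across sp_size ranks;
    # the first (seq_len % sp_size) ranks get one extra token.
    base, extra = divmod(seq_len, sp_size)
    chunks, start = [], 0
    for rank in range(sp_size):
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

chunks = split_sequence(seq_len=1000, sp_size=8)
print([len(c) for c in chunks])  # [125, 125, 125, 125, 125, 125, 125, 125]
```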

+ ### Memory & Performance Requirements
+ Below are approximate GPU memory requirements for different configurations. The sequence parallel implementation will be optimized in the future.
+ All end-to-end times are measured on a 121-frame, 720×720 video with 50 denoising steps. The minimum GPU VRAM required to run our model is **32 GB**; fp8 parameters are now supported, reducing peak VRAM usage to **24 GB** with slight quality degradation.
+
+ | Sequence Parallel Size | FlashAttention-3 Enabled | CPU Offload | With Image Gen Model | Peak VRAM Required | End-to-End Time |
+ |------------------------|--------------------------|-------------|----------------------|--------------------|-----------------|
+ | 1 | Yes | No | No | ~80 GB | ~83s |
+ | 1 | No | No | No | ~80 GB | ~96s |
+ | 1 | Yes | Yes | No | ~80 GB | ~105s |
+ | 1 | No | Yes | No | ~32 GB | ~118s |
+ | **1** | **Yes** | **Yes** | **Yes** | **~32 GB** | **~140s** |
+ | 4 | Yes | No | No | ~80 GB | ~55s |
+ | 8 | Yes | No | No | ~80 GB | ~40s |
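To make the single-GPU rows of the table actionable, here is a hypothetical flag-selection helper (not part of the repo; the VRAM thresholds are assumptions drawn only from this card's table and text):

```python
# Hypothetical helper: choose inference flags for a single GPU given
# available VRAM in GB, following the single-GPU rows of the table above.
# Not repo code; FlashAttention-3 availability is ignored for simplicity.
def pick_flags(vram_gb: float) -> dict:
    if vram_gb >= 80:
        return {"cpu_offload": False, "fp8": False}  # fastest single-GPU row
    if vram_gb >= 32:
        return {"cpu_offload": True, "fp8": False}   # ~32 GB offload row
    if vram_gb >= 24:
        return {"cpu_offload": True, "fp8": True}    # fp8 path, slight quality loss
    raise ValueError("Below the 24 GB minimum documented in this card")

print(pick_flags(24))  # {'cpu_offload': True, 'fp8': True}
```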

  ### Gradio
  We provide a simple script to run our model in a Gradio UI. It uses the `ckpt_dir` in `ovi/configs/inference/inference_fusion.yaml` to initialize the model.
  ```bash
 

  # To enable an additional image generation model to generate first frames for I2V; cpu_offload is automatically enabled when the image generation model is enabled
  python3 gradio_app.py --use_image_gen
+
+ OR
+
+ # To run the model with 24 GB of GPU VRAM
+ python3 gradio_app.py --cpu_offload --fp8
+
  ```
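The flags shown above (`--use_image_gen`, `--cpu_offload`, `--fp8`) could be parsed as simple boolean switches; a sketch of how such argument handling might look (illustrative only; the real `gradio_app.py` may differ):

```python
# Sketch of boolean-flag parsing for the gradio_app.py options shown above.
# Illustrative, not the repo's actual argument handling.
import argparse

parser = argparse.ArgumentParser(description="Ovi Gradio UI (sketch)")
parser.add_argument("--use_image_gen", action="store_true",
                    help="Enable a first-frame image generation model (auto-enables CPU offload)")
parser.add_argument("--cpu_offload", action="store_true",
                    help="Offload weights to CPU to reduce peak VRAM")
parser.add_argument("--fp8", action="store_true",
                    help="Load fp8 weights; pairs with --cpu_offload for ~24 GB VRAM")

# Parse the 24 GB invocation from the example above.
args = parser.parse_args(["--cpu_offload", "--fp8"])
print(args.cpu_offload, args.fp8)  # True True
```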
  ---

 

  ---

+ ## 🤝 Collaboration
+
+ We welcome all types of collaboration! Whether you have feedback, want to contribute, or have any questions, please feel free to reach out.
+
+ **Contact**: [Weimin Wang](https://linkedin.com/in/weimin-wang-will) for any issues or feedback.
+
+ ## 🤝 Contributors
+
+ We thank all contributors who have helped improve Ovi!
+
+ <div align="center">
+ <a href="https://github.com/character-ai/Ovi/graphs/contributors">
+ <img src="https://contrib.rocks/image?repo=character-ai/Ovi" />
+ </a>
+ </div>
+
+ <br>
+
+ If you’ve contributed to this repository (code, documentation, issues, etc.), you’re automatically included in the [contributors list](https://github.com/character-ai/Ovi/graphs/contributors).
+
+ We deeply appreciate your support in advancing open multimodal generation research!
+
+ ---
+
  ## ⭐ Citation

  If Ovi is helpful, please help to ⭐ the repo.
 
  primaryClass={cs.MM},
  url={https://arxiv.org/abs/2510.01284},
  }
+ ```