Update README.md
--- a/README.md
+++ b/README.md
@@ -8,16 +8,13 @@ pipeline_tag: text-to-audio
 
 **Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation**
 
-[GitHub Code](
+[GitHub Code](https://github.com/NJU-Speech/Foley-Omni) | [arXiv](https://arxiv.org/abs/2606.03672) | [Demo](https://ty0402.github.io/Foley-omni-Web/)
 
 ## Overview
 
 This repository packages the public inference checkpoint set for **Foley-Omni**.
 The release focuses on **Video-to-Soundtrack (V2ST)** generation, where the model jointly generates synchronized **speech**, **sound effects**, and **music** from a video and optional text prompt.
 
-The main model checkpoint in this release is an inference-only export from:
-
-- `ckpts/v3_fintune_final/checkpoints/model-epoch-000010.pth`
 
 ## Repository Contents
 
@@ -68,28 +65,3 @@ This repository redistributes a small subset of files from the following upstrea
 - **MMAudio**: audio VAE, vocoder, and Synchformer files
 
 Please refer to the original upstream repositories for their licenses, usage terms, and project details.
-
-## Quick Start
-
-Use the code repository for inference scripts, configs, examples, and feature extraction tools:
-
-- `inference_v2st.py`
-- `inference_v2st.yaml`
-- `examples/video_text_example.json`
-- `data_process/convert_memmap_to_npy.py`
-
-Download the packaged checkpoints with:
-
-```bash
-hf download CocoBro/Foley-Omni \
-  ckpts/Foley-Omni/v2st.pth \
-  ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth \
-  ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/special_tokens_map.json \
-  ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model \
-  ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer.json \
-  ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer_config.json \
-  ckpts/mmaudio/ext_weights/v1-16.pth \
-  ckpts/mmaudio/ext_weights/best_netG.pt \
-  ckpts/mmaudio/ext_weights/synchformer_state_dict.pth \
-  --local-dir .
-```
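
The Quick Start removed in the second hunk passed nine checkpoint paths to `hf download`. For anyone who still needs to script that step, here is a minimal Python sketch that assembles the same command as an argument list. The function name `build_download_command` and the constants `REPO_ID` and `CHECKPOINT_FILES` are illustrative and not part of the Foley-Omni repository; the repository ID and file paths are copied from the removed block. Actually running the command assumes the `hf` CLI from `huggingface_hub` is installed.

```python
import shlex

# Repository ID and file paths copied from the removed Quick Start block.
REPO_ID = "CocoBro/Foley-Omni"
CHECKPOINT_FILES = [
    "ckpts/Foley-Omni/v2st.pth",
    "ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/special_tokens_map.json",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer.json",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer_config.json",
    "ckpts/mmaudio/ext_weights/v1-16.pth",
    "ckpts/mmaudio/ext_weights/best_netG.pt",
    "ckpts/mmaudio/ext_weights/synchformer_state_dict.pth",
]


def build_download_command(repo_id, files, local_dir="."):
    """Return the `hf download` invocation as an argv list (hypothetical helper)."""
    return ["hf", "download", repo_id, *files, "--local-dir", local_dir]


if __name__ == "__main__":
    # Print a shell-safe, single-line form of the command for inspection.
    print(shlex.join(build_download_command(REPO_ID, CHECKPOINT_FILES)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` to perform the download. Building an argument list rather than a shell string avoids quoting problems if a path ever contains spaces.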