CocoBro committed on
Commit
5d3a675
·
verified ·
1 Parent(s): 13a9a53

Update README.md

Files changed (1)
  1. README.md +1 -29
README.md CHANGED
@@ -8,16 +8,13 @@ pipeline_tag: text-to-audio
 
  **Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation**
 
- [GitHub Code](CODE_REPO_LINK) | [arXiv](ARXIV_LINK) | [Demo](DEMO_LINK)
+ [GitHub Code](https://github.com/NJU-Speech/Foley-Omni) | [arXiv](https://arxiv.org/abs/2606.03672) | [Demo](https://ty0402.github.io/Foley-omni-Web/)
 
  ## Overview
 
  This repository packages the public inference checkpoint set for **Foley-Omni**.
  The release focuses on **Video-to-Soundtrack (V2ST)** generation, where the model jointly generates synchronized **speech**, **sound effects**, and **music** from a video and optional text prompt.
 
- The main model checkpoint in this release is an inference-only export from:
-
- - `ckpts/v3_fintune_final/checkpoints/model-epoch-000010.pth`
 
  ## Repository Contents
 
@@ -68,28 +65,3 @@ This repository redistributes a small subset of files from the following upstrea
  - **MMAudio**: audio VAE, vocoder, and Synchformer files
 
  Please refer to the original upstream repositories for their licenses, usage terms, and project details.
-
- ## Quick Start
-
- Use the code repository for inference scripts, configs, examples, and feature extraction tools:
-
- - `inference_v2st.py`
- - `inference_v2st.yaml`
- - `examples/video_text_example.json`
- - `data_process/convert_memmap_to_npy.py`
-
- Download the packaged checkpoints with:
-
- ```bash
- hf download CocoBro/Foley-Omni \
- ckpts/Foley-Omni/v2st.pth \
- ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth \
- ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/special_tokens_map.json \
- ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model \
- ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer.json \
- ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer_config.json \
- ckpts/mmaudio/ext_weights/v1-16.pth \
- ckpts/mmaudio/ext_weights/best_netG.pt \
- ckpts/mmaudio/ext_weights/synchformer_state_dict.pth \
- --local-dir .
- ```
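
The Quick Start block removed by this commit fetched nine files with the `hf` CLI. For readers who still want that file list in scripted form, the following is a minimal Python sketch, not part of the commit or of the project's own tooling. It assumes the `huggingface_hub` package is installed and that the repository still hosts the files at the paths shown in the removed lines; the helper names `missing_files` and `download_checkpoints` are hypothetical.

```python
from pathlib import Path

# Repository id and file paths copied from the removed Quick Start block.
# Whether they are still valid after this commit is an assumption.
REPO_ID = "CocoBro/Foley-Omni"

CHECKPOINT_FILES = [
    "ckpts/Foley-Omni/v2st.pth",
    "ckpts/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/special_tokens_map.json",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer.json",
    "ckpts/Wan2.2-TI2V-5B/google/umt5-xxl/tokenizer_config.json",
    "ckpts/mmaudio/ext_weights/v1-16.pth",
    "ckpts/mmaudio/ext_weights/best_netG.pt",
    "ckpts/mmaudio/ext_weights/synchformer_state_dict.pth",
]


def missing_files(local_dir, files=CHECKPOINT_FILES):
    """Return the listed files that are not yet present under local_dir."""
    root = Path(local_dir)
    return [name for name in files if not (root / name).is_file()]


def download_checkpoints(local_dir="."):
    """Download any missing files (hypothetical helper, needs network access).

    Returns the list of files still missing afterwards (empty on success).
    """
    # Imported lazily so that missing_files() works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    for name in missing_files(local_dir):
        # With local_dir set, the file is written to local_dir/<name>,
        # keeping the ckpts/... directory layout.
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
    return missing_files(local_dir)
```

Calling `download_checkpoints(".")` from the code repository root would mirror the `--local-dir .` flag of the removed command, and `missing_files(".")` can be used on its own to check an existing download without touching the network.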