foggyforest committed
Commit 0eee5e8 · verified · 1 Parent(s): 62d78a9

Update README.md

Files changed (1):
  1. README.md +22 -9
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 
 <h1 align="center">UniMoE-Audio</h1>
 
-**UniMoE-Audio is a unified framework that seamlessly combines speech and music generation. Powered by a novel Dynamic-Capacity Mixture-of-Experts architecture. **
+**UniMoE-Audio** is a unified framework that seamlessly combines speech and music generation, powered by a novel Dynamic-Capacity Mixture-of-Experts architecture.
 
 <div align="center" style="display: flex; justify-content: center; margin-top: 10px;">
 <a href="https://mukioxun.github.io/Uni-MoE-site/home.html"><img src="https://img.shields.io/badge/📰 -Website-228B22" style="margin-right: 5px;"></a>
@@ -21,7 +21,6 @@ tags:
 </div>
 
 
-
 ## Model Information
 - **Base Model**: Qwen2.5-VL with MoE extensions
 - **Audio Codec**: DAC (Descript Audio Codec) with 12 channels
@@ -42,21 +41,35 @@ tags:
 - [x] Technical Report: [UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE](https://arxiv.org/abs/2510.13344)
 
 ## Evaluation
+
 ### Speech Synthesis
 ![Speech Synthesis](./imgs/Speech_Generation.png)
+
 ### Text to Music Generation
 ![Text to Music Generation](./imgs/T2M.png)
+
 ### Video-Text to Music Generation
 ![Video-Text to Music Generation](./imgs/VT2M.png)
 
 ## Requirements
-We recommend using conda to install the environment.
-```bash
-conda env create -f configs/enviroment.yml # add -n for your name
-conda activate unimoe-audio # default name
+
+Since we use the Qwen2.5-VL model, we advise you to install transformers>=4.53.1; otherwise you might encounter the following error:
+```
+KeyError: 'qwen2_vl'
+```
+
+## Quickstart
+
+We use `qwen-vl-utils` to handle various types of visual input. You can install it with:
+```
+pip install qwen-vl-utils
+```
+
+We use the Descript Audio Codec (DAC) for audio compression. You can install it with:
+```
+pip install descript-audio-codec
 ```
-A `dac model` is also required to be downloaded in '/path/to/UniMoE-Audio/utils/dac_model'.
-It will be automatically downloaded when running the first time.
+The model weights will be downloaded automatically on the first run.
 
 
 ## Usage
@@ -65,7 +78,7 @@ Here is a code snippet to show you how to use UniMoE-Audio with `transformers`
 
 ```python
 import torch
-import deepspeed_utils # This line is important, do not delete
+import deepspeed_utils  # This line is important; do not delete it.
 from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
 
 # Import from utils modules
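The diff above pins transformers>=4.53.1 to avoid a `KeyError: 'qwen2_vl'` deep inside model loading. One way to fail early with a clearer message is a simple version guard; the helper below is a hypothetical sketch using only the standard library (the function names are illustrative, not part of the repository):

```python
from importlib.metadata import version, PackageNotFoundError

def meets_min_version(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically (pre-release suffixes are ignored)."""
    def to_tuple(v: str):
        return tuple(int(p) for p in v.split(".") if p.isdigit())
    return to_tuple(installed) >= to_tuple(required)

def check_transformers(required: str = "4.53.1") -> None:
    """Raise a readable error instead of letting a KeyError surface during loading."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        raise RuntimeError("transformers is not installed; pip install 'transformers>=4.53.1'")
    if not meets_min_version(installed, required):
        raise RuntimeError(f"transformers {installed} found, but >= {required} is required")
```

Calling `check_transformers()` once at startup turns the opaque `KeyError` into an actionable message.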
 
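The Requirements changes note that the DAC model weights are downloaded automatically on the first run. First-run logic of this kind typically reduces to an existence check on the local weights file; the helper below is a hypothetical sketch (the cache directory and filename are illustrative, not the repository's actual layout):

```python
from pathlib import Path

def weights_present(cache_dir: str, filename: str = "weights.pth") -> bool:
    """True if the codec weights already exist locally, so the download can be skipped."""
    return (Path(cache_dir) / filename).is_file()

def needs_download(cache_dir: str, filename: str = "weights.pth") -> bool:
    """Decide whether a first-run download is required."""
    return not weights_present(cache_dir, filename)
```

On later runs the check short-circuits, which is why only the first run pays the download cost.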