<h1 align="center">UniMoE-Audio</h1>

**UniMoE-Audio** is a unified framework that seamlessly combines speech and music generation, powered by a novel Dynamic-Capacity Mixture-of-Experts architecture.

<div align="center" style="display: flex; justify-content: center; margin-top: 10px;">
<a href="https://mukioxun.github.io/Uni-MoE-site/home.html"><img src="https://img.shields.io/badge/📰 -Website-228B22" style="margin-right: 5px;"></a>
</div>
## Model Information
- **Base Model**: Qwen2.5-VL with MoE extensions
- **Audio Codec**: DAC (Descript Audio Codec) with 12 channels
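
A 12-channel codec means each audio frame is represented by 12 discrete token IDs, one per residual codebook. The sketch below is purely illustrative of that layout; the frame count and codebook size are made-up numbers, not taken from this model card.

```python
# Illustrative only: a DAC-style residual codec with 12 codebooks emits
# 12 discrete token IDs per audio frame. NUM_CODEBOOKS matches the model
# card; CODEBOOK_SIZE and num_frames are assumed for the example.
NUM_CODEBOOKS = 12
CODEBOOK_SIZE = 1024
num_frames = 4

# codes[f][c] = token ID chosen by codebook c for frame f
codes = [
    [(f * NUM_CODEBOOKS + c) % CODEBOOK_SIZE for c in range(NUM_CODEBOOKS)]
    for f in range(num_frames)
]

print(len(codes), len(codes[0]))  # 4 12
```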
- [x] Technical Report: [UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE](https://arxiv.org/abs/2510.13344)
## Evaluation
### Speech Synthesis

### Text to Music Generation

### Video-Text to Music Generation

## Requirements

Since UniMoE-Audio builds on the Qwen2.5-VL model, we advise installing `transformers>=4.53.1`; otherwise you might encounter the following error:

```
KeyError: 'qwen2_vl'
```
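
One way to catch this before loading the model is a quick version check. The helper below is our own stdlib-only sketch, not part of the release, and it ignores pre-release suffixes:

```python
def meets_min_version(installed: str, required: str = "4.53.1") -> bool:
    """Naive dotted-version comparison (ignores pre-release suffixes)."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

# transformers releases older than 4.53.1 do not register the 'qwen2_vl'
# model type and fail with KeyError: 'qwen2_vl'
print(meets_min_version("4.52.0"))  # False
print(meets_min_version("4.53.1"))  # True
```

In practice you would pass `transformers.__version__` as `installed`.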

## Quickstart

We use `qwen-vl-utils` to handle various types of visual input. You can install it using the following command:

```
pip install qwen-vl-utils
```

We use the Descript Audio Codec (DAC) for audio compression. You can install it using the following command:

```
pip install descript-audio-codec
```

The model weights will be downloaded automatically on first run.
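
For video-text prompts, `qwen-vl-utils` consumes chat-style messages. The sketch below shows only that message layout; the video path and prompt text are placeholders, not taken from this model card:

```python
# Chat-style message layout consumed by qwen_vl_utils.process_vision_info
# (video path and prompt text below are placeholders)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "file:///path/to/clip.mp4"},
            {"type": "text", "text": "Compose music that matches this video."},
        ],
    }
]

content_types = [item["type"] for item in messages[0]["content"]]
print(content_types)  # ['video', 'text']
```

With the package installed, `image_inputs, video_inputs = process_vision_info(messages)` then extracts the media inputs for the processor.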
## Usage

Here is a code snippet to show you how to use UniMoE-Audio with `transformers`:

```python
import torch
import deepspeed_utils  # This line is important, do not delete it
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
# Import from utils modules