Metacebertrunk committed
Commit 9d10e60 · verified · 1 parent: 142a8e9

Fix modelscope bug

Files changed (1): README.md (+12 −20)
README.md CHANGED
@@ -1,21 +1,15 @@
- ---
- license: apache-2.0
- language:
- - en
- pipeline_tag: audio-to-audio
- ---
  # UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement

  <p align="center">
  <a href="https://arxiv.org/abs/2510.20441">
  <img src="https://img.shields.io/badge/Paper-ArXiv-red.svg" alt="Paper">
  </a>
- <a href="https://hyyan2k.github.io/UniSE/">
- <img src="https://img.shields.io/badge/Demo-Page-blue.svg" alt="Demo">
- </a>
- <a href="https://huggingface.co/spaces/QuarkAudio/">
  <img src="https://img.shields.io/badge/Model-Hugging%20Face-yellow.svg" alt="Hugging Face">
  </a>
  </p>

  <p align="center">
@@ -29,7 +23,7 @@ pipeline_tag: audio-to-audio
  - 🔄 **End-to-End Compatible**: Integrates WavLM (feature extractor), BiCodec (discrete codec), and LM into one pipeline.
  - 🌍 **Multitask Support**: SE, SR, TSE, SS, and more — all in a single model.

- 📄 **Paper**: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441) | 🎤 **Listen**: [Demo Page](https://hyyan2k.github.io/UniSE/) | 🤗 **Model**: [Hugging Face Spaces](https://huggingface.co/spaces/QuarkAudio/)

  ---

@@ -71,7 +65,7 @@ QuarkAudio-UniSE requires three additional **WavLM** and **BiCodec** pre-trained
  cd checkpoints
  bash download.sh
  ```
- Additionally, download WavLM-base-plus.pt from this [URL](https://huggingface.co/microsoft/wavlm-base-plus) and put it at `./ckpt/WavLM-base-plus.pt`.

  Alternatively, you can download them manually and place them in the `./model/bicodec/` directory.

@@ -91,14 +85,14 @@ python ./train.py --config conf/config.yaml
  | `speech_scp_path` | SCP of clean audio files |
  | `noise_scp_path` | SCP of noise audio files |
  | `rir_scp_path` | SCP of RIR audio files |
- | `mode` | Task type: `SE` (Noise Suppression, Speech Restoration, Packet Loss Concealment), `TSE` (Target Speaker Extraction), `SS` (Speech Separation). |

  ## Inference
  + Quick start
  The main inference script is **`test.py`**. The inference process consists of two stages:

- 1. Extract hidden states from all **WavLM** layers and obtain a single representation by averaging them across layers.
  2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **BiCodec**.

  ### Running Inference
@@ -119,13 +113,10 @@ Command to run inference:
  python test.py
  ```

- ## Results
-
- Samples processed by UniSE can be found on our [Demo Page](https://github.com/hyyan2k/UniSE/).

  ## Model Checkpoints

- Our pretrained model is available on [Hugging Face](https://huggingface.co/spaces/QuarkAudio/).

  ## Hints

@@ -144,7 +135,8 @@ Our approach focuses on leveraging the LLM's comprehension capabilities to enabl
  url={https://arxiv.org/abs/2510.20441},
  }
  ```
-

  ## Contact
- For any questions, please contact: `yanhaoyin.yhy@alibaba-inc.com`

README.md (updated file)

  # UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement

  <p align="center">
  <a href="https://arxiv.org/abs/2510.20441">
  <img src="https://img.shields.io/badge/Paper-ArXiv-red.svg" alt="Paper">
  </a>
+ <a href="https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/">
  <img src="https://img.shields.io/badge/Model-Hugging%20Face-yellow.svg" alt="Hugging Face">
  </a>
+ <a href="https://www.modelscope.cn/models/QuarkAudio/QuarkAudio-UniSE/">
+ <img src="https://img.shields.io/badge/Model-%20%E9%AD%94%E6%90%AD-orange.svg" alt="ModelScope">
+ </a>
  </p>

  <p align="center">
 
  - 🔄 **End-to-End Compatible**: Integrates WavLM (feature extractor), BiCodec (discrete codec), and LM into one pipeline.
  - 🌍 **Multitask Support**: SE, SR, TSE, SS, and more — all in a single model.

+ 📄 **Paper**: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441) | 🤗 **Model**: [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/)

  ---

 
  cd checkpoints
  bash download.sh
  ```
+ Additionally, download WavLM-Large.pt from this [URL](https://huggingface.co/microsoft/wavlm-large) and put it at `./ckpt/WavLM-Large.pt`.

  Alternatively, you can download them manually and place them in the `./model/bicodec/` directory.

 
  | `speech_scp_path` | SCP of clean audio files |
  | `noise_scp_path` | SCP of noise audio files |
  | `rir_scp_path` | SCP of RIR audio files |
+ | `mode` | Task type: `se` (Noise Suppression, Speech Restoration, Packet Loss Concealment), `tse` (Target Speaker Extraction), `SS` (Speech Separation). |

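The parameters above might be laid out as follows in `conf/config.yaml`; this is a hypothetical sketch where only the key names come from the table, and the file paths and exact layout are assumptions, not copied from the repo.

```yaml
# Hypothetical conf/config.yaml fragment for the training parameters above.
speech_scp_path: data/train/clean.scp   # SCP of clean audio files
noise_scp_path: data/train/noise.scp    # SCP of noise audio files
rir_scp_path: data/train/rir.scp        # SCP of RIR audio files
mode: se                                # task type: se, tse, or SS
```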
  ## Inference
  + Quick start
  The main inference script is **`test.py`**. The inference process consists of two stages:

+ 1. Extract hidden states from all WavLM layers and obtain a single representation by averaging them across layers.
  2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **BiCodec**.
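As a rough illustration of stage 1 (the layer averaging), here is a minimal NumPy sketch; the layer count and feature dimension are assumptions for illustration, and real code would take hidden states produced by WavLM rather than random arrays:

```python
import numpy as np

def average_layers(hidden_states):
    """Average per-layer hidden states (each of shape (batch, frames, dim))
    into a single representation, as in stage 1 of the inference pipeline."""
    # stack to (num_layers, batch, frames, dim), then mean over the layer axis
    return np.stack(hidden_states, axis=0).mean(axis=0)

# Toy stand-ins for the hidden states of 24 WavLM-Large layers
# (the layer count and 1024-dim features are illustrative assumptions).
layers = [np.random.randn(1, 50, 1024) for _ in range(24)]
feats = average_layers(layers)
print(feats.shape)  # (1, 50, 1024)
```

The averaged representation then conditions the LM in stage 2, which predicts speech tokens for BiCodec to decode.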
  ### Running Inference
 
  python test.py
  ```

 
  ## Model Checkpoints

+ Our pretrained model is available on [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/).

  ## Hints

 
  url={https://arxiv.org/abs/2510.20441},
  }
  ```
+

  ## Contact
+ For any questions, please contact: `yanhaoyin.yhy@alibaba-inc.com`