Fix modelscope bug
README.md CHANGED
@@ -1,21 +1,15 @@
----
-license: apache-2.0
-language:
-- en
-pipeline_tag: audio-to-audio
----
 # UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
 
 <p align="center">
   <a href="https://arxiv.org/abs/2510.20441">
     <img src="https://img.shields.io/badge/Paper-ArXiv-red.svg" alt="Paper">
   </a>
-  <a href="https://
-    <img src="https://img.shields.io/badge/Demo-Page-blue.svg" alt="Demo">
-  </a>
-  <a href="https://huggingface.co/spaces/QuarkAudio/">
     <img src="https://img.shields.io/badge/Model-Hugging%20Face-yellow.svg" alt="Hugging Face">
   </a>
 </p>
 
 <p align="center">
@@ -29,7 +23,7 @@ pipeline_tag: audio-to-audio
 - **End-to-End Compatible**: Integrates WavLM (feature extractor), BiCodec (discrete codec), and LM into one pipeline.
 - **Multitask Support**: SE, SR, TSE, SS, and more, all in a single model.
 
-**Paper**: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441)
 
 ---
 
@@ -71,7 +65,7 @@ QuarkAudio-UniSE requires three additional **WavLM** and **BiCodec** pre-trained
 cd checkpoints
 bash download.sh
 ```
-Additionally, download WavLM-
 
 Alternatively, you can download them manually and place them in the `./model/bicodec/` directory.
 
@@ -91,14 +85,14 @@ python ./train.py --config conf/config.yaml
 | `speech_scp_path` | SCP of clean audio files |
 | `noise_scp_path` | SCP of noise audio files |
 | `rir_scp_path` | SCP of RIR audio files |
-| `mode` | Task type: `
 
 
 ## Inference
+Quick start
 The main inference script is **`test.py`**. The inference process consists of two stages:
 
-1. Extract hidden states from all
 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **BiCodec**.
 
 ### Running Inference
@@ -119,13 +113,10 @@ Command to run inference:
 python test.py
 ```
 
-## Results
-
-Samples processed by UniSE can be found on our [Demo Page](https://github.com/hyyan2k/UniSE/).
 
 ## Model Checkpoints
 
-Our pretrained model is available on [Hugging Face](https://huggingface.co/
 
 ## Hints
 
@@ -144,7 +135,8 @@ Our approach focuses on leveraging the LLM's comprehension capabilities to enabl
 url={https://arxiv.org/abs/2510.20441},
 }
 ```
-
 
 ## Contact
-For any questions, please contact: `yanhaoyin.yhy@alibaba-inc.com`
 # UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
 
 <p align="center">
   <a href="https://arxiv.org/abs/2510.20441">
     <img src="https://img.shields.io/badge/Paper-ArXiv-red.svg" alt="Paper">
   </a>
+  <a href="https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/">
     <img src="https://img.shields.io/badge/Model-Hugging%20Face-yellow.svg" alt="Hugging Face">
   </a>
+  <a href="https://www.modelscope.cn/models/QuarkAudio/QuarkAudio-UniSE/">
+    <img src="https://img.shields.io/badge/Model-%20%E9%AD%94%E6%90%AD-orange.svg" alt="ModelScope">
+  </a>
 </p>
 
 <p align="center">
 - **End-to-End Compatible**: Integrates WavLM (feature extractor), BiCodec (discrete codec), and LM into one pipeline.
 - **Multitask Support**: SE, SR, TSE, SS, and more, all in a single model.
 
+**Paper**: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441) | **Model**: [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/)
 
 ---
 
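The end-to-end wiring named in the feature list (WavLM features feeding a decoder-only LM, whose token output BiCodec decodes) can be sketched as below; `enhance` and the three callables are hypothetical stand-ins, not the repo's actual API:

```python
# Illustrative sketch of the UniSE pipeline wiring described above.
# All component names are assumptions; see the repo's test.py for the real flow.
def enhance(noisy_wav, wavlm, lm, bicodec):
    feats = wavlm(noisy_wav)   # continuous acoustic features
    tokens = lm(feats)         # discrete speech tokens predicted by the LM
    return bicodec(tokens)     # enhanced waveform decoded from the tokens
```

With identity stubs for the three components, `enhance` simply passes the input through, which makes the data flow easy to check.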
 cd checkpoints
 bash download.sh
 ```
+Additionally, download `WavLM-Large.pt` from this [URL](https://huggingface.co/microsoft/wavlm-base-plus) and put it at `./ckpt/WavLM-Large.pt`.
 
 Alternatively, you can download them manually and place them in the `./model/bicodec/` directory.
 
 | `speech_scp_path` | SCP of clean audio files |
 | `noise_scp_path` | SCP of noise audio files |
 | `rir_scp_path` | SCP of RIR audio files |
+| `mode` | Task type: `se` (Noise Suppression, Speech Restoration, Packet Loss Concealment), `tse` (Target Speaker Extraction), `ss` (Speech Separation). |
 
 
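The `*_scp_path` options point to SCP list files. Assuming the common Kaldi-style `<utterance-id> <path>` layout (an assumption; check the repo's data-preparation scripts for the exact format), a minimal parser looks like:

```python
# Hypothetical helper: parse a Kaldi-style SCP file, one "<utt-id> <wav-path>"
# entry per line. The exact SCP layout expected by this repo is an assumption.
def parse_scp(lines):
    """Map utterance IDs to audio file paths, skipping blank lines."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, path = line.split(maxsplit=1)
        entries[utt_id] = path
    return entries

# Example SCP contents:
scp_text = [
    "utt001 /data/clean/utt001.wav",
    "utt002 /data/clean/utt002.wav",
]
clean_wavs = parse_scp(scp_text)
```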
 ## Inference
+Quick start
 The main inference script is **`test.py`**. The inference process consists of two stages:
 
+1. Extract hidden states from all WavLM layers and obtain a single representation by averaging them across layers.
 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **BiCodec**.
 
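Stage 1 amounts to a plain mean over layers; the sketch below shows the arithmetic with nested lists standing in for tensors (`average_wavlm_layers` is an illustrative name, not the repo's real API):

```python
# Sketch of stage 1: collapse per-layer WavLM hidden states into one
# representation by averaging across layers. hidden_states is a list of
# num_layers feature maps, each shaped [T frames][D dims] (nested lists here
# stand in for tensors; the repo itself presumably operates on tensors).
def average_wavlm_layers(hidden_states):
    num_layers = len(hidden_states)
    num_frames = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    avg = [[0.0] * dim for _ in range(num_frames)]
    for layer in hidden_states:
        for t in range(num_frames):
            for d in range(dim):
                avg[t][d] += layer[t][d] / num_layers
    return avg
```

Averaging two layers of a single two-dimensional frame, `[[1.0, 2.0]]` and `[[3.0, 4.0]]`, yields `[[2.0, 3.0]]`.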
 ### Running Inference
 python test.py
 ```
 
 
 ## Model Checkpoints
 
+Our pretrained model is available on [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/).
 
 ## Hints
 
 url={https://arxiv.org/abs/2510.20441},
 }
 ```
+
 
 ## Contact
+For any questions, please contact: `yanhaoyin.yhy@alibaba-inc.com`
+