ASLP-lab
/

LLaSE-G1

Audio-to-Audio

Model card Files Files and versions

xet

Community

ASLP-lab commited on Mar 3, 2025

Commit

87723f3

verified ·

1 Parent(s): 546e0c5

Update README.md

Browse files

Files changed (1) hide show

README.md +104 -112

README.md CHANGED Viewed

@@ -1,112 +1,104 @@
----
-license: apache-2.0
-language:
-- zh
-- en
-- es
-- de
-- fr
-pipeline_tag: audio-to-audio
----
-# LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
-## Introduction
-LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:
-- **Noise Suppression (SE)**
-- **Target Speaker Extraction (TSE)**
-- **Packet Loss Concealment (PLC)**
-- **Acoustic Echo Cancellation (AEC)**
-- **Speech Separation (SS)**
-To mitigate acoustic inconsistency, LLaSE-G1 employs continuous representations from **WavLM** as input and predicts speech tokens using **X-Codec2**, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, demonstrating scaling effects at test time and emerging capabilities for unseen speech enhancement tasks.
-For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)
-## Demo
-You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
-## Installation
-### 1. Clone the repository
-```bash
-git clone https://github.com/your-repo/LLaSE-G1.git
-cd LLaSE-G1
-```
-### 2. Create a Conda environment and install dependencies
-```bash
-conda create -n llase python=3.10
-conda activate llase
-pip install -r requirements.txt
-```
-### 3. Download Pretrained Models
-LLaSE-G1 requires three additional pre-trained models to function properly. You can download them using the provided shell script:
-```bash
-bash ./ckpt/download.sh
-```
-Alternatively, you can download them manually and place them in the `./ckpt/` directory.
-## Inference
-The main inference script is **`inference.py`**. The inference process consists of two stages:
-1. Extract the 6th-layer features from WavLM.
-2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.
-### Running Inference
-To run inference, configure the parameters in `./config/test.yml`:
-| Parameter        | Description                                                                                                                                                            |
-| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `infer_feat_too` | Whether to extract WavLM features during inference.                                                                                                                    |
-| `inference_time` | Number of inference iterations.                                                                                                                                        |
-| `feat_dir`       | Directory containing extracted features.                                                                                                                               |
-| `wav_dir`        | Directory of processed audio files.                                                                                                                                    |
-| `task`           | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
-Command to run inference:
-```bash
-bash inference.sh
-```
-## Results
-Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
-## Model Checkpoints
-Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).
-## Citation
-If you find this work useful, please cite our paper:
-```bibtex
-@article{yourpaper2025,
-  title={LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement},
-  author={Your Name and Others},
-  journal={ACL},
-  year={2025}
-}
-```
-## License
-This project is released under the **Apache-2.0**.
-## Contact
-For any questions, please contact: `beaukang02@gmail.com`

+---
+license: apache-2.0
+language:
+- zh
+- en
+- es
+- de
+- fr
+pipeline_tag: audio-to-audio
+---
+# LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
+## Introduction
+LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:
+- **Noise Suppression (SE)**
+- **Target Speaker Extraction (TSE)**
+- **Packet Loss Concealment (PLC)**
+- **Acoustic Echo Cancellation (AEC)**
+- **Speech Separation (SS)**
+To mitigate acoustic inconsistency, LLaSE-G1 employs continuous representations from **WavLM** as input and predicts speech tokens using **X-Codec2**, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, demonstrating scaling effects at test time and emerging capabilities for unseen speech enhancement tasks.
+For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)
+## Demo
+You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+## Installation
+### 1. Clone the repository
+```bash
+git clone https://github.com/your-repo/LLaSE-G1.git
+cd LLaSE-G1
+```
+### 2. Create a Conda environment and install dependencies
+```bash
+conda create -n llase python=3.10
+conda activate llase
+pip install -r requirements.txt
+```
+### 3. Download Pretrained Models
+LLaSE-G1 requires three additional pre-trained models to function properly. You can download them using the provided shell script:
+```bash
+bash ./ckpt/download.sh
+```
+Alternatively, you can download them manually and place them in the `./ckpt/` directory.
+## Inference
+The main inference script is **`inference.py`**. The inference process consists of two stages:
+1. Extract the 6th-layer features from WavLM.
+2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.
+### Running Inference
+To run inference, configure the parameters in `./config/test.yml`:
+| Parameter        | Description                                                                                                                                                            |
+| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `infer_feat_too` | Whether to extract WavLM features during inference.                                                                                                                    |
+| `inference_time` | Number of inference iterations.                                                                                                                                        |
+| `feat_dir`       | Directory containing extracted features.                                                                                                                               |
+| `wav_dir`        | Directory of processed audio files.                                                                                                                                    |
+| `task`           | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
+Command to run inference:
+```bash
+bash inference.sh
+```
+## Results
+Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+## Model Checkpoints
+Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).
+## Citation
+If you find this work useful, please cite our paper:
+## License
+This project is released under the **Apache-2.0**.
+## Contact
+For any questions, please contact: `beaukang02@gmail.com`