QuarkAudio/QuarkAudio-UniSE

Update README.md

by liuyinghao - opened Dec 22, 2025

base: refs/heads/main ← from: refs/pr/1

README.md +4 -4

@@ -91,14 +91,14 @@ python ./train.py --config conf/config.yaml
 | `speech_scp_path`        | SCP of clean audio files                                                       |
 | `noise_scp_path`        | SCP of noise audio files
  | `rir_scp_path`        | SCP of rir audio files                                                                       |
-| `mode`           | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `se` (Target Speaker Extraction), `SS` (Speech Separation). |
+| `mode`           | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `tse` (Target Speaker Extraction), `SS` (Speech Separation). |
 ## Inference
 + Quick start
 The main inference script is **`test.py`**. The inference process consists of two stages:
-1. Extract the 6th-layer features from WavLM.
+1. Extract hidden states from all WavLM layers and obtain a single representation by averaging them across layers.
 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **BiCodec**.
 ### Running Inference
@@ -111,7 +111,7 @@ To run test.py, configure the parameters in `./conf/config.yaml`:
 | `enroll_duration` | Number of inference iterations.                                                                                                                                        |
 | `data_src_dir`        | Directory of processed audio files directory.                                                        |
 | `data_tgt_dir`        | Directory of processed audio files directory.                                                                                                                                    |
-| `mode`           | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `se` (Target Speaker Extraction), `SS` (Speech Separation). |
+| `mode`           | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `tse` (Target Speaker Extraction), `SS` (Speech Separation). |
 Command to run inference:
@@ -121,7 +121,7 @@ python test.py
 ## Results
-Samples processed by LLaSE-G1 can be found on our [Demo Page](https://github.com/hyyan2k/UniSE/).
+Samples processed by UniSE can be found on our [Demo Page](https://github.com/hyyan2k/UniSE/).
 ## Model Checkpoints
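
Reviewer note on the updated step 1: the README now describes averaging hidden states across all WavLM layers instead of taking the 6th layer. The diff does not show how `test.py` does this, so the following is only a minimal, framework-agnostic sketch of layer averaging in NumPy. The function name `average_layers` and the toy input are hypothetical, and whether the repository includes the encoder's embedding output in the average is not stated here.

```python
import numpy as np


def average_layers(hidden_states):
    """Average per-layer features into a single (frames, dim) array.

    hidden_states: sequence of arrays, one per encoder layer,
    each with the same shape (frames, dim). Hypothetical helper,
    not taken from the UniSE code base.
    """
    stacked = np.stack(hidden_states, axis=0)  # (layers, frames, dim)
    return stacked.mean(axis=0)                # (frames, dim)


# Toy stand-in for encoder outputs: 3 layers, 4 frames, 2 feature dims.
layers = [np.full((4, 2), float(i)) for i in range(3)]
features = average_layers(layers)
print(features.shape)
```

Compared with picking one fixed layer, the averaged representation has the same shape, so downstream code that consumed the 6th-layer features should not need a shape change under this assumption.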