Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -91,14 +91,14 @@ python ./train.py --config conf/config.yaml
 | `speech_scp_path` | SCP of clean audio files |
 | `noise_scp_path` | SCP of noise audio files
 | `rir_scp_path` | SCP of rir audio files |
-| `mode` | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `se` (Target Speaker Extraction), `SS` (Speech Separation). |
+| `mode` | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `tse` (Target Speaker Extraction), `SS` (Speech Separation). |
 
 
 ## Inference
 + Quick start
 The main inference script is **`test.py`**. The inference process consists of two stages:
 
-1. Extract the 6th-layer features from WavLM.
+1. Extract hidden states from all WavLM layers and obtain a single representation by averaging them across layers.
 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **BiCodec**.
 
 ### Running Inference
@@ -111,7 +111,7 @@ To run test.py, configure the parameters in `./conf/config.yaml`:
 | `enroll_duration` | Number of inference iterations. |
 | `data_src_dir` | Directory of processed audio files directory. |
 | `data_tgt_dir` | Directory of processed audio files directory. |
-| `mode` | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `se` (Target Speaker Extraction), `SS` (Speech Separation). |
+| `mode` | Task type: `se` (Noise Suppression,Speech Restoration,Packet Loss Concealment), `tse` (Target Speaker Extraction), `SS` (Speech Separation). |
 
 Command to run inference:
 
@@ -121,7 +121,7 @@ python test.py
 
 ## Results
 
-Samples processed by LLaSE-G1 can be found on our [Demo Page](https://github.com/hyyan2k/UniSE/).
+Samples processed by UniSE can be found on our [Demo Page](https://github.com/hyyan2k/UniSE/).
 
 ## Model Checkpoints
 
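This change replaces single-layer (6th-layer) WavLM features in stage 1 with an average over all WavLM layers. The following is a minimal sketch of that averaging step. It assumes that every layer's output is a T x D matrix (T frames, D feature dimensions) and that all layers share that shape. The `average_layers` name and the nested-list input format are hypothetical and are not taken from the repository's `test.py`. The README also does not say whether the convolutional/embedding output that precedes the first Transformer layer is included in the average.

```python
from typing import List, Sequence

# Hypothetical representation of one layer's hidden states: a T x D matrix
# (T frames, D feature dims) as nested lists, so the sketch needs no libraries.
Matrix = List[List[float]]


def average_layers(hidden_states: Sequence[Matrix]) -> Matrix:
    """Average per-layer hidden states into a single T x D representation.

    Assumption: `hidden_states` holds one T x D matrix per WavLM layer,
    all with identical shapes.
    """
    if not hidden_states:
        raise ValueError("need at least one layer")
    num_layers = len(hidden_states)
    num_frames = len(hidden_states[0])
    num_dims = len(hidden_states[0][0]) if num_frames else 0
    # Reject layers whose shape differs from the first one.
    for layer in hidden_states:
        if len(layer) != num_frames or any(len(frame) != num_dims for frame in layer):
            raise ValueError("all layers must share the same T x D shape")
    # zip(*hidden_states) groups the same frame across layers;
    # zip(*frames) then groups the same feature dimension across layers,
    # and each group is reduced to its mean.
    return [
        [sum(values) / num_layers for values in zip(*frames)]
        for frames in zip(*hidden_states)
    ]


if __name__ == "__main__":
    layers = [
        [[0.0, 2.0], [4.0, 6.0]],  # layer 1: 2 frames x 2 dims
        [[2.0, 4.0], [6.0, 8.0]],  # layer 2: same shape
    ]
    print(average_layers(layers))  # [[1.0, 3.0], [5.0, 7.0]]
```

With PyTorch tensors the same reduction is `torch.stack(hidden_states).mean(dim=0)`. Hugging Face's `WavLMModel` returns a per-layer `hidden_states` tuple when called with `output_hidden_states=True`. Whether this repository uses that API is an assumption.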