xxx123456 committed a5b40b7 (verified) · 1 parent: 388a7df

Update README.md

Files changed (1): README.md (+96 -3)

<div align="center">

# 🎙️ SimWhisper-Codec

### Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding

<p>
<a href="https://zhangxinwhut.github.io/SimWhisper-Codec/"><img src="https://img.shields.io/badge/🎧_Demo-Online-brightgreen" alt="Demo"></a>
<a href="https://arxiv.org/pdf/2510.20504"><img src="https://img.shields.io/badge/Paper-Arxiv-red" alt="paper"></a>
<a href="https://huggingface.co/xxx123456/SimWhisper_Codec"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Page-yellow" alt="Hugging Face"></a>
</p>

*A semantic-first speech codec that achieves superior performance through architectural simplification rather than complex supervision.*

</div>

---

## ✨ Highlights

- 🚀 **Low Bitrate**: Only **1.1 kbps** at a 16 kHz sampling rate
- 🔊 **High-Quality Speech Reconstruction**: When reconstructing LibriSpeech test-clean, achieves UTMOS 4.00, WER 2.75 (ASR: hubert-large-ls960-ft), speaker similarity (SIM) 0.83 (wavlm_large_finetune), STOI 0.93, PESQ-NB 3.29 and PESQ-WB 2.72 (ground truth: WER 2.16, UTMOS 4.09)
- 🧊 **Frozen Encoder**: No fine-tuning of the Whisper encoder required
- ⚡ **Simple & Efficient**: Architectural simplification over complex supervision
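
For a quantized speech codec, the token bitrate is usually frames per second times bits per frame, where an index into a codebook of K entries costs log2(K) bits. The minimal sketch below shows that arithmetic and compares it with uncompressed audio. The frame rate and codebook layout in it are hypothetical values chosen for illustration, not SimWhisper-Codec's actual configuration, and the PCM comparison assumes 16-bit mono input at the README's 16 kHz sampling rate.

```python
import math
from typing import Sequence


def codec_bitrate_bps(frame_rate_hz: float, codebook_sizes: Sequence[int]) -> float:
    """Token bitrate: frames per second x bits per frame.

    Each frame emits one index per codebook, and an index into a codebook with
    K entries costs log2(K) bits.
    """
    bits_per_frame = sum(math.log2(k) for k in codebook_sizes)
    return frame_rate_hz * bits_per_frame


def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


# HYPOTHETICAL layout for illustration only (not SimWhisper-Codec's configuration):
# 12.5 frames/s, 8 codebooks of 1024 entries each -> 12.5 * 8 * 10 bits/s.
example_bps = codec_bitrate_bps(12.5, [1024] * 8)
print(f"example codec: {example_bps / 1000:.2f} kbps")  # example codec: 1.00 kbps

# ASSUMPTION: 16-bit mono PCM input at the README's 16 kHz sampling rate.
pcm_bps = pcm_bitrate_bps(16_000, 16)
print(f"16 kHz, 16-bit PCM: {pcm_bps / 1000:.0f} kbps")  # 16 kHz, 16-bit PCM: 256 kbps
print(f"ratio to 1.1 kbps: {pcm_bps / 1100:.0f}x")  # ratio to 1.1 kbps: 233x
```

To check a real configuration, substitute the frame rate and codebook sizes reported in the paper.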

## 📊 Performance

| Model | Bitrate | WER ↓ | PESQ-NB ↑ | PESQ-WB ↑ | STOI ↑ | SIM ↑ | UTMOS ↑ |
|:------|:-------:|:-----:|:---------:|:---------:|:------:|:-----:|:-------:|
| XCodec2.0 | 0.8 kbps | 2.61 | 3.04 | 2.43 | 0.92 | 0.82 | **4.13** |
| XY-Tokenizer | 1.0 kbps | **2.46** | 3.00 | 2.41 | 0.91 | **0.84** | 3.98 |
| **SimWhisper-Codec** | 1.1 kbps | 2.75 | **3.29** | **2.72** | **0.93** | 0.83 | 4.00 |

*Evaluated on LibriSpeech test-clean*
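
WER counts the word substitutions, deletions and insertions needed to turn the ASR transcript of the reconstructed audio into the reference transcript, divided by the number of reference words (conventionally reported as a percentage). SIM is usually the cosine similarity between speaker embeddings of the original and reconstructed utterances. The sketch below implements both formulas in plain Python. It is not the authors' evaluation script: transcription (hubert-large-ls960-ft in the Highlights), embedding extraction (wavlm_large_finetune) and text normalization are out of scope, and the lowercase-and-split normalization is an assumption.

```python
import math
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        # ASSUMED normalization: lowercase + whitespace split (real pipelines normalize more).
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two speaker-embedding vectors (the usual form of a SIM score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


pairs = [("the cat sat on the mat", "the cat sat on a mat")]
print(f"WER: {corpus_wer(pairs):.2f}%")  # WER: 16.67%
print(f"SIM: {cosine_similarity([1.0, 0.0], [1.0, 1.0]):.4f}")  # SIM: 0.7071
```

Summing errors over the whole test set before dividing (rather than averaging per-utterance WERs) keeps long utterances from being under-weighted.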

## 🚀 Quick Start

### Installation

```bash
# Clone repository
git clone https://github.com/ZhangXinWhut/SimWhisper-Codec.git && cd SimWhisper-Codec

# Create and activate conda environment
conda create -n swcodec python=3.10 -y && conda activate swcodec

# Install dependencies
pip install -r requirements.txt
```

## 🗂️ Available Models

| Model Name | Hugging Face | Training Data |
|:----------:|:-------------:|:---------------:|
| SimWhisper-Codec | [🤗](https://huggingface.co/xxx123456/SimWhisper_Codec) | LibriSpeech |

### Download Model Weights

Download the SimWhisper-Codec model weights from the [SimWhisper-Codec Hugging Face repository](https://huggingface.co/xxx123456/SimWhisper_Codec):

```bash
mkdir -p ./weights && huggingface-cli download xxx123456/SimWhisper_Codec SimWhisperCodec.pt --local-dir ./weights/
```

### Inference

```bash
python inference.py --input_dir /path/to/LibriSpeech/test-clean
```

The reconstructed audio files are written to the `output_wavs/` directory.
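
Reference-based metrics such as PESQ, STOI and SIM compare each source utterance with its reconstruction, so the two sets of files have to be paired first. The minimal sketch below does that with `pathlib`, assuming (this README does not confirm it) that `inference.py` writes each reconstruction as a `.wav` that keeps the source file's stem; LibriSpeech sources are `.flac`. The function name `pair_reconstructions` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple

SOURCE_SUFFIXES = {".flac", ".wav"}  # LibriSpeech ships .flac


def pair_reconstructions(
    input_dir: str, output_dir: str = "output_wavs"
) -> Tuple[Dict[str, Tuple[Path, Path]], List[str]]:
    """Return ({stem: (source, reconstruction)}, [stems with no reconstruction]).

    ASSUMPTION: each reconstruction is a .wav that keeps the source file's stem,
    e.g. 1089-134686-0000.flac -> 1089-134686-0000.wav.
    """
    reconstructed = {p.stem: p for p in Path(output_dir).rglob("*.wav")}
    pairs: Dict[str, Tuple[Path, Path]] = {}
    missing: List[str] = []
    for src in sorted(Path(input_dir).rglob("*")):
        # Skip directories and non-audio files such as LibriSpeech *.trans.txt.
        if not src.is_file() or src.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        if src.stem in reconstructed:
            pairs[src.stem] = (src, reconstructed[src.stem])
        else:
            missing.append(src.stem)
    return pairs, missing


# Usage (paths as in the Inference command above):
# pairs, missing = pair_reconstructions("/path/to/LibriSpeech/test-clean")
# print(f"{len(pairs)} paired utterances, {len(missing)} missing")
```

Reporting the missing stems separately makes it obvious when inference skipped files, instead of silently scoring a subset of the test set.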

## 🙏 Acknowledgements

Our codebase builds upon [XY-Tokenizer](https://github.com/gyt1145028706/XY-Tokenizer). We thank the authors for their excellent work.

## 📝 Citation

If you find this work useful in your research, please cite our paper:

```
@misc{zhang2025speakingclearlysimplifiedwhisperbased,
    title={Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding},
    author={Xin Zhang and Lin Li and Xiangni Lu and Jianquan Liu and Kong Aik Lee},
    year={2025},
    eprint={2510.20504},
    archivePrefix={arXiv},
    primaryClass={cs.SD},
    url={https://arxiv.org/abs/2510.20504},
}
```

---
license: apache-2.0
---