Heinrich Dinkel committed on
Commit 7a87364 · 1 Parent(s): e23a1f0

updated README

Files changed (1)
  1. README.md +57 -0
README.md CHANGED
@@ -1,3 +1,60 @@
  ---
+ library_name: transformers
+ pipeline_tag: audio-to-audio
+ tags:
+ - signal-processing
  license: apache-2.0
  ---
+
+
+ <div align="center">
+ <h1>
+ Dasheng Denoiser
+ </h1>
+ <p>
+ Official PyTorch inference code for the Interspeech 2025 paper: <br>
+ <b><em>Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders</em></b>
+ </p>
+ <a href="https://arxiv.org/abs/2506.11514"><img src="https://img.shields.io/badge/arxiv-2506.11514-red" alt="arXiv"></a>
+ <a href="https://www.python.org"><img src="https://img.shields.io/badge/Python-3.10+-orange" alt="Python"></a>
+ <a href="https://pytorch.org"><img src="https://img.shields.io/badge/PyTorch-2.0+-brightgreen" alt="PyTorch"></a>
+ <a href="https://www.apache.org/licenses/LICENSE-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License: Apache 2.0"></a>
+ <a href="https://github.com/xiaomi-research/dasheng-denoiser"><img src="https://img.shields.io/github/stars/xiaomi-research/dasheng-denoiser?style=social" alt="GitHub stars"></a>
+
+
+ </div>
+
+
+ # Installation and Usage
+
+ ```bash
+ uv pip install transformers torch torchaudio einops
+ ```
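+
+ ```python
+ import torch
+ import torchaudio
+ from transformers import AutoModel
+
+ # Run on the GPU when one is available, otherwise fall back to the CPU
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = AutoModel.from_pretrained("mispeech/dasheng-denoiser", trust_remote_code=True)
+ model = model.to(device).eval()
+
+ # Load audio file (only 16 kHz input is supported!)
+ audio, sr = torchaudio.load("path/to/audio.wav")
+
+ # Mixed precision is only enabled on CUDA
+ with torch.no_grad(), torch.autocast(device_type=device, enabled=(device == "cuda")):
+     enhanced = model(audio.to(device))
+
+ # Cast back to float32 on the CPU before writing the file
+ torchaudio.save("enhanced_audio.wav", enhanced.float().cpu(), sr)
+ ```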
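Because the usage snippet notes that only 16 kHz input is supported, it can help to check a file's sample rate before running the model. The following is a minimal sketch using only the Python standard library. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of the model's API, and the `wave` module only reads uncompressed PCM WAV files.

```python
import wave

# Sample rate the README states the model supports
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target_rate: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True when the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

If `needs_resampling` returns `True`, the audio would need to be converted to 16 kHz before inference, for example with `torchaudio.functional.resample`.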
+
+
+ # Acknowledgements
+ Our implementation draws on [Dasheng](https://github.com/XiaoMi/Dasheng) and [Vocos](https://github.com/gemelo-ai/vocos).
+
+ # Citation
+
+ ```bibtex
+ @inproceedings{xingwei2025dashengdenoiser,
+ title={Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders},
+ author={Xingwei Sun and Heinrich Dinkel and Yadong Niu and Linzhang Wang and Junbo Zhang and Jian Luan},
+ booktitle={Interspeech 2025},
+ year={2025}
+ }
+ ```