---
library_name: transformers
pipeline_tag: audio-to-audio
tags:
- signal-processing
license: apache-2.0
---

<div align="center">
<h1>
Dasheng Denoiser
</h1>
<p>
Official PyTorch inference code for the Interspeech 2025 paper: <br>
<b><em>Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders</em></b>
</p>
<a href="https://arxiv.org/abs/2506.11514"><img src="https://img.shields.io/badge/arxiv-2506.11514-red" alt="arXiv 2506.11514"></a>
<a href="https://www.python.org"><img src="https://img.shields.io/badge/Python-3.10+-orange" alt="Python 3.10+"></a>
<a href="https://pytorch.org"><img src="https://img.shields.io/badge/PyTorch-2.0+-brightgreen" alt="PyTorch 2.0+"></a>
<a href="https://www.apache.org/licenses/LICENSE-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License: Apache 2.0"></a>
<a href="https://github.com/xiaomi-research/dasheng-denoiser"><img src="https://img.shields.io/github/stars/xiaomi-research/dasheng-denoiser?style=social" alt="GitHub stars"></a>
</div>

# Installation and Usage

```bash
uv pip install transformers torch torchaudio einops
```

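```python
import torch
import torchaudio
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained("mispeech/dasheng-denoiser", trust_remote_code=True)
model = model.to(device).eval()

# Load the audio file. Only 16 kHz input is supported, so resample if needed.
audio, sr = torchaudio.load("path/to/audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)
    sr = 16000

# Mixed precision is enabled on GPU only; on CPU the model runs in full precision.
with torch.no_grad(), torch.autocast(device_type=device, enabled=(device == "cuda")):
    enhanced = model(audio.to(device))

# Convert back to float32 on the CPU before writing the file.
torchaudio.save("enhanced_audio.wav", enhanced.float().cpu(), sr)
```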
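
Because the model only supports 16 kHz input, it can help to check the sample rate of your files before loading the model. The following is a minimal sketch that uses only Python's standard-library `wave` module, so it runs without PyTorch. The helper names `wav_info` and `needs_resampling` are hypothetical and not part of this repository, and `wave` reads only uncompressed PCM WAV files, so other formats would need a different reader.

```python
import wave

TARGET_SR = 16000  # sample rate this model card states is supported


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate
    return sample_rate, channels, duration


def needs_resampling(path, target_sr=TARGET_SR):
    """Return True if the file's sample rate differs from the target rate."""
    sample_rate, _, _ = wav_info(path)
    return sample_rate != target_sr
```

Calling `needs_resampling("path/to/audio.wav")` on each file gives a quick way to find recordings that are not at 16 kHz before running inference.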

# Acknowledgements
We referred to the [Dasheng](https://github.com/XiaoMi/Dasheng) and [Vocos](https://github.com/gemelo-ai/vocos) codebases when implementing this model.

# Citation

```bibtex
@inproceedings{xingwei2025dashengdenoiser,
  title={Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders},
  author={Xingwei Sun and Heinrich Dinkel and Yadong Niu and Linzhang Wang and Junbo Zhang and Jian Luan},
  booktitle={Interspeech 2025},
  year={2025}
}
```