---
license: apache-2.0
pipeline_tag: audio-to-audio
---
LLaSE: Maximizing Acoustic Preservation for LLaMA-based Speech Enhancement

Demo Page: https://kevin-naticl.github.io/LLaSE-Demopage/

GitHub: https://github.com/Kevin-naticl/LLaSE

Abstract

Language Models (LMs) have shown strong capabilities in semantic understanding and contextual modeling, making them promising for speech enhancement. Our previous work, SELM, was the first to introduce LMs to speech enhancement. However, SELM and other existing generative speech enhancement approaches still face challenges, such as inconsistencies in timbre and content between the input speech and the enhanced output. To address these limitations, we propose LLaSE, which uses continuous representations from WavLM and integrates a LLaMA backbone with the more powerful Xcodec decoder. This significantly improves contextual modeling and enables more accurate and stable enhancement. Experimental results demonstrate that LLaSE achieves state-of-the-art performance on speech enhancement, offering a robust and scalable solution.
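
The abstract describes a three-stage data flow: continuous WavLM features go in, a LLaMA backbone models context across frames, and the Xcodec decoder turns the backbone's output into audio. The sketch below illustrates only that flow, in dependency-free Python, and it is not the authors' implementation. Every function name (`wavlm_like_features`, `self_attention`, `predict_tokens`, `codec_decode`, `enhance`) is hypothetical, and the sizes are toy values. The weights are random and untrained, so the output is not enhanced speech. The sketch also assumes that the backbone predicts one discrete codec token per input frame from a single codebook, and it uses one unmasked attention head as a stand-in for the full LLaMA stack. See the GitHub repository for the real interfaces.

```python
import math
import random
from typing import List

Vector = List[float]

# Toy sizes for illustration only (assumptions, not LLaSE's actual configuration).
HOP = 320      # samples per frame: 20 ms at 16 kHz
FEAT_DIM = 8   # stand-in for the WavLM hidden size
VOCAB = 16     # stand-in for the codec token vocabulary size


def wavlm_like_features(wave: List[float]) -> List[Vector]:
    """Hypothetical stand-in for WavLM: one continuous (unquantized) vector per frame."""
    feats = []
    for t in range(len(wave) // HOP):
        frame = wave[t * HOP:(t + 1) * HOP]
        mean = sum(frame) / HOP
        energy = sum(x * x for x in frame) / HOP
        feats.append([mean * (d + 1) + energy / (d + 1) for d in range(FEAT_DIM)])
    return feats


def self_attention(xs: List[Vector]) -> List[Vector]:
    """Single-head, unmasked scaled dot-product self-attention with identity projections.

    Each output frame is a softmax-weighted mix of all input frames: a toy
    stand-in for the contextual modeling of the LLaMA backbone.
    """
    if not xs:
        return []
    scale = 1.0 / math.sqrt(len(xs[0]))
    out = []
    for q in xs:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in xs]
        top = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, xs)) / total
                    for d in range(len(q))])
    return out


def predict_tokens(hidden: List[Vector], head: List[Vector]) -> List[int]:
    """Linear output head: per-frame logits over the codec vocabulary, then argmax."""
    tokens = []
    for h in hidden:
        logits = [sum(w * x for w, x in zip(row, h)) for row in head]
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


def codec_decode(tokens: List[int], codebook: List[float]) -> List[float]:
    """Hypothetical stand-in for the Xcodec decoder: each token id -> HOP samples."""
    return [codebook[tok] for tok in tokens for _ in range(HOP)]


def enhance(noisy: List[float], seed: int = 0) -> List[float]:
    """Run the toy pipeline with random, untrained weights (output is NOT enhanced speech)."""
    rng = random.Random(seed)
    head = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(VOCAB)]
    codebook = [rng.uniform(-1.0, 1.0) for _ in range(VOCAB)]
    feats = wavlm_like_features(noisy)     # continuous input representation
    hidden = self_attention(feats)         # mix information across frames
    tokens = predict_tokens(hidden, head)  # one discrete codec token per frame
    return codec_decode(tokens, codebook)  # tokens back to a waveform


# Usage: one second of a noisy 440 Hz tone at 16 kHz.
noise = random.Random(1)
noisy = [math.sin(2 * math.pi * 440 * n / 16_000) + 0.1 * noise.gauss(0.0, 1.0)
         for n in range(16_000)]
print(len(enhance(noisy)))  # 16000 = 50 frames x 320 samples
```

The sketch keeps the input side continuous: no quantization happens before the backbone. Only the output side is discrete, which reflects the abstract's contrast between WavLM representations going in and a codec decoder producing audio.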