---
license: apache-2.0
pipeline_tag: audio-to-audio
---

LLaSE: Maximizing Acoustic Preservation for LLaMA-based Speech Enhancement

Demo Page: https://kevin-naticl.github.io/LLaSE-Demopage/

GitHub: https://github.com/Kevin-naticl/LLaSE

Abstract
Language Models (LMs) have shown strong capabilities in semantic understanding and contextual modeling, making them promising for speech enhancement.
Building on SELM, our previous work that first introduced LMs to speech enhancement, we note that SELM and other existing generative
speech enhancement approaches still face challenges: the timbre and content of the speech can change between the input and the enhanced output.
To address these limitations, we propose LLaSE, which uses continuous representations from WavLM and a LLaMA
backbone combined with the more powerful Xcodec decoder, significantly improving contextual modeling and enabling
more accurate and stable enhancement. Experimental results demonstrate that LLaSE achieves state-of-the-art performance on speech enhancement,
offering a robust and scalable solution.
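
The three components named in the abstract can be read as a simple data flow: an encoder producing continuous features, a backbone modeling them in context, and a codec decoder producing the waveform. The sketch below only illustrates that flow and is not the LLaSE implementation. The `EnhancementPipeline` class, the assumption that the backbone emits discrete codec token ids, and the three `toy_*` stand-ins are all hypothetical; see the GitHub repository for the actual code.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]        # mono audio samples
Features = List[List[float]]  # one continuous vector per frame
Tokens = List[int]            # discrete codec token ids (assumed interface)

FRAME = 4  # toy frame length in samples, chosen only for illustration


@dataclass
class EnhancementPipeline:
    """Hypothetical wrapper: encoder -> backbone -> decoder."""
    encoder: Callable[[Waveform], Features]  # role played by WavLM
    backbone: Callable[[Features], Tokens]   # role played by the LLaMA backbone
    decoder: Callable[[Tokens], Waveform]    # role played by the Xcodec decoder

    def enhance(self, noisy: Waveform) -> Waveform:
        features = self.encoder(noisy)    # continuous representations
        tokens = self.backbone(features)  # contextual modeling over frames
        return self.decoder(tokens)       # waveform reconstruction


# Toy stand-ins so the sketch runs without any model weights.
def toy_encoder(wave: Waveform) -> Features:
    # Split into fixed-size frames and describe each by (mean, peak).
    frames = [wave[i:i + FRAME] for i in range(0, len(wave), FRAME)]
    return [[sum(f) / len(f), max(abs(s) for s in f)] for f in frames]


def toy_backbone(features: Features) -> Tokens:
    # Quantize the frame mean into one of 16 token ids.
    return [min(15, int(abs(mean) * 16)) for mean, _peak in features]


def toy_decoder(tokens: Tokens) -> Waveform:
    # Expand each token back into FRAME constant samples.
    return [tok / 16 for tok in tokens for _ in range(FRAME)]


pipeline = EnhancementPipeline(toy_encoder, toy_backbone, toy_decoder)
enhanced = pipeline.enhance([0.5] * 8)
```

Keeping the three stages as separate callables mirrors the abstract's description, where the decoder is a distinct component that can be swapped for a stronger one.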