---
license: apache-2.0
pipeline_tag: audio-to-audio
---
LavaSR (v2) is a novel 50 MB bandwidth extension (BWE) model paired with the UL-UNAS denoiser. On GPU it can enhance nearly 5000 seconds of audio in 1 second while exceeding the quality of 6 GB diffusion models.
### Details
* **Model Size:** 50 MB for the PyTorch version.
* **Input Rate:** Any rate from 8 to 48 kHz.
* **Output Rate:** 48 kHz.
* **Inference Speed:** 20-80x realtime on CPU and 800-5000x realtime on GPU, depending on the GPU.
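
To make the "x realtime" figures concrete, here is a minimal sketch of how a realtime factor can be computed and measured. The helper names (`realtime_factor`, `processing_time`, `measure_realtime_factor`) and the `enhance_fn` callable are hypothetical and are not part of the LavaSR API; only the 20x and 5000x figures come from the details above.

```python
import time
from typing import Callable, Sequence


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, factor: float) -> float:
    """Wall-clock seconds needed to process audio at a given realtime factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


def measure_realtime_factor(
    enhance_fn: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate: int,
) -> float:
    """Time one call to enhance_fn (a hypothetical stand-in for any enhancer)."""
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()
    enhance_fn(samples)
    # Guard against a zero reading from the timer on extremely fast calls.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return realtime_factor(audio_seconds, elapsed)


if __name__ == "__main__":
    one_hour = 3600.0
    # Low end of the CPU range and high end of the GPU range quoted above.
    print(f"1 hour at 20x realtime:   {processing_time(one_hour, 20):.2f} s")
    print(f"1 hour at 5000x realtime: {processing_time(one_hour, 5000):.2f} s")
```

When timing a PyTorch model on GPU this way, call `torch.cuda.synchronize()` before reading the timer, since CUDA kernels run asynchronously and the measurement would otherwise stop too early.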
### Use cases
- Restore low-quality audio datasets.
- Enhance TTS or ASR model quality.
- Upscale poor-quality voice calls.
### Benchmark Comparison
Please check out the repo for objective benchmarks: https://github.com/ysharma3501/LavaSR

| Model | Speed on GPU (batch size 1) | Size | Input range | Quality |
| :--- | :--- | :--- | :--- | :--- |
| **LavaSR v2** | **5000x realtime** | **50 MB** | **Any from 8-48 kHz** | **Highest** |
| AudioSR | < 1x realtime | ~3 GB+ | ~2-16 kHz | Medium |
| AP-BWE (previous formal fastest) | < 400x realtime | ~200 MB+ | 8 kHz / 12 kHz / 16 kHz | High |
| NovaSR (previous informal fastest) | < 3600x realtime | ~50 KB+ | 16 kHz | Low |
### Usage
Usage instructions can be found here: https://github.com/ysharma3501/LavaSR
### Final notes
The model and code are licensed under the Apache-2.0 license. See LICENSE for details.
Stars/Likes would be appreciated, thank you.
Email: yatharthsharma3501@gmail.com