---
license: apache-2.0
pipeline_tag: audio-to-audio
---

LavaSR (v2) pairs a novel 50 MB bandwidth extension (BWE) model with the UL-UNAS denoiser. It can enhance nearly 5000 seconds of audio in 1 second on a GPU while exceeding the quality of 6 GB diffusion models.

### Details
* **Model Size:** 50 MB (PyTorch version).
* **Input Rate:** Any rate from 8 kHz to 48 kHz.
* **Output Rate:** 48 kHz.
* **Inference Speed:** 20-80x realtime on CPU; 800-5000x realtime on GPU, depending on the GPU.

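
To make the input and output rates concrete: a signal sampled at rate `fs` can only carry frequencies up to `fs / 2` (the Nyquist limit). Low-rate audio therefore has no content between its own Nyquist limit and the 24 kHz limit of a 48 kHz output, and that gap is the band a bandwidth extension model has to synthesize. The sketch below computes the gap for a few input rates. The function name `missing_band_hz` is illustrative only and is not part of the LavaSR code.

```python
def missing_band_hz(input_rate_hz, output_rate_hz=48_000):
    """Return the (low, high) frequency band in Hz that is absent from the
    input but representable at the output rate, or None if there is no gap.

    Illustrative helper, not a LavaSR API.
    """
    if input_rate_hz <= 0 or output_rate_hz <= 0:
        raise ValueError("sample rates must be positive")

    input_nyquist = input_rate_hz / 2    # highest frequency the input can hold
    output_nyquist = output_rate_hz / 2  # highest frequency the output can hold

    if input_nyquist >= output_nyquist:
        return None  # input already spans the full output bandwidth
    return (input_nyquist, output_nyquist)


if __name__ == "__main__":
    for rate in (8_000, 16_000, 24_000, 48_000):
        print(f"{rate} Hz input -> band to synthesize: {missing_band_hz(rate)}")
```

An 8 kHz recording leaves the widest gap, while a 48 kHz input leaves none.
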
### Use cases
- Restore low-quality audio datasets.
- Enhance TTS or ASR model quality.
- Upscale poor-quality voice calls.

### Benchmark Comparison

Objective benchmarks are available in the repo: https://github.com/ysharma3501/LavaSR

| Model | Speed on GPU (batch size 1) | Size | Input range | Quality |
| :--- | :--- | :--- | :--- | :--- |
| **LavaSR v2** | **Up to 5000x realtime** | **50 MB** | **Any from 8-48 kHz** | **Highest** |
| AudioSR | < 1x realtime | ~3 GB+ | ~2-16 kHz | Medium |
| AP-BWE (previous fastest formal model) | < 400x realtime | ~200 MB+ | 8/12/16 kHz | High |
| NovaSR (previous fastest informal model) | < 3600x realtime | ~50 KB+ | 16 kHz | Low |

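
The speed figures above are realtime factors: seconds of audio processed per second of wall-clock time, so 5000x means roughly 5000 seconds of audio enhanced in 1 second. The sketch below shows one way such a factor could be measured. It is a generic timing helper, not the benchmark script from the repo, and `enhance_fn` is a placeholder for whatever enhancement callable is being tested.

```python
import time


def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def measure_realtime_factor(enhance_fn, samples, sample_rate_hz):
    """Time one call of enhance_fn on samples and return its realtime factor.

    enhance_fn is a hypothetical stand-in for a model's enhancement call.
    """
    audio_seconds = len(samples) / sample_rate_hz
    start = time.perf_counter()
    enhance_fn(samples)
    elapsed = time.perf_counter() - start
    return realtime_factor(audio_seconds, elapsed)


if __name__ == "__main__":
    # 5000 s of audio in 1 s of compute is a 5000x realtime factor.
    print(realtime_factor(5000, 1))

    # Dummy "model" that only sums the samples, to exercise the timer.
    one_second_of_silence = [0.0] * 16_000
    print(measure_realtime_factor(sum, one_second_of_silence, 16_000))
```

When timing a PyTorch model on a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously and the call may return before the work has finished. A few warm-up runs before the timed run also help, since the first call typically includes one-off setup cost.
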
### Usage
Usage instructions can be found in the repo: https://github.com/ysharma3501/LavaSR

### Final notes

The model and code are licensed under the Apache-2.0 license. See LICENSE for details.

Stars and likes are appreciated. Thank you!

Email: yatharthsharma3501@gmail.com