LavaSR / README.md
metadata
license: apache-2.0
pipeline_tag: audio-to-audio

LavaSR (v2) is a novel 50 MB bandwidth extension (BWE) model paired with the UL-UNAS denoiser. On a fast GPU it can enhance nearly 5000 seconds of audio in 1 second while exceeding the quality of large 6 GB diffusion models.

Details

  • Model Size: 50 MB for the PyTorch version.
  • Input Rate: Any sample rate from 8 to 48 kHz.
  • Output Rate: 48 kHz.
  • Inference Speed: 20-80x realtime on CPU and 800-5000x realtime on GPU, depending on the GPU.
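
To make the realtime factors above concrete, here is a minimal sketch that converts a realtime factor into wall-clock processing time. The helper `processing_seconds` is hypothetical and is not part of LavaSR; it only performs the arithmetic implied by the figures in this list, and actual throughput will depend on your hardware.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock time to process audio at a given realtime factor.

    A factor of 5000 means 5000 seconds of audio are processed per second.
    Hypothetical helper for illustration only; not part of the LavaSR API.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    one_hour = 3600.0  # seconds of input audio
    # Realtime factors taken from the "Inference Speed" line above.
    for label, factor in [
        ("CPU, low end", 20),
        ("CPU, high end", 80),
        ("GPU, low end", 800),
        ("GPU, high end", 5000),
    ]:
        seconds = processing_seconds(one_hour, factor)
        print(f"{label:>14}: {seconds:.2f} s per hour of audio")
```

At the quoted upper bound of 5000x, one hour of audio takes under a second; at the quoted lower CPU bound of 20x, it takes about three minutes.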

Use cases

  • Restore low-quality audio datasets.
  • Enhance TTS or ASR model quality.
  • Upscale poor-quality voice calls.

Benchmark Comparison

Please check out the repo for objective benchmarks: https://github.com/ysharma3501/LavaSR

| Model | Speed on GPU (batch size 1) | Size | Input range | Quality |
|---|---|---|---|---|
| LavaSR v2 | 5000x realtime | 50 MB | Any from 8-48 kHz | Highest |
| AudioSR | < 1x realtime | ~3 GB+ | ~2-16 kHz | Medium |
| AP-BWE (previous formal fastest) | < 400x realtime | ~200 MB+ | 8 kHz / 12 kHz / 16 kHz | High |
| NovaSR (previous informal fastest) | < 3600x realtime | ~50 KB+ | 16 kHz | Low |

Usage

Usage instructions can be found here: https://github.com/ysharma3501/LavaSR

Final notes

The model and code are licensed under the Apache-2.0 license. See LICENSE for details.

Stars/Likes would be appreciated, thank you.

Email: yatharthsharma3501@gmail.com