license: apache-2.0
pipeline_tag: audio-to-audio

LavaSR (v2) is a 50 MB bandwidth extension (BWE) model paired with the UL-UNAS denoiser. On a fast GPU it can enhance nearly 5000 seconds of audio in 1 second while exceeding the quality of 6 GB diffusion models.

Details

Model Size: 50 MB for the PyTorch version.
Input Rate: Any from 8 to 48 kHz.
Output Rate: 48 kHz.
Inference Speed: 20-80x realtime on CPU and 800-5000x realtime on GPU, depending on the GPU.
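
To make the "x realtime" figures above concrete, here is a minimal sketch of the arithmetic. A realtime factor is seconds of audio processed per second of wall-clock time. The helper names `realtime_factor` and `processing_seconds` are hypothetical and are not part of the LavaSR API. Actual throughput will depend on your hardware.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time to process audio at a given realtime factor."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


if __name__ == "__main__":
    one_hour = 3600.0  # seconds of audio
    # Bounds taken from the speed ranges quoted above (CPU low end, GPU high end).
    for label, rtf in [("CPU, 20x", 20.0), ("GPU, 5000x", 5000.0)]:
        print(f"{label}: {processing_seconds(one_hour, rtf):.2f} s for 1 hour of audio")
```

At the quoted extremes, one hour of audio would take about 180 seconds at 20x realtime and under one second at 5000x realtime.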

Use cases

Restore low-quality audio datasets.
Enhance TTS or ASR model quality.
Upscale poor-quality voice calls.

Benchmark Comparison

Please check out the repo for objective benchmarks: https://github.com/ysharma3501/LavaSR

Model	GPU speed (batch size 1)	Size	Input range	Quality
LavaSR v2	Up to 5000x realtime	50 MB	Any from 8 to 48 kHz	Highest
AudioSR	< 1x realtime	~3 GB+	~2-16 kHz	Medium
AP-BWE (previous formal fastest)	< 400x realtime	~200 MB+	8/12/16 kHz	High
NovaSR (previous informal fastest)	< 3600x realtime	~50 KB+	16 kHz	Low

Usage

Usage instructions can be found here: https://github.com/ysharma3501/LavaSR

Final notes

The model and code are licensed under the Apache-2.0 license. See LICENSE for details.

Stars/Likes would be appreciated, thank you.

Email: yatharthsharma3501@gmail.com