YatharthS
/

LavaSR


	---
	license: apache-2.0
	pipeline_tag: audio-to-audio
	---

	LavaSR (v2) is a 50 MB bandwidth extension (BWE) model that is paired with the UL-UNAS denoiser. On a fast GPU it can enhance nearly 5000 seconds of audio in about 1 second, while exceeding the quality of large 6 GB diffusion models.

	### Details
	* Model Size: 50 MB for the PyTorch version.
	* Input Rate: Any from 8-48 kHz.
	* Output Rate: 48 kHz.
	* Inference Speed: 20-80x realtime on CPU and 800-5000x realtime on GPU, depending on the GPU.
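
To put the speed figures above in concrete terms, a realtime factor can be converted into an approximate processing time by dividing the audio duration by the factor. The snippet below is only a back-of-the-envelope sketch: `processing_seconds` is a hypothetical helper written for illustration, not part of the LavaSR code, and actual timings depend on hardware and batch size.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock seconds needed to enhance `audio_seconds` of audio.

    A realtime factor of N means N seconds of audio are processed per second.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Rough estimates for one hour of audio at the ends of the quoted ranges.
one_hour = 3600.0
for label, factor in [
    ("CPU, 20x", 20),
    ("CPU, 80x", 80),
    ("GPU, 800x", 800),
    ("GPU, 5000x", 5000),
]:
    print(f"{label}: about {processing_seconds(one_hour, factor):.2f} s")
```

At the quoted 5000x upper bound, 5000 seconds of audio works out to about 1 second of processing, which matches the headline claim.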

	### Use cases
	- Restore low-quality audio datasets.
	- Enhance TTS or ASR model quality.
	- Upscale poor-quality voice calls.

	### Benchmark Comparison

	Please check out the repo for objective benchmarks: https://github.com/ysharma3501/LavaSR

	| Model | Speed on GPU (bs=1) | Size | Input range | Quality |
	| :--- | :--- | :--- | :--- | :--- |
	| LavaSR v2 | 5000x realtime | 50 MB | Any from 8-48 kHz | Highest |
	| AudioSR | < 1x realtime | ~3 GB+ | ~2-16 kHz | Medium |
	| AP-BWE (previous formal fastest) | < 400x realtime | ~200 MB+ | 8 kHz / 12 kHz / 16 kHz | High |
	| NovaSR (previous informal fastest) | < 3600x realtime | ~50 KB+ | 16 kHz | Low |
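
For readers who want to check a speed figure like those in the table on their own hardware, the sketch below shows one way to measure a realtime factor. It is not the benchmark procedure used for the table: `measure_realtime_factor` and `passthrough` are hypothetical names, and `passthrough` is a stand-in that should be swapped for a real enhancement call (see the repo for the actual LavaSR usage).

```python
import time
from typing import Callable, Sequence


def measure_realtime_factor(
    enhance_fn: Callable[[Sequence[float], int], object],
    samples: Sequence[float],
    sample_rate: int,
    runs: int = 3,
) -> float:
    """Return seconds of audio processed per second of wall-clock time.

    The fastest of `runs` calls is used to reduce noise from warm-up
    and background load.
    """
    audio_seconds = len(samples) / sample_rate
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        enhance_fn(samples, sample_rate)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading for extremely fast calls.
    return audio_seconds / max(best, 1e-9)


def passthrough(samples: Sequence[float], sample_rate: int) -> list:
    """Placeholder enhancer (an assumption for this sketch): copies its input."""
    return list(samples)


# One second of silence at 16 kHz as dummy input.
factor = measure_realtime_factor(passthrough, [0.0] * 16000, 16000)
print(f"{factor:.0f}x realtime")
```

When timing a GPU model, the device should be synchronized before the clock is read (for example with `torch.cuda.synchronize()` in PyTorch), since GPU work is launched asynchronously and the call may return before the computation finishes.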

	### Usage
	Usage instructions can be found here: https://github.com/ysharma3501/LavaSR

	### Final notes

	The model and code are licensed under the Apache-2.0 license. See LICENSE for details.

	Stars/Likes would be appreciated, thank you.

	Email: yatharthsharma3501@gmail.com