Duplicated from YatharthS/FlashSR


	---
	license: apache-2.0
	pipeline_tag: audio-to-audio
	tags:
	- pytorch
	- audio
	- upsampling
	---
	# FlashSR

	FlashSR is a 2MB audio super-resolution model based on HierSpeech++'s upsampler architecture. It upscales 16kHz audio to 48kHz at speeds ranging from 200x to 400x real-time.

	### Details
	* Model Size: 2MB for the PyTorch version, 500KB for the ONNX version
	* Input Rate: 16kHz
	* Output Rate: 48kHz
	* Inference Speed: 200x - 400x real-time, depending on GPU and dtype
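To make the figures above concrete, here is a small arithmetic sketch of what "200x - 400x real-time" and the 16kHz to 48kHz rates imply. The helper names are hypothetical and are not part of FlashSR; the speeds are the ones quoted in this card, not new measurements.

```python
# Illustrative arithmetic only; these helpers are hypothetical and not part of FlashSR.

def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed to process a clip at a given real-time factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def output_samples(input_samples: int, in_rate: int = 16_000, out_rate: int = 48_000) -> int:
    """Number of samples after upsampling from in_rate to out_rate."""
    return input_samples * out_rate // in_rate


if __name__ == "__main__":
    clip = 60.0  # seconds of audio
    for factor in (200, 400):
        print(f"{factor}x real-time: {processing_seconds(clip, factor):.2f} s for a {clip:.0f} s clip")
    # One second of 16kHz input becomes one second of 48kHz output (3x the samples).
    print(output_samples(16_000))
```

At the quoted speeds, a one-minute clip would take roughly 0.15 to 0.30 seconds to process, and the output holds three times as many samples as the input.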

	### Performance Summary
	FlashSR is designed for high-speed frequency reconstruction. It offers a significantly lower computational footprint compared to alternatives such as Resemble-Enhance and ClearerVoice, while maintaining similar output quality.



	### Benchmark Comparison

	| Model | Speed | Size |
	| :--- | :--- | :--- |
	| FlashSR | 200x - 400x realtime | 2MB/500KB |
	| Resemble-Enhance | < 20x realtime | ~700MB+ |
	| ClearerVoice | < 20x realtime | ~200MB+ |

	### Usage
	Usage instructions for ONNX/PyTorch and the source code are available on GitHub:
	https://github.com/ysharma3501/FlashSR
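Because the model expects 16kHz input, it can help to verify a file's sample rate before following the GitHub instructions. The sketch below uses only Python's standard `wave` module and assumes uncompressed WAV input. `check_input_rate` is a hypothetical helper, not part of the FlashSR API.

```python
import wave

EXPECTED_INPUT_RATE = 16_000  # input rate stated in this model card


def check_input_rate(path: str, expected_rate: int = EXPECTED_INPUT_RATE) -> int:
    """Return the WAV file's sample rate, raising if it differs from expected_rate.

    Hypothetical pre-flight check; it does not load or run the model.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected_rate:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample to {expected_rate} Hz first"
        )
    return rate
```

Audio at any other rate would need resampling to 16kHz before upscaling.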

	### Credits
	Thanks to the authors of HierSpeech++, as this model is based on its 48kHz upsampler, and to [Xenova](https://github.com/xenova/) for the ONNX code.