| license: apache-2.0 | |
| pipeline_tag: audio-to-audio | |
| tags: | |
| - pytorch | |
| - audio | |
| - upsampling | |
| # FlashSR | |
| FlashSR is a 2 MB audio super-resolution model based on the HierSpeech++ upsampler architecture. It upscales 16 kHz audio to 48 kHz at 200x to 400x real-time speed. | |
| ### Details | |
| * **Model Size:** 2 MB for the PyTorch version, 500 KB for the ONNX version | |
| * **Input Rate:** 16 kHz | |
| * **Output Rate:** 48 kHz | |
| * **Inference Speed:** 200x - 400x real-time, depending on GPU and dtype | |
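
To make the figures above concrete, here is a minimal arithmetic sketch. It assumes the output holds exactly three samples per input sample (ignoring any padding or trimming the model may apply) and that "Nx real-time" means N seconds of audio processed per second of wall-clock time. The helper names are hypothetical and not part of FlashSR.

```python
INPUT_RATE_HZ = 16_000
OUTPUT_RATE_HZ = 48_000


def output_samples(input_samples: int) -> int:
    """Sample count after 16 kHz -> 48 kHz upsampling (a 3x ratio)."""
    return input_samples * OUTPUT_RATE_HZ // INPUT_RATE_HZ


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed to process audio at a given real-time factor."""
    return audio_seconds / realtime_factor


one_minute = 60 * INPUT_RATE_HZ          # 960,000 input samples
print(output_samples(one_minute))        # 2880000
print(processing_seconds(60.0, 200.0))   # 0.3
print(processing_seconds(60.0, 400.0))   # 0.15
```

Under these assumptions, one minute of 16 kHz audio would take roughly 0.15 to 0.3 seconds to upsample.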
| ### Performance Summary | |
| FlashSR is designed for high-speed frequency reconstruction. It has a much smaller computational footprint than alternatives such as Resemble-Enhance and ClearerVoice while maintaining similar output quality. | |
| ### Benchmark Comparison | |
| | Model | Speed | Size | | |
| | :--- | :--- | :--- | | |
| **FlashSR** | **200x - 400x real-time** | **2 MB (PyTorch) / 500 KB (ONNX)** | |
| Resemble-Enhance | < 20x real-time | ~700 MB+ | |
| ClearerVoice | < 20x real-time | ~200 MB+ | |
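
A speed figure like those in the table can be measured with a small timing harness. The sketch below is one possible approach, not the procedure used to produce the numbers above. `measure_realtime_factor` and `repeat_each_sample` are hypothetical names, and the placeholder upsampler only stands in for a real model call.

```python
import time
from typing import Callable, List


def measure_realtime_factor(
    upsample: Callable[[List[float]], List[float]],
    samples: List[float],
    sample_rate_hz: int = 16_000,
    runs: int = 5,
) -> float:
    """Return seconds of audio processed per second of wall-clock time."""
    audio_seconds = len(samples) / sample_rate_hz
    upsample(samples)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        upsample(samples)
    elapsed = (time.perf_counter() - start) / runs
    return audio_seconds / max(elapsed, 1e-12)  # guard against a zero reading


def repeat_each_sample(samples: List[float]) -> List[float]:
    """Placeholder upsampler (NOT FlashSR): repeats each sample three times."""
    return [s for s in samples for _ in range(3)]


silence = [0.0] * 16_000  # one second of 16 kHz audio
rtf = measure_realtime_factor(repeat_each_sample, silence)
print(f"{rtf:.1f}x real-time")
```

When timing a GPU model, the timed call would also need to wait for the device to finish its work (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the measurement covers only the kernel launch.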
| ### Usage | |
| Usage instructions for both ONNX and PyTorch, along with the source code, are available on GitHub: | |
| https://github.com/ysharma3501/FlashSR | |
| ### Credits | |
| Thanks to the authors of **HierSpeech++**, as this model is based on its 48 kHz upsampler, and to [Xenova](https://github.com/xenova/) for the ONNX code. | |