---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- pytorch
- audio
- upsampling
---
# FlashSR

FlashSR is a 2MB audio super-resolution model based on the upsampler architecture from HierSpeech++. It upscales 16kHz audio to 48kHz at 200x to 400x real-time speed.

### Details
* **Model Size:** 2MB
* **Input Rate:** 16kHz
* **Output Rate:** 48kHz
* **Inference Speed:** 200x to 400x real-time, depending on GPU and dtype

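To make the figures above concrete, here is a minimal sketch of the arithmetic behind them: the nominal 3x growth in sample count from 16kHz to 48kHz, and how a real-time factor is computed from a wall-clock measurement. All function names are illustrative and not part of FlashSR, and `upsample_fn` is a hypothetical stand-in for whatever callable runs the model.

```python
import time

INPUT_RATE_HZ = 16_000
OUTPUT_RATE_HZ = 48_000


def output_samples(input_samples: int) -> int:
    """Nominal sample count after 16 kHz -> 48 kHz upsampling (factor of 3)."""
    return input_samples * OUTPUT_RATE_HZ // INPUT_RATE_HZ


def realtime_factor(input_samples: int, elapsed_seconds: float) -> float:
    """Seconds of input audio processed per second of wall-clock time."""
    audio_seconds = input_samples / INPUT_RATE_HZ
    return audio_seconds / elapsed_seconds


def measure_realtime_factor(upsample_fn, audio, input_samples: int, runs: int = 10) -> float:
    """Time a hypothetical upsampling callable and return its real-time factor.

    For GPU inference the callable should block until the result is ready
    (for example by calling torch.cuda.synchronize()), otherwise the timing
    only covers kernel launch.
    """
    upsample_fn(audio)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        upsample_fn(audio)
    elapsed = (time.perf_counter() - start) / runs
    return realtime_factor(input_samples, elapsed)
```

For example, 100 seconds of 16kHz audio (1,600,000 samples) processed in 0.5 seconds gives `realtime_factor(1_600_000, 0.5) == 200.0`, the lower end of the range quoted above.
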
### Performance Summary
FlashSR is designed for high-speed frequency reconstruction. It has a much smaller computational footprint than alternatives such as Resemble-Enhance and ClearerVoice, while maintaining similar output quality.

### Benchmark Comparison

| Model | Speed | Size |
| :--- | :--- | :--- |
| **FlashSR** | **200x - 400x real-time** | **2MB** |
| Resemble-Enhance | < 20x real-time | ~700MB+ |
| ClearerVoice | < 20x real-time | ~200MB+ |

### Usage
Usage instructions and source code are available on GitHub:
https://github.com/ysharma3501/FlashSR

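Since the model expects 16kHz input, it can help to verify a file's sample rate before running inference. The following is a minimal sketch using only Python's standard `wave` module; `check_wav_input` is a hypothetical helper, not part of the FlashSR repository, and it only handles uncompressed PCM WAV files.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # input rate stated in the Details section


def check_wav_input(path: str) -> dict:
    """Read a WAV header and confirm the file is sampled at 16 kHz.

    Returns the channel count and duration in seconds; raises ValueError
    if the sample rate does not match the expected input rate.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    if rate != EXPECTED_RATE_HZ:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample to {EXPECTED_RATE_HZ} Hz first"
        )
    return {"channels": channels, "seconds": frames / rate}
```

Files at other sample rates would need to be resampled to 16kHz first with an audio library of your choice.
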
### Credits
Thanks to the authors of **HierSpeech++**, whose 48kHz upsampler this model is based on.