---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- pytorch
- audio
- upsampling
---
# FlashSR

FlashSR is a 2MB audio super-resolution model based on the HierSpeech++ upsampler architecture. It upsamples 16 kHz audio to 48 kHz at 200x to 400x real-time speed.

### Details
* **Model Size:** 2MB for the PyTorch version, 500KB for the ONNX version
* **Input Rate:** 16 kHz
* **Output Rate:** 48 kHz
* **Inference Speed:** 200x - 400x real-time, depending on GPU and dtype
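
To make these figures concrete, here is a minimal sketch of the arithmetic behind them: a 16 kHz to 48 kHz conversion triples the sample count, and a real-time factor of N means one second of audio takes 1/N seconds to process. The function names are illustrative only and are not part of the FlashSR code.

```python
# Illustrative arithmetic only; these helpers are not part of FlashSR.
INPUT_RATE_HZ = 16_000
OUTPUT_RATE_HZ = 48_000


def output_samples(input_samples: int) -> int:
    """Sample count after 16 kHz -> 48 kHz upsampling (a factor of 3)."""
    return input_samples * OUTPUT_RATE_HZ // INPUT_RATE_HZ


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed to process audio at a given real-time factor."""
    return audio_seconds / realtime_factor


one_minute = 60 * INPUT_RATE_HZ              # samples in one minute at 16 kHz
print(output_samples(one_minute))            # samples in the 48 kHz output
print(processing_seconds(60.0, 200.0))       # seconds at 200x real-time
print(processing_seconds(60.0, 400.0))       # seconds at 400x real-time
```

At the quoted speeds, one minute of audio would take roughly 0.15 to 0.3 seconds to upsample.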

### Performance Summary
FlashSR is designed for high-speed frequency reconstruction. It has a significantly smaller computational footprint than alternatives such as Resemble-Enhance and ClearerVoice, while maintaining similar output quality.



### Benchmark Comparison

| Model | Speed (relative to real-time) | Model Size |
| :--- | :--- | :--- |
| **FlashSR** | **200x - 400x** | **2MB (PyTorch) / 500KB (ONNX)** |
| Resemble-Enhance | < 20x | ~700MB+ |
| ClearerVoice | < 20x | ~200MB+ |
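
For readers who want to reproduce a speed figure like those in the table, the following is a rough sketch of how a real-time factor can be measured. The `dummy_upsampler` is a hypothetical stand-in (it simply repeats each sample three times) and is not the FlashSR model; substitute the actual inference call from the repository. When timing GPU inference, the device should be synchronized before reading the clock, which this stdlib-only sketch does not do.

```python
import time
from typing import Callable, List


def realtime_factor(
    upsample: Callable[[List[float]], object],
    audio: List[float],
    sample_rate: int = 16_000,
    runs: int = 5,
) -> float:
    """Return seconds of audio processed per second of wall-clock time."""
    audio_seconds = len(audio) / sample_rate
    upsample(audio)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        upsample(audio)
    elapsed = (time.perf_counter() - start) / runs
    # Guard against a zero reading on very coarse clocks.
    return audio_seconds / max(elapsed, 1e-12)


def dummy_upsampler(audio: List[float]) -> List[float]:
    """Hypothetical placeholder: naive 3x sample repetition, not FlashSR."""
    return [sample for sample in audio for _ in range(3)]


one_second = [0.0] * 16_000  # one second of silence at 16 kHz
rtf = realtime_factor(dummy_upsampler, one_second)
print(f"{rtf:.1f}x real-time")
```

The printed value depends entirely on the hardware and on the function being timed, so it says nothing about FlashSR itself until the placeholder is replaced.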

### Usage
Usage instructions for the ONNX and PyTorch versions, along with the source code, are available on GitHub:
https://github.com/ysharma3501/FlashSR

### Credits
Thanks to the authors of **HierSpeech++**, as this model is based on its 48 kHz upsampler, and to [Xenova](https://github.com/xenova/) for the ONNX code.