Commit ·
d3a3b56
0
Parent(s):
Duplicate from YatharthS/FlashSR
Browse filesCo-authored-by: Yatharth Sharma <YatharthS@users.noreply.huggingface.co>
- .gitattributes +35 -0
- README.md +37 -0
- onnx/model.onnx +3 -0
- upsampler.pth +3 -0
.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,37 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: audio-to-audio
|
| 4 |
+
tags:
|
| 5 |
+
- pytorch
|
| 6 |
+
- audio
|
| 7 |
+
- upsampling
|
| 8 |
+
---
|
| 9 |
+
# FlashSR
|
| 10 |
+
|
| 11 |
+
FlashSR is a 2MB audio super-resolution model based on the HierSpeech++'s upsampler architecture. It upscales 16kHz audio to 48kHz at speeds ranging from 200x to 400x real-time.
|
| 12 |
+
|
| 13 |
+
### Details
|
| 14 |
+
* **Model Size:** 2MB for pytorch version, 500KB for onnx version
|
| 15 |
+
* **Input Rate:** 16kHz
|
| 16 |
+
* **Output Rate:** 48kHz
|
| 17 |
+
* **Inference Speed:** 200x - 400x real-time depending on gpu and dtype
|
| 18 |
+
|
| 19 |
+
### Performance Summary
|
| 20 |
+
FlashSR is designed for high-speed frequency reconstruction. It offers a significantly lower computational footprint compared to alternatives such as Resemble-Enhance and ClearerVoice, while maintaining similar output quality.
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
### Benchmark Comparison
|
| 25 |
+
|
| 26 |
+
| Model | Speed | Size |
|
| 27 |
+
| :--- | :--- | :--- |
|
| 28 |
+
| **FlashSR** | **200x - 400x realtime** | **2MB/500KB** |
|
| 29 |
+
| Resemble-Enhance | < 20x realtime | ~700MB+ |
|
| 30 |
+
| ClearerVoice | < 20x realtime | ~200MB+ |
|
| 31 |
+
|
| 32 |
+
### Usage
|
| 33 |
+
Usage instructions for onnx/pytorch and source code are available on GitHub:
|
| 34 |
+
https://github.com/ysharma3501/FlashSR
|
| 35 |
+
|
| 36 |
+
### Credits
|
| 37 |
+
Thanks to the authors of **HierSpeech++** as this was based on it's 48khz upsampler and [Xenova](https://github.com/xenova/) for onnx code.
|
onnx/model.onnx
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e255c76b227f16f7f392cc43677c38bd2c5aa129f042a2ba3eb03fb29e470c7a
|
| 3 |
+
size 498624
|
upsampler.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:62c70874ac4efeb4dc9c8aa9dc0a611a951e1c36292abeb4c406d7fb91e0eefc
|
| 3 |
+
size 1715101
|