pipeline_tag: text-to-speech
---
This is a basic audio diffusion model using a U-Net. I've uploaded the weights and training code.
The model's sample method can generate whichever spoken digit you want.
I used the awesome code provided by HuggingFace audio diffusers to generate mel-spectrograms, which were then used to train the model.
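For context on why mel-spectrograms make good image-like training targets: the mel scale spaces frequencies the way human hearing does, with fine resolution at low frequencies and coarse resolution at high ones. A minimal sketch of the standard HTK-style conversion (note: this is the textbook formula, not code taken from this repo, and some libraries use the Slaney variant instead):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping: mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Because the mapping is logarithmic, doubling a high frequency moves you far fewer mel bins than doubling a low one, which compresses the spectrogram into a compact 2D representation a U-Net can treat like an image.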
For the model code, I used the denoising-diffusion-pytorch repo found at https://github.com/lucidrains/denoising-diffusion-pytorch
The images found in the files are named sample{epoch}_{sample#}_{digit}.jpg, and each one has a corresponding audio file.
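If you want to sort or filter the samples programmatically, the naming scheme above is easy to parse. A small sketch (the concrete example filename below is illustrative, not a file guaranteed to exist in the repo):

```python
import re

# Matches the sample{epoch}_{sample#}_{digit}.jpg naming scheme,
# e.g. "sample12_3_7.jpg" -> epoch 12, sample 3, spoken digit 7.
SAMPLE_RE = re.compile(r"^sample(\d+)_(\d+)_(\d+)\.jpg$")

def parse_sample_name(name: str):
    """Return (epoch, sample_number, digit) for a sample image, or None."""
    m = SAMPLE_RE.match(name)
    if m is None:
        return None
    epoch, sample_no, digit = map(int, m.groups())
    return epoch, sample_no, digit
```

Swapping `.jpg` for the audio extension in the same pattern should locate each image's matching audio file.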
The audio is VERY quiet, so turn up your speakers to hear it better. (Just don't forget to turn them down after!)
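Alternatively, instead of cranking the speakers, you can peak-normalize the samples before listening. A minimal sketch that works on plain 16-bit PCM sample values; loading and saving the WAV data (e.g. with the stdlib `wave` and `struct` modules) is left out, and the function name and `headroom` parameter are my own, not part of this repo's code:

```python
def peak_normalize(samples, headroom=0.95):
    """Scale int16 sample values so the loudest reaches `headroom` of full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = headroom * 32767 / peak
    return [int(round(s * gain)) for s in samples]
```

Keeping a little headroom below full scale avoids clipping from rounding at the loudest peaks.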