README.md
Audio can be represented as images by transforming to a mel spectrogram.

A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the `test-model.ipynb` notebook for an example.
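The round trip between audio and images works because a log-mel spectrogram in decibels occupies a bounded range that can be quantized to 8-bit pixels and back. A minimal sketch of that quantization (the function names and the 80 dB dynamic range are assumptions here, chosen to match librosa's default `top_db`; the repository's actual conversion may differ):

```python
import numpy as np

def spectrogram_to_image(S_db: np.ndarray, top_db: float = 80.0) -> np.ndarray:
    """Map a log-mel spectrogram in dB (range [-top_db, 0]) to a uint8 image."""
    S_clipped = np.clip(S_db, -top_db, 0.0)
    return (255 * (S_clipped + top_db) / top_db).round().astype(np.uint8)

def image_to_spectrogram(img: np.ndarray, top_db: float = 80.0) -> np.ndarray:
    """Invert the mapping: uint8 pixels back to dB values in [-top_db, 0]."""
    return img.astype(np.float32) * top_db / 255.0 - top_db

# Round trip: the only loss is quantization error, bounded by half a
# step (80 / 255 / 2 ~= 0.157 dB here).
S_db = np.random.uniform(-80.0, 0.0, size=(64, 64)).astype(np.float32)
img = spectrogram_to_image(S_db)
S_back = image_to_spectrogram(img)
assert img.dtype == np.uint8 and img.shape == (64, 64)
assert np.max(np.abs(S_back - S_db)) <= 80.0 / 255.0 / 2 + 1e-4
```

Synthesized images then go back through the inverse mapping before being inverted to a waveform.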
## Generate Mel spectrogram dataset from directory of audio files

#### Training can be run with Mel spectrograms of resolution 64x64 on a single commercial grade GPU (e.g. RTX 2080 Ti). The `hop_length` should be set to 1024 for better results.
```bash
python src/audio_to_images.py \
  ...
  --output_dir data-test
```
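As a sanity check on the `hop_length` choice: each column of the spectrogram image corresponds to `hop_length` audio samples, so resolution and hop length together fix how much audio one image covers. A quick back-of-the-envelope calculation (the 22,050 Hz sample rate is an assumption for illustration, not a value taken from the script):

```python
# Each image column covers hop_length audio samples, so a 64x64 image
# spans resolution * hop_length samples of audio in total.
hop_length = 1024
resolution = 64
sample_rate = 22050  # assumed; depends on how the audio is loaded

samples_per_image = resolution * hop_length          # 65536 samples
seconds_per_image = samples_per_image / sample_rate  # ~2.97 s
print(f"{samples_per_image} samples = {seconds_per_image:.2f} s per image")
```

A larger `hop_length` therefore trades time resolution for a longer stretch of audio per training image.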
#### Generate dataset of 256x256 Mel spectrograms and push to hub (you will need to be authenticated with `huggingface-cli login`).

```bash
python src/audio_to_images.py \
  ...
  --push_to_hub teticio/audio-diffusion-256
```
## Train model

#### Run training on local machine.

```bash
accelerate launch --config_file accelerate_local.yaml \
  ...
  --mixed_precision no
```
#### Run training on local machine with a `batch_size` of 1 and `gradient_accumulation_steps` of 16 to compensate, so that the 256x256 resolution model fits on a commercial grade GPU.

```bash
accelerate launch --config_file accelerate_local.yaml \
  ...
  --mixed_precision no
```
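The `batch_size` 1 with `gradient_accumulation_steps` 16 combination works because accumulating per-sample gradients scaled by 1/16 across 16 steps reproduces the gradient of a single batch of 16, while only ever holding one sample's activations in GPU memory. A toy illustration with a scalar least-squares model (not the repository's training loop):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)
y = 3.0 * x + rng.normal(scale=0.1, size=16)
w = 0.0  # single scalar weight, loss = mean((w*x - y)**2)

def grad(w, xb, yb):
    # d/dw of the mean squared error over a (micro-)batch
    return np.mean(2 * (w * xb - yb) * xb)

# Full batch of 16 in one step:
g_full = grad(w, x, y)

# batch_size=1 with gradient_accumulation_steps=16:
# accumulate each micro-batch gradient scaled by 1/16 before stepping.
g_accum = sum(grad(w, x[i : i + 1], y[i : i + 1]) / 16 for i in range(16))

assert np.isclose(g_full, g_accum)
```

The same equivalence holds for any loss that averages over the batch, which is why the effective batch size is `batch_size * gradient_accumulation_steps`.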
#### Run training on SageMaker.

```bash
accelerate launch --config_file accelerate_sagemaker.yaml \
  ...
```