# Musimple: Text2Music with DiT Made Simple

Due to repository size limitations, the complete dataset and checkpoints are available on Hugging Face: [https://huggingface.co/ZheqiDAI/Musimple](https://huggingface.co/ZheqiDAI/Musimple).

## Introduction
This repository provides a simple and clear implementation of a **Text-to-Music Generation** pipeline using a **DiT (Diffusion Transformer)** model. The codebase includes key components such as **model training**, **inference**, and **evaluation**. We use the **GTZAN dataset** as an example to demonstrate a minimal, working pipeline for text-conditioned music generation.

Next, convert the audio files into HDF5 format using the `gtzan2h5.py` script:

```bash
python gtzan2h5.py --root_dir /path/to/audio/files --output_h5_file /path/to/output.h5 --config_path bigvgan_v2_22khz_80band_256x/config.json --sr 22050
```
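
Once the HDF5 file is written, it can be read back with `h5py`. The sketch below is a minimal round-trip example; the dataset and attribute names (`mel`, `sr`) and the 80-band shape are illustrative assumptions mirroring the 80-band BigVGAN config, not necessarily the exact keys `gtzan2h5.py` writes.

```python
# Illustrative HDF5 round-trip; key names "mel" and "sr" are assumptions,
# not necessarily the exact layout produced by gtzan2h5.py.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Write a toy 80-band mel spectrogram plus its sample rate.
with h5py.File(path, "w") as f:
    f.create_dataset("mel", data=np.zeros((80, 256), dtype=np.float32))
    f.attrs["sr"] = 22050

# Read it back the way a training DataLoader might.
with h5py.File(path, "r") as f:
    mel = f["mel"][:]
    sr = int(f.attrs["sr"])
```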
If this process seems cumbersome, don’t worry! **We have already preprocessed the dataset**, and you can find it in the **musimple/dataset** directory. You can download and use this data directly to skip the preprocessing steps.
In this preprocessing stage, there are two main parts:
- **Text to Latent Transformation**: We use a Sentence Transformer to convert text labels into latent representations.

```bash
cd Musimple
python train.py
```
All training-related parameters can be adjusted in the configuration file located at:
```
./config/train.yaml
```
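
For orientation, a training config of this kind often contains entries like the ones below. Every key and value here is a purely illustrative assumption; consult `./config/train.yaml` itself for the real parameter names.

```yaml
# Illustrative only -- not the actual contents of ./config/train.yaml.
batch_size: 16
lr: 1.0e-4
num_steps: 100000
sample_rate: 22050
```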