# Musimple: Text2Music with DiT Made Simple

Due to repository size limitations, the complete dataset and checkpoints are available on Hugging Face: [https://huggingface.co/ZheqiDAI/Musimple](https://huggingface.co/ZheqiDAI/Musimple).

## Introduction
This repository provides a simple and clear implementation of a **Text-to-Music Generation** pipeline using a **DiT (Diffusion Transformer)** model. The codebase includes key components such as **model training**, **inference**, and **evaluation**. We use the **GTZAN dataset** as an example to demonstrate a minimal, working pipeline for text-conditioned music generation.

Next, convert the audio files into HDF5 format using the `gtzan2h5.py` script:

```bash
python gtzan2h5.py --root_dir /path/to/audio/files --output_h5_file /path/to/output.h5 --config_path bigvgan_v2_22khz_80band_256x/config.json --sr 22050
```
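
Once the HDF5 file is written, it can be read back with `h5py`. The sketch below is a minimal round-trip example; the dataset and attribute names (`mel`, `sr`) and the 80-band shape are illustrative assumptions mirroring the 80-band BigVGAN config, not necessarily the exact keys `gtzan2h5.py` writes.

```python
# Illustrative HDF5 round-trip; key names "mel" and "sr" are assumptions,
# not necessarily the exact layout produced by gtzan2h5.py.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Write a toy 80-band mel spectrogram plus its sample rate.
with h5py.File(path, "w") as f:
    f.create_dataset("mel", data=np.zeros((80, 256), dtype=np.float32))
    f.attrs["sr"] = 22050

# Read it back the way a training DataLoader might.
with h5py.File(path, "r") as f:
    mel = f["mel"][:]
    sr = int(f.attrs["sr"])
```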
If this process seems cumbersome, don’t worry! **We have already preprocessed the dataset**, and you can find it in the **musimple/dataset** directory. You can download and use this data directly to skip the preprocessing steps.
In this preprocessing stage, there are two main parts:
- **Text to Latent Transformation**: We use a Sentence Transformer to convert text labels into latent representations.

```bash
cd Musimple
python train.py
```
All training-related parameters can be adjusted in the configuration file located at:
```
./config/train.yaml
```
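
For orientation, a training config of this kind often contains entries like the ones below. Every key and value here is a purely illustrative assumption; consult `./config/train.yaml` itself for the real parameter names.

```yaml
# Illustrative only -- not the actual contents of ./config/train.yaml.
batch_size: 16
lr: 1.0e-4
num_steps: 100000
sample_rate: 22050
```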