ZheqiDAI committed on
Commit 2cf4fe7 · 1 Parent(s): 00bc5f3

change RM

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -1,5 +1,7 @@
 # Musimple:Text2Music with DiT Made simple
 
+Due to repository size limitations, the complete dataset and checkpoints are available on Hugging Face: [https://huggingface.co/ZheqiDAI/Musimple](https://huggingface.co/ZheqiDAI/Musimple).
+
 ## Introduction
 
 This repository provides a simple and clear implementation of a **Text-to-Music Generation** pipeline using a **DiT (Diffusion Transformer)** model. The codebase includes key components such as **model training**, **inference**, and **evaluation**. We use the **GTZAN dataset** as an example to demonstrate a minimal, working pipeline for text-conditioned music generation.
@@ -47,10 +49,8 @@ Next, convert the audio files into an HDF5 format using the gtzan2h5.py script:
 python gtzan2h5.py --root_dir /path/to/audio/files --output_h5_file /path/to/output.h5 --config_path bigvgan_v2_22khz_80band_256x/config.json --sr 22050
 ```
 
-Preprocessed Data
 If this process seems cumbersome, don’t worry! **We have already preprocessed the dataset**, and you can find it in the **musimple/dataset** directory. You can download and use this data directly to skip the preprocessing steps.
 
-Data Breakdown
 In this preprocessing stage, there are two main parts:
 
 Text to Latent Transformation: We use a Sentence Transformer to convert text labels into latent representations.
@@ -66,7 +66,6 @@ cd Musimple
 python train.py
 ```
 
-Configurable Parameters
 All training-related parameters can be adjusted in the configuration file located at:
 ```
 ./config/train.yaml
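
The diff leaves the pointer to `./config/train.yaml` unchanged. As a rough illustration only, a training config for this kind of setup might look like the sketch below; every key name and value here is hypothetical and is not taken from the repository's actual schema:

```yaml
# Hypothetical sketch — illustrative key names only, not the repository's real train.yaml.
data:
  h5_file: /path/to/output.h5   # the HDF5 file produced by gtzan2h5.py (or the preprocessed one in musimple/dataset)
  sample_rate: 22050            # matches the --sr flag used during preprocessing
train:
  batch_size: 16
  learning_rate: 1.0e-4
  num_epochs: 1000
  checkpoint_dir: ./checkpoints
model:
  vocoder_config: bigvgan_v2_22khz_80band_256x/config.json  # same BigVGAN config as preprocessing
```

The only values grounded in the diff are the 22050 Hz sample rate and the BigVGAN config path; everything else is a placeholder to show the general shape of such a file.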