# YuE Finetuning Guide
This guide walks you through the process of finetuning the YuE model using your own data.
## Table of Contents
1. [Data Preparation](#step-1-data-preparation)
2. [Training Data Configuration](#step-2-training-data-configuration)
3. [Model Finetuning](#step-3-model-finetuning)
## Requirements
- Python 3.10 (recommended)
- PyTorch 2.4 (recommended)
- CUDA 12.1 or later (recommended)
```bash
git clone https://github.com/multimodal-art-projection/YuE.git
cd YuE/finetune/
conda create -n yue-ft python=3.10
conda activate yue-ft
pip install -r requirements.txt
```
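To confirm your environment matches these recommendations before training, a quick check like the following can help (a minimal sketch; it only reports versions and GPU visibility):
```python
# Verify the Python / PyTorch / CUDA setup before finetuning.
import sys
import torch

print(f"Python:  {sys.version.split()[0]}")    # 3.10.x recommended
print(f"PyTorch: {torch.__version__}")         # 2.4.x recommended
print(f"CUDA:    {torch.version.cuda}")        # 12.1+ recommended
print(f"GPU available: {torch.cuda.is_available()}")
```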
## Step 1: Data Preparation
### Required Data Structure
Your data should be organized in the following structure:
```
example/
├── jsonl/   # Source JSONL files
├── mmap/    # Generated Megatron binary files
└── npy/     # Discrete audio codes (numpy arrays) from xcodec
```
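The files under `npy/` are plain NumPy arrays of discrete xcodec codes, so they can be inspected directly (a minimal sketch; the exact array shape and dtype depend on how your codes were exported):
```python
import numpy as np

codes = np.load("example/npy/dummy.npy")  # discrete xcodec codes for the full mix
print(codes.shape, codes.dtype)           # layout depends on your xcodec export
print(codes.min(), codes.max())           # values should be integer codebook indices
```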
### JSONL File Format
Each JSONL file should contain one entry per line in the following format (pretty-printed here for readability; the `//` comments are explanatory only and must not appear in the actual file):
```json
{
  "id": "1",
  "codec": "example/npy/dummy.npy",                           // Raw audio codes
  "vocals_codec": "example/npy/dummy.Vocals.npy",             // Vocal track codes
  "instrumental_codec": "example/npy/dummy.Instrumental.npy", // Instrumental track codes
  "audio_length_in_sec": 85.16,                               // Audio duration in seconds
  "msa": [                                                    // Music Structure Analysis
    {
      "start": 0,
      "end": 13.93,
      "label": "intro"
    }
  ],
  "genres": "male, youth, powerful, charismatic, rock, punk", // Tags for gender, age, genre, mood, timbre
  "splitted_lyrics": {
    "segmented_lyrics": [
      {
        "offset": 0,
        "duration": 13.93,
        "codec_frame_start": 0,
        "codec_frame_end": 696,
        "line_content": "[intro]\n\n"
      }
    ]
  }
}
```
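A minimal sketch for assembling and writing one such entry from Python (the field names follow the schema above; the paths and metadata values are placeholders for your own data, and the frame rate is an assumption inferred from the dummy entry, where 13.93 s corresponds to 696 codec frames, i.e. roughly 50 frames per second):
```python
import json

FRAMES_PER_SEC = 50  # assumption inferred from the dummy entry (696 / 13.93); verify for your codec

entry = {
    "id": "1",
    "codec": "example/npy/dummy.npy",
    "vocals_codec": "example/npy/dummy.Vocals.npy",
    "instrumental_codec": "example/npy/dummy.Instrumental.npy",
    "audio_length_in_sec": 85.16,
    "msa": [{"start": 0, "end": 13.93, "label": "intro"}],
    "genres": "male, youth, powerful, charismatic, rock, punk",
    "splitted_lyrics": {
        "segmented_lyrics": [
            {
                "offset": 0,
                "duration": 13.93,
                "codec_frame_start": 0,
                "codec_frame_end": round(13.93 * FRAMES_PER_SEC),  # = 696 for this segment
                "line_content": "[intro]\n\n",
            }
        ]
    },
}

# JSONL: one JSON object per line (assumes example/jsonl/ already exists)
with open("example/jsonl/dummy.jsonl", "w") as f:
    f.write(json.dumps(entry) + "\n")
```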
### Converting to Megatron Binary Format
1. Navigate to the finetune directory:
```bash
cd finetune/
```
2. Run the preprocessing script:
```bash
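# TOKENIZER_MODEL should point to your tokenizer model path, e.g.:
#   export TOKENIZER_MODEL=/path/to/tokenizer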
# For Chain-of-Thought (CoT) dataset
bash scripts/preprocess_data.sh dummy cot $TOKENIZER_MODEL
# For In-Context Learning (ICL) dataset
bash scripts/preprocess_data.sh dummy icl_cot $TOKENIZER_MODEL
```
> **Note**: For music structure analysis and track separation, refer to [openl2s](https://github.com/a43992899/openl2s).
## Step 2: Training Data Configuration
### Counting Dataset Tokens
1. Navigate to the finetune directory:
```bash
cd finetune/
```
2. Run the token counting script:
```bash
bash scripts/count_tokens.sh ./example/mmap/
```
The results will be saved in `finetune/count_token_logs/`. This process may take several minutes for large datasets.
### Configuring Data Mixture
1. Create a configuration file (e.g., `finetune/example/dummy_data_mixture_cfg.yml`) with the following parameters:
- `TOKEN_COUNT_LOG_DIR`: Directory containing token count logs
- `GLOBAL_BATCH_SIZE`: Total batch size for training
- `SEQ_LEN`: Maximum context window size
- `{NUM}_ROUND`: Number of times to repeat each dataset
2. Generate training parameters:
```bash
cd finetune/
python core/parse_mixture.py -c example/dummy_data_mixture_cfg.yml
```
The script will output:
- `DATA_PATH`: Paths to your training data (copy to training script)
- `TRAIN_ITERS`: Number of training iterations
- Total token count
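As a rough sanity check on these values (an illustration only; `core/parse_mixture.py` is authoritative and also accounts for the configured rounds), one training iteration consumes roughly `GLOBAL_BATCH_SIZE * SEQ_LEN` tokens:
```python
# Back-of-envelope estimate of training iterations; all values are placeholders.
total_tokens = 1_000_000   # total token count reported by parse_mixture.py
global_batch_size = 8      # GLOBAL_BATCH_SIZE from the mixture config
seq_len = 8192             # SEQ_LEN from the mixture config

tokens_per_iter = global_batch_size * seq_len
print(f"~{total_tokens // tokens_per_iter} iterations to see {total_tokens} tokens once")
```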
## Step 3: Model Finetuning
YuE supports finetuning using LoRA (Low-Rank Adaptation), which significantly reduces memory requirements while maintaining performance.
### Configuring the Finetuning Script
1. Edit the `scripts/run_finetune.sh` script to configure your finetuning run (a sketch of how the LoRA settings map to a standard PEFT configuration follows this list):
```bash
# Update data paths
# Accepted formats for DATA_PATH:
# 1) a single path: "/path/to/data"
# 2) multiple datasets with weights: "100 /path/to/data1 200 /path/to/data2 ..."
# You can copy DATA_PATH from the output of core/parse_mixture.py in Step 2
DATA_PATH="data1-weight /path/to/data1 data2-weight /path/to/data2"
DATA_CACHE_PATH="/path/to/your/cache"
# Set comma-separated list of proportions for train/val/test split
DATA_SPLIT="900,50,50"
# Set model paths
TOKENIZER_MODEL_PATH="/path/to/tokenizer"
MODEL_NAME="m-a-p/YuE-s1-7B-anneal-en-cot" # or your local model path
MODEL_CACHE_DIR="/path/to/model/cache"
OUTPUT_DIR="/path/to/save/finetuned/model"
# Configure LoRA parameters (optional)
LORA_R=64 # Rank of the LoRA update matrices
LORA_ALPHA=32 # Scaling factor for the LoRA update
LORA_DROPOUT=0.1 # Dropout probability for LoRA layers
```
2. Adjust training hyperparameters as needed:
```bash
# Training hyperparameters
PER_DEVICE_TRAIN_BATCH_SIZE=1
NUM_TRAIN_EPOCHS=10
```
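For reference, the three LoRA settings above correspond to the standard low-rank adaptation hyperparameters. A minimal sketch of the equivalent Hugging Face PEFT configuration (an assumption made for illustration; `run_finetune.sh` handles this internally and its target modules may differ):
```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,              # LORA_R: rank of the low-rank update matrices
    lora_alpha=32,     # LORA_ALPHA: scaling factor for the LoRA update
    lora_dropout=0.1,  # LORA_DROPOUT: dropout probability for LoRA layers
)
```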
### Running the Finetuning Process
```bash
cd finetune/
bash scripts/run_finetune.sh
```
For help with configuring the script:
```bash
bash scripts/run_finetune.sh --help
```
### Monitoring Training
If you've enabled WandB logging (via `USE_WANDB=true`), you can monitor your training progress in real-time through the WandB dashboard.
### Using the Finetuned Model
After training completes, your model will be saved to the specified `OUTPUT_DIR`. You can use this model for inference or further finetuning.
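For example, if the script writes a Hugging Face-format checkpoint to `OUTPUT_DIR`, it can typically be reloaded like this (a sketch under that assumption; check what `run_finetune.sh` actually saves, and use the repository's inference instructions for music generation):
```python
# Reload a finetuned checkpoint from OUTPUT_DIR.
# If the script saves a PEFT/LoRA adapter instead of a full model, load the base
# model first and attach the adapter with peft.PeftModel.from_pretrained.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/path/to/save/finetuned/model",  # OUTPUT_DIR from run_finetune.sh
    device_map="auto",
)
```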