YuE Finetuning Guide
This guide walks you through the process of finetuning the YuE model using your own data.
Table of Contents
- Requirements
- Step 1: Data Preparation
- Step 2: Training Data Configuration
- Step 3: Model Finetuning
- Using the Finetuned Model
Requirements
- Python 3.10 is recommended
- PyTorch 2.4 is recommended
- CUDA 12.1+ is recommended
git clone https://github.com/multimodal-art-projection/YuE.git
cd YuE/finetune/
conda create -n yue-ft python=3.10
conda activate yue-ft
pip install -r requirements.txt
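Before preparing data, it can help to confirm that the environment matches the recommended versions above. The snippet below is a minimal sanity check, not part of the YuE repository:

```python
# Minimal environment check (sketch): expect PyTorch 2.4.x built against CUDA 12.1+.
import torch

print("torch:", torch.__version__)
print("cuda (build):", torch.version.cuda)
print("gpu available:", torch.cuda.is_available())  # should be True for finetuning
```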
Step 1: Data Preparation
Required Data Structure
Your data should be organized in the following structure:
example/
├── jsonl/   # Source JSONL files
├── mmap/    # Generated Megatron binary files
└── npy/     # Discrete audio codes (numpy arrays) from xcodec
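If you want to sanity-check the files in npy/ before building JSONL entries, a quick inspection like the one below can help. The file name is the dummy example used throughout this guide, and the exact array shape depends on how xcodec was run:

```python
# Inspect a discrete-code array produced by xcodec (sketch; shape depends on your setup).
import numpy as np

codes = np.load("example/npy/dummy.npy")
print(codes.shape, codes.dtype)  # typically integer codes, e.g. (num_codebooks, num_frames)
```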
JSONL File Format
Each JSONL file should contain entries in the following format:
{
  "id": "1",
  "codec": "example/npy/dummy.npy",                              // Raw audio codes
  "vocals_codec": "example/npy/dummy.Vocals.npy",                // Vocal track codes
  "instrumental_codec": "example/npy/dummy.Instrumental.npy",    // Instrumental track codes
  "audio_length_in_sec": 85.16,                                  // Audio duration in seconds
  "msa": [                                                       // Music Structure Analysis
    {
      "start": 0,
      "end": 13.93,
      "label": "intro"
    }
  ],
  "genres": "male, youth, powerful, charismatic, rock, punk",    // Tags for gender, age, genre, mood, timbre
  "splitted_lyrics": {
    "segmented_lyrics": [
      {
        "offset": 0,
        "duration": 13.93,
        "codec_frame_start": 0,
        "codec_frame_end": 696,
        "line_content": "[intro]\n\n"
      }
    ]
  }
}
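For reference, an entry in this format can be assembled and appended to a JSONL file with a few lines of Python. This is only a sketch: the field values are copied from the dummy example above, and the roughly 50 codec frames per second implied by that example (13.93 s ↔ 696 frames) is an assumption that depends on your xcodec configuration:

```python
# Sketch: write one JSONL entry in the format shown above (values from the dummy example).
import json

entry = {
    "id": "1",
    "codec": "example/npy/dummy.npy",
    "vocals_codec": "example/npy/dummy.Vocals.npy",
    "instrumental_codec": "example/npy/dummy.Instrumental.npy",
    "audio_length_in_sec": 85.16,
    "msa": [{"start": 0, "end": 13.93, "label": "intro"}],
    "genres": "male, youth, powerful, charismatic, rock, punk",
    "splitted_lyrics": {
        "segmented_lyrics": [
            {
                "offset": 0,
                "duration": 13.93,
                "codec_frame_start": 0,
                "codec_frame_end": 696,  # duration * frame rate (~50 fps assumed here)
                "line_content": "[intro]\n\n",
            }
        ]
    },
}

with open("example/jsonl/dummy.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```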
Converting to Megatron Binary Format
- Navigate to the finetune directory:
cd finetune/
- Run the preprocessing script:
# For Chain-of-Thought (CoT) dataset
bash scripts/preprocess_data.sh dummy cot $TOKENIZER_MODEL
# For In-Context Learning (ICL) dataset
bash scripts/preprocess_data.sh dummy icl_cot $TOKENIZER_MODEL
Note: For music structure analysis and track separation, refer to openl2s.
Step 2: Training Data Configuration
Counting Dataset Tokens
- Navigate to the finetune directory:
cd finetune/
- Run the token counting script:
bash scripts/count_tokens.sh ./example/mmap/
The results will be saved in finetune/count_token_logs/. This process may take several minutes for large datasets.
Configuring Data Mixture
Create a configuration file (e.g., finetune/example/dummy_data_mixture_cfg.yml) with the following parameters:
- TOKEN_COUNT_LOG_DIR: Directory containing token count logs
- GLOBAL_BATCH_SIZE: Total batch size for training
- SEQ_LEN: Maximum context window size
- {NUM}_ROUND: Number of times to repeat each dataset
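As a rough illustration of these parameters, the sketch below writes a configuration containing the keys listed above. The values and the DUMMY_ROUND key name are placeholders only; consult finetune/example/dummy_data_mixture_cfg.yml in the repository for the authoritative format:

```python
# Sketch: generate a data-mixture config using the parameter names listed above.
# Values are illustrative placeholders, not recommendations.
import yaml  # pyyaml

cfg = {
    "TOKEN_COUNT_LOG_DIR": "count_token_logs",  # directory produced by scripts/count_tokens.sh
    "GLOBAL_BATCH_SIZE": 32,                    # total batch size for training
    "SEQ_LEN": 8192,                            # maximum context window size
    "DUMMY_ROUND": 1,                           # {NUM}_ROUND: repeat count for the "dummy" dataset
}

with open("example/my_data_mixture_cfg.yml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```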
Generate training parameters:
cd finetune/
python core/parse_mixture.py -c example/dummy_data_mixture_cfg.yml
The script will output:
- DATA_PATH: Paths to your training data (copy to training script)
- TRAIN_ITERS: Number of training iterations
- Total token count
Step 3: Model Finetuning
YuE supports finetuning using LoRA (Low-Rank Adaptation), which significantly reduces memory requirements while maintaining performance.
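As context for the LoRA parameters configured below (LORA_R, LORA_ALPHA, LORA_DROPOUT), here is a minimal sketch of how the same values map onto a LoRA configuration in the Hugging Face PEFT library. This is illustrative only; whether run_finetune.sh uses PEFT and which modules it adapts is determined by the script itself:

```python
# Sketch: a LoRA configuration mirroring the default values in scripts/run_finetune.sh.
# Assumes the Hugging Face `peft` and `transformers` libraries.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("m-a-p/YuE-s1-7B-anneal-en-cot")

lora_cfg = LoraConfig(
    r=64,              # LORA_R: rank of the LoRA update matrices
    lora_alpha=32,     # LORA_ALPHA: scaling factor for the LoRA update
    lora_dropout=0.1,  # LORA_DROPOUT: dropout probability for LoRA layers
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```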
Configuring the Finetuning Script
- Edit the scripts/run_finetune.sh script to configure your finetuning run:
# Update data paths
# Accepted formats for DATA_PATH:
# 1) a single path: "/path/to/data"
# 2) multiple datasets with weights: "100 /path/to/data1 200 /path/to/data2 ..."
# You can copy DATA_PATH from the output of core/parse_mixture.py in Step 2
DATA_PATH="data1-weight /path/to/data1 data2-weight /path/to/data2"
DATA_CACHE_PATH="/path/to/your/cache"
# Set comma-separated list of proportions for train/val/test split
DATA_SPLIT="900,50,50"
# Set model paths
TOKENIZER_MODEL_PATH="/path/to/tokenizer"
MODEL_NAME="m-a-p/YuE-s1-7B-anneal-en-cot" # or your local model path
MODEL_CACHE_DIR="/path/to/model/cache"
OUTPUT_DIR="/path/to/save/finetuned/model"
# Configure LoRA parameters (optional)
LORA_R=64 # Rank of the LoRA update matrices
LORA_ALPHA=32 # Scaling factor for the LoRA update
LORA_DROPOUT=0.1 # Dropout probability for LoRA layers
- Adjust training hyperparameters as needed:
# Training hyperparameters
PER_DEVICE_TRAIN_BATCH_SIZE=1
NUM_TRAIN_EPOCHS=10
Running the Finetuning Process
cd finetune/
bash scripts/run_finetune.sh
For help with configuring the script:
bash scripts/run_finetune.sh --help
Monitoring Training
If you've enabled WandB logging (via USE_WANDB=true), you can monitor your training progress in real-time through the WandB dashboard.
Using the Finetuned Model
After training completes, your model will be saved to the specified OUTPUT_DIR. You can use this model for inference or further finetuning.
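How the saved model is loaded depends on how it was trained and saved. If OUTPUT_DIR contains a PEFT/LoRA adapter on top of the base checkpoint, loading it for inference could look roughly like the sketch below; the paths and the adapter layout are assumptions, not guarantees of the training script:

```python
# Sketch: load a finetuned LoRA adapter for inference (assumes a PEFT adapter in OUTPUT_DIR).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("m-a-p/YuE-s1-7B-anneal-en-cot")
model = PeftModel.from_pretrained(base, "/path/to/save/finetuned/model")  # OUTPUT_DIR
model = model.merge_and_unload()  # optional: fold the adapter into the base weights

tokenizer = AutoTokenizer.from_pretrained("/path/to/tokenizer")  # TOKENIZER_MODEL_PATH
```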