
YuE Finetuning Guide

This guide walks you through the process of finetuning the YuE model using your own data.

Table of Contents

  1. Data Preparation
  2. Training Data Configuration
  3. Model Finetuning

Requirements

  • Python 3.10 is recommended
  • PyTorch 2.4 is recommended
  • CUDA 12.1+ is recommended

Set up the environment:

git clone https://github.com/multimodal-art-projection/YuE.git
cd YuE/finetune/
conda create -n yue-ft python=3.10
conda activate yue-ft
pip install -r requirements.txt
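
A quick optional check that the installed versions match the recommendations above:

import torch

print(torch.__version__)          # expected: 2.4.x
print(torch.version.cuda)         # expected: 12.1 or newer
print(torch.cuda.is_available())  # expected: True on a GPU machine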

Step 1: Data Preparation

Required Data Structure

Your data should be organized in the following structure:

example/
├── jsonl/     # Source JSONL files
├── mmap/      # Generated Megatron binary files
└── npy/       # Discrete audio codes (numpy arrays) from xcodec
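
To sanity-check one of the xcodec .npy files, a quick look at its contents (the exact array layout depends on how the codes were exported and is not specified here):

import numpy as np

codes = np.load("example/npy/dummy.npy")  # substitute one of your own codec files
print(codes.dtype)   # discrete code indices, normally an integer dtype
print(codes.shape)   # layout depends on the xcodec export (e.g. codebooks x frames)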

JSONL File Format

Each JSONL file contains one JSON object per line in the following format (the // comments are explanatory only and must not appear in the actual file):

{
    "id": "1",
    "codec": "example/npy/dummy.npy",                    // Raw audio codes
    "vocals_codec": "example/npy/dummy.Vocals.npy",      // Vocal track codes
    "instrumental_codec": "example/npy/dummy.Instrumental.npy",  // Instrumental track codes
    "audio_length_in_sec": 85.16,                        // Audio duration in seconds
    "msa": [                                             // Music Structure Analysis
        {
            "start": 0,
            "end": 13.93,
            "label": "intro"
        }
    ],
    "genres": "male, youth, powerful, charismatic, rock, punk",  // Tags for gender, age, genre, mood, timbre
    "splitted_lyrics": {
        "segmented_lyrics": [
            {
                "offset": 0,
                "duration": 13.93,
                "codec_frame_start": 0,
                "codec_frame_end": 696,
                "line_content": "[intro]\n\n"
            }
        ]
    }
}
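
A minimal sketch of assembling and writing one such entry, assuming the xcodec arrays keep codec frames along their last axis and the roughly 50-frames-per-second rate implied by the example above (696 frames over 13.93 s); check both assumptions against your own data. All paths, tags, and timings are the dummy values from the example:

import json
import numpy as np

FRAMES_PER_SEC = 50  # assumption inferred from the example (696 / 13.93 ~= 50)

codes = np.load("example/npy/dummy.npy")          # discrete audio codes from xcodec
num_frames = codes.shape[-1]                      # assumes frames are the last axis
audio_length_in_sec = round(num_frames / FRAMES_PER_SEC, 2)

entry = {
    "id": "1",
    "codec": "example/npy/dummy.npy",
    "vocals_codec": "example/npy/dummy.Vocals.npy",
    "instrumental_codec": "example/npy/dummy.Instrumental.npy",
    "audio_length_in_sec": audio_length_in_sec,
    "msa": [{"start": 0, "end": 13.93, "label": "intro"}],
    "genres": "male, youth, powerful, charismatic, rock, punk",
    "splitted_lyrics": {
        "segmented_lyrics": [
            {
                "offset": 0,
                "duration": 13.93,
                "codec_frame_start": 0,
                "codec_frame_end": 696,
                "line_content": "[intro]\n\n",
            }
        ]
    },
}

# One JSON object per line, without the explanatory comments shown above.
with open("example/jsonl/dummy.jsonl", "w") as f:
    f.write(json.dumps(entry) + "\n")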

Converting to Megatron Binary Format

  1. Navigate to the finetune directory:
cd finetune/
  2. Run the preprocessing script:
# For Chain-of-Thought (CoT) dataset
# ($TOKENIZER_MODEL should point to your tokenizer model file)
bash scripts/preprocess_data.sh dummy cot $TOKENIZER_MODEL

# For In-Context Learning (ICL) dataset
bash scripts/preprocess_data.sh dummy icl_cot $TOKENIZER_MODEL

Note: For music structure analysis and track separation, refer to openl2s.

Step 2: Training Data Configuration

Counting Dataset Tokens

  1. Navigate to the finetune directory:
cd finetune/
  2. Run the token counting script:
bash scripts/count_tokens.sh ./example/mmap/

The results will be saved in finetune/count_token_logs/. This process may take several minutes for large datasets.

Configuring Data Mixture

  1. Create a configuration file (e.g., finetune/example/dummy_data_mixture_cfg.yml) with the following parameters (a sketch of such a file appears at the end of this step):

    • TOKEN_COUNT_LOG_DIR: Directory containing token count logs
    • GLOBAL_BATCH_SIZE: Total batch size for training
    • SEQ_LEN: Maximum context window size
    • {NUM}_ROUND: Number of times to repeat each dataset
  2. Generate training parameters:

cd finetune/
python core/parse_mixture.py -c example/dummy_data_mixture_cfg.yml

The script will output:

  • DATA_PATH: Paths to your training data (copy to training script)
  • TRAIN_ITERS: Number of training iterations
  • Total token count
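
For concreteness, a hypothetical sketch of such a configuration file, written out here with PyYAML; the dataset-round key stands in for {NUM}_ROUND, and every value below is an illustrative placeholder rather than a recommended setting:

import yaml  # pip install pyyaml

cfg = {
    "TOKEN_COUNT_LOG_DIR": "count_token_logs",  # logs written by scripts/count_tokens.sh
    "GLOBAL_BATCH_SIZE": 32,                    # total batch size for training
    "SEQ_LEN": 8192,                            # maximum context window size
    "dummy_ROUND": 1,                           # assumed key: repeat the "dummy" dataset once
}

with open("example/dummy_data_mixture_cfg.yml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)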

Step 3: Model Finetuning

YuE supports finetuning using LoRA (Low-Rank Adaptation), which significantly reduces memory requirements while maintaining performance.

Configuring the Finetuning Script

  1. Edit the scripts/run_finetune.sh script to configure your finetuning run:
# Update data paths
# Accepted formats for DATA_PATH:
#   1) a single path: "/path/to/data"
#   2) multiple datasets with weights: "100 /path/to/data1 200 /path/to/data2 ..."
# You can copy DATA_PATH from the output of core/parse_mixture.py in Step 2
DATA_PATH="data1-weight /path/to/data1 data2-weight /path/to/data2"
DATA_CACHE_PATH="/path/to/your/cache"

# Set comma-separated list of proportions for train/val/test split
DATA_SPLIT="900,50,50"

# Set model paths
TOKENIZER_MODEL_PATH="/path/to/tokenizer"
MODEL_NAME="m-a-p/YuE-s1-7B-anneal-en-cot"  # or your local model path
MODEL_CACHE_DIR="/path/to/model/cache"
OUTPUT_DIR="/path/to/save/finetuned/model"

# Configure LoRA parameters (optional)
LORA_R=64              # Rank of the LoRA update matrices
LORA_ALPHA=32          # Scaling factor for the LoRA update
LORA_DROPOUT=0.1       # Dropout probability for LoRA layers
  2. Adjust training hyperparameters as needed:
# Training hyperparameters
PER_DEVICE_TRAIN_BATCH_SIZE=1
NUM_TRAIN_EPOCHS=10
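
The LoRA parameters above describe a low-rank adapter. A minimal sketch of the equivalent setup with the Hugging Face peft library, where target_modules is an assumed choice for a LLaMA-style decoder and the actual run_finetune.sh may wire things up differently:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "m-a-p/YuE-s1-7B-anneal-en-cot",   # MODEL_NAME from the script above
    cache_dir="/path/to/model/cache",  # MODEL_CACHE_DIR
)

lora_cfg = LoraConfig(
    r=64,                   # LORA_R: rank of the update matrices
    lora_alpha=32,          # LORA_ALPHA: scaling factor for the update
    lora_dropout=0.1,       # LORA_DROPOUT
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA weights are trainable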

Running the Finetuning Process

cd finetune/
bash scripts/run_finetune.sh

For help with configuring the script:

bash scripts/run_finetune.sh --help

Monitoring Training

If you've enabled WandB logging (via USE_WANDB=true), you can monitor your training progress in real-time through the WandB dashboard.

Using the Finetuned Model

After training completes, your model will be saved to the specified OUTPUT_DIR. You can use this model for inference or further finetuning.
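
If the run saved a LoRA adapter to OUTPUT_DIR (as in the setup above), a minimal sketch of loading it for inference with peft; if the script instead wrote a fully merged model, load that directory directly with AutoModelForCausalLM:

from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained("/path/to/save/finetuned/model")  # OUTPUT_DIR
model = model.merge_and_unload()  # optionally fold the LoRA weights into the base model
model.eval()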