# YuE Finetuning Guide

This guide walks you through finetuning the YuE model on your own data.

## Table of Contents

1. [Data Preparation](#step-1-data-preparation)
2. [Training Data Configuration](#step-2-training-data-configuration)
3. [Model Finetuning](#step-3-model-finetuning)

## Requirements

- Python 3.10 is recommended
- PyTorch 2.4 is recommended
- CUDA 12.1+ is recommended

```bash
git clone https://github.com/multimodal-art-projection/YuE.git
cd YuE/finetune/
conda create -n yue-ft python=3.10
conda activate yue-ft
pip install -r requirements.txt
```
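To verify that your environment matches these recommendations:

```python
import torch

print(torch.__version__)          # expect 2.4.x
print(torch.version.cuda)         # expect 12.1 or newer
print(torch.cuda.is_available())  # should be True for GPU training
```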
## Step 1: Data Preparation

### Required Data Structure

Your data should be organized in the following structure:

```
example/
├── jsonl/   # Source JSONL files
├── mmap/    # Generated Megatron binary files
└── npy/     # Discrete audio codes (numpy arrays) from xcodec
```
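Before preprocessing, it can help to sanity-check the codec files. A minimal sketch, assuming the `.npy` files hold integer code arrays produced by xcodec (the exact shape depends on your codec settings):

```python
import numpy as np

# Inspect one of the xcodec outputs.
codes = np.load("example/npy/dummy.npy")
print(codes.dtype)  # discrete codes, so an integer dtype is expected
print(codes.shape)  # layout depends on xcodec settings (e.g., codebooks x frames)
```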
### JSONL File Format

Each JSONL file should contain one entry per line in the following format (pretty-printed here for readability; the `//` comments are annotations, not part of the file):

```json
{
  "id": "1",
  "codec": "example/npy/dummy.npy",                           // Raw audio codes
  "vocals_codec": "example/npy/dummy.Vocals.npy",             // Vocal track codes
  "instrumental_codec": "example/npy/dummy.Instrumental.npy", // Instrumental track codes
  "audio_length_in_sec": 85.16,                               // Audio duration in seconds
  "msa": [                                                    // Music Structure Analysis
    {
      "start": 0,
      "end": 13.93,
      "label": "intro"
    }
  ],
  "genres": "male, youth, powerful, charismatic, rock, punk", // Tags for gender, age, genre, mood, timbre
  "splitted_lyrics": {
    "segmented_lyrics": [
      {
        "offset": 0,
        "duration": 13.93,
        "codec_frame_start": 0,
        "codec_frame_end": 696,
        "line_content": "[intro]\n\n"
      }
    ]
  }
}
```
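If you generate these files programmatically, a minimal sketch of writing one entry (field values taken from the example above):

```python
import json

entry = {
    "id": "1",
    "codec": "example/npy/dummy.npy",
    "vocals_codec": "example/npy/dummy.Vocals.npy",
    "instrumental_codec": "example/npy/dummy.Instrumental.npy",
    "audio_length_in_sec": 85.16,
    "msa": [{"start": 0, "end": 13.93, "label": "intro"}],
    "genres": "male, youth, powerful, charismatic, rock, punk",
    "splitted_lyrics": {
        "segmented_lyrics": [
            {
                "offset": 0,
                "duration": 13.93,
                "codec_frame_start": 0,
                "codec_frame_end": 696,
                "line_content": "[intro]\n\n",
            }
        ]
    },
}

# One compact JSON object per line, as the JSONL format requires.
with open("example/jsonl/dummy.jsonl", "w") as f:
    f.write(json.dumps(entry) + "\n")
```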
### Converting to Megatron Binary Format

1. Navigate to the finetune directory:

   ```bash
   cd finetune/
   ```

2. Run the preprocessing script:

   ```bash
   # TOKENIZER_MODEL: path to your tokenizer model
   # (see TOKENIZER_MODEL_PATH in Step 3)

   # For Chain-of-Thought (CoT) dataset
   bash scripts/preprocess_data.sh dummy cot $TOKENIZER_MODEL

   # For In-Context Learning (ICL) dataset
   bash scripts/preprocess_data.sh dummy icl_cot $TOKENIZER_MODEL
   ```

> **Note**: For music structure analysis and track separation, refer to [openl2s](https://github.com/a43992899/openl2s).
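After preprocessing, you can confirm that the binaries landed in `example/mmap/`. A minimal sketch; Megatron-style datasets are typically stored as paired `.bin`/`.idx` files, though the exact filenames depend on the preprocessing script:

```python
from pathlib import Path

# List whatever the preprocessing step produced.
for path in sorted(Path("example/mmap").iterdir()):
    print(path.name, path.stat().st_size, "bytes")
```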
## Step 2: Training Data Configuration

### Counting Dataset Tokens

1. Navigate to the finetune directory:

   ```bash
   cd finetune/
   ```

2. Run the token counting script:

   ```bash
   bash scripts/count_tokens.sh ./example/mmap/
   ```

The results will be saved in `finetune/count_token_logs/`. This process may take several minutes for large datasets.
### Configuring Data Mixture

1. Create a configuration file (e.g., `finetune/example/dummy_data_mixture_cfg.yml`) with the following parameters (see the sketch after this list):

   - `TOKEN_COUNT_LOG_DIR`: Directory containing token count logs
   - `GLOBAL_BATCH_SIZE`: Total batch size for training
   - `SEQ_LEN`: Maximum context window size
   - `{NUM}_ROUND`: Number of times to repeat each dataset

2. Generate training parameters:

   ```bash
   cd finetune/
   python core/parse_mixture.py -c example/dummy_data_mixture_cfg.yml
   ```

The script will output:

- `DATA_PATH`: Paths to your training data (copy this into the training script)
- `TRAIN_ITERS`: Number of training iterations
- Total token count
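A minimal sketch of such a config, assuming the keys map one-to-one onto the parameters above; the real schema is defined by `core/parse_mixture.py`, so treat the names and values as illustrative:

```yaml
# Illustrative only -- check core/parse_mixture.py for the actual schema.
TOKEN_COUNT_LOG_DIR: ./count_token_logs/
GLOBAL_BATCH_SIZE: 8
SEQ_LEN: 8192
DUMMY_ROUND: 2   # {NUM}_ROUND: repeat the "dummy" dataset twice
```

As a rough sanity check on the output, each iteration consumes about `GLOBAL_BATCH_SIZE * SEQ_LEN` tokens, so `TRAIN_ITERS` should land near the total token count divided by that product.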
## Step 3: Model Finetuning

YuE supports finetuning with LoRA (Low-Rank Adaptation), which significantly reduces memory requirements while maintaining performance.
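For intuition, LoRA freezes the base weights and trains small low-rank update matrices on top of them. A minimal sketch of the idea using Hugging Face PEFT; this is not necessarily how `run_finetune.sh` wires things up internally, and the `target_modules` are an assumption:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# The base model stays frozen; only the low-rank adapters train.
base = AutoModelForCausalLM.from_pretrained("m-a-p/YuE-s1-7B-anneal-en-cot")

lora_cfg = LoraConfig(
    r=64,              # rank of the update matrices (LORA_R)
    lora_alpha=32,     # scaling factor (LORA_ALPHA)
    lora_dropout=0.1,  # dropout on LoRA layers (LORA_DROPOUT)
    target_modules=["q_proj", "v_proj"],  # assumed; depends on the architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # shows how few parameters LoRA trains
```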
### Configuring the Finetuning Script

1. Edit the `scripts/run_finetune.sh` script to configure your finetuning run:

   ```bash
   # Update data paths
   # Accepted formats for DATA_PATH:
   #   1) a single path: "/path/to/data"
   #   2) multiple datasets with weights: "100 /path/to/data1 200 /path/to/data2 ..."
   # You can copy DATA_PATH from the output of core/parse_mixture.py in Step 2
   DATA_PATH="data1-weight /path/to/data1 data2-weight /path/to/data2"
   DATA_CACHE_PATH="/path/to/your/cache"

   # Set comma-separated proportions for the train/val/test split
   DATA_SPLIT="900,50,50"

   # Set model paths
   TOKENIZER_MODEL_PATH="/path/to/tokenizer"
   MODEL_NAME="m-a-p/YuE-s1-7B-anneal-en-cot"  # or your local model path
   MODEL_CACHE_DIR="/path/to/model/cache"
   OUTPUT_DIR="/path/to/save/finetuned/model"

   # Configure LoRA parameters (optional)
   LORA_R=64          # Rank of the LoRA update matrices
   LORA_ALPHA=32      # Scaling factor for the LoRA update
   LORA_DROPOUT=0.1   # Dropout probability for LoRA layers
   ```
2. Adjust training hyperparameters as needed:

   ```bash
   # Training hyperparameters
   PER_DEVICE_TRAIN_BATCH_SIZE=1
   NUM_TRAIN_EPOCHS=10
   ```
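Note that `PER_DEVICE_TRAIN_BATCH_SIZE` is per GPU. A back-of-the-envelope sketch of the effective batch size, assuming the usual multiplication (whether `run_finetune.sh` also applies gradient accumulation is script-dependent, and the GPU count here is hypothetical):

```python
per_device_batch = 1   # PER_DEVICE_TRAIN_BATCH_SIZE
num_gpus = 8           # hypothetical world size
grad_accum_steps = 4   # hypothetical; only if the script supports accumulation

effective_batch = per_device_batch * num_gpus * grad_accum_steps
print(effective_batch)  # 32 samples per optimizer step
```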
### Running the Finetuning Process

```bash
cd finetune/
bash scripts/run_finetune.sh
```

For help with configuring the script:

```bash
bash scripts/run_finetune.sh --help
```
### Monitoring Training

If you've enabled WandB logging (via `USE_WANDB=true`), you can monitor training progress in real time through the WandB dashboard. Make sure you have authenticated first (e.g., with `wandb login`).
### Using the Finetuned Model

After training completes, your model will be saved to the specified `OUTPUT_DIR`. You can use it for inference or further finetuning, as sketched below.
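If the run saves a LoRA adapter rather than a merged checkpoint, loading it for inference might look like this; a minimal sketch assuming a PEFT-style adapter in `OUTPUT_DIR`, not a confirmed description of what `run_finetune.sh` writes out:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model, then attach the finetuned LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("m-a-p/YuE-s1-7B-anneal-en-cot")
model = PeftModel.from_pretrained(base, "/path/to/save/finetuned/model")  # OUTPUT_DIR

# Optionally merge the adapter into the base weights for standalone inference.
model = model.merge_and_unload()
```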