Training Framework

The training code uses ms-swift, a scalable lightweight infrastructure for fine-tuning large language models.

Model Configuration

MODEL_PATH Parameter

The MODEL_PATH in train.sh should point to the base model. Download the model from HuggingFace:

# Download the model using huggingface_hub
huggingface-cli download bolshyC/qwen3-0.6B-music --local-dir ./qwen3-0.6B-music

Then modify MODEL_PATH in train.sh to point to the local path:

MODEL_PATH="./qwen3-0.6B-music"  # or absolute path
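
Before launching training, it can help to confirm that MODEL_PATH points at a complete download. The sketch below is not part of this repository: the helper name missing_model_files is hypothetical, and it assumes the model directory contains a config.json, as transformers-style checkpoints on HuggingFace typically do.

```python
from pathlib import Path

# Assumption: a transformers-style checkpoint directory contains config.json.
EXPECTED_FILES = ("config.json",)


def missing_model_files(model_path, expected=EXPECTED_FILES):
    """Return the expected files that are absent from model_path."""
    root = Path(model_path)
    if not root.is_dir():
        # Nothing was downloaded to this location, so everything is missing.
        return list(expected)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./qwen3-0.6B-music")
    if missing:
        print("MODEL_PATH looks incomplete, missing:", ", ".join(missing))
    else:
        print("MODEL_PATH looks usable")
```

An empty result only means the expected files are present; it does not verify the weights themselves.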

Dataset Configuration

--dataset Parameter

Note: The current script train.sh uses train_demo.jsonl (for demonstration purposes). For actual training, you need to use the full dataset.

Actual Training Data

For actual training, use the following two files from the bolshyC/Muse_train dataset on HuggingFace:

  • train_cn.jsonl - Chinese training data
  • train_en.jsonl - English training data

Usage

  1. Download the dataset from HuggingFace:
# Using huggingface_hub to download
huggingface-cli download bolshyC/Muse_train train_cn.jsonl --local-dir ./data
huggingface-cli download bolshyC/Muse_train train_en.jsonl --local-dir ./data
  2. Modify the --dataset parameter in train.sh:
# If using Chinese data only
--dataset 'data/train_cn.jsonl'

# If using both Chinese and English data (comma-separated, no spaces)
--dataset 'data/train_cn.jsonl,data/train_en.jsonl'

Note: In ms-swift, multiple dataset files should be comma-separated without spaces.
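
If the file list is assembled programmatically, the comma-separated form described above can be generated rather than typed by hand. This is a minimal sketch; build_dataset_arg is a hypothetical helper, not something provided by ms-swift or this repository.

```python
def build_dataset_arg(paths):
    """Join dataset file paths into a single --dataset value (no spaces)."""
    cleaned = [str(p).strip() for p in paths]
    for p in cleaned:
        if not p:
            raise ValueError("empty dataset path")
        # A comma or space inside a path would break the comma-separated form.
        if "," in p or any(ch.isspace() for ch in p):
            raise ValueError(f"path cannot contain commas or whitespace: {p!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    value = build_dataset_arg(["data/train_cn.jsonl", "data/train_en.jsonl"])
    print(f"--dataset '{value}'")
```

Rejecting paths that contain commas or whitespace is a design choice of this sketch: it fails early instead of producing an argument that would be split in the wrong place.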

Building Custom Training Data

If you want to build your own training dataset, you need to encode audio files into discrete tokens using MuCodec.

Audio Encoding

Use train/encode_audio.py to encode audio files into discrete tokens:

  1. Prepare the input data file: create a JSONL file in which each line is a JSON object with a path field pointing to an audio file:

    {"path": "path/to/audio1.wav"}
    {"path": "path/to/audio2.mp3"}
    
  2. Modify paths in encode_audio.py:

    • Set DATA_PATH to your input JSONL file path
    • Set SAVE_DIR to the directory where encoded tokens will be saved
  3. Run encoding:

    python train/encode_audio.py
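
The input file from step 1 can be generated from a directory of audio files instead of being written by hand. The following is a rough sketch under stated assumptions: write_manifest and the directory and file names are hypothetical, and only the {"path": ...} line format is taken from this README.

```python
import json
from pathlib import Path

# This README names WAV and MP3 as the supported input formats.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def write_manifest(audio_dir, manifest_path):
    """Write one {"path": ...} JSON object per audio file; return the count."""
    files = sorted(
        p for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    with open(manifest_path, "w", encoding="utf-8") as out:
        for p in files:
            out.write(json.dumps({"path": str(p)}, ensure_ascii=False) + "\n")
    return len(files)


if __name__ == "__main__":
    # Hypothetical locations; adjust them to your own layout.
    count = write_manifest("audio", "audio_manifest.jsonl")
    print(f"wrote {count} entries")
```

DATA_PATH in encode_audio.py would then be set to the generated manifest file.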
    

The script will:

  • Load audio files from the paths specified in the JSONL file
  • Encode each audio file into discrete tokens using MuCodec
  • Save the encoded tokens as .pt files in the SAVE_DIR directory
  • Skip files that have already been encoded
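
The skip behaviour in the last bullet can be pictured as a filter over the input JSONL. This sketch is illustrative only and is not the code in encode_audio.py: the function name pending_entries is hypothetical, and it assumes each output is named after the input file's stem with a .pt extension, which may differ from the script's actual naming scheme.

```python
import json
from pathlib import Path


def pending_entries(manifest_path, save_dir):
    """Return manifest entries whose encoded .pt file does not exist yet."""
    save_root = Path(save_dir)
    pending = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the manifest
            entry = json.loads(line)
            # Assumption: the output is named <input stem>.pt inside save_dir.
            target = save_root / (Path(entry["path"]).stem + ".pt")
            if not target.exists():
                pending.append(entry)
    return pending
```

A check like this makes it safe to rerun an interrupted encoding job, since only the files without an existing output are processed again.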

Note: Audio files should be in WAV or MP3 format; they are automatically resampled to 48 kHz if needed.

Training Performance

Training Time

On 8× H200 GPUs, training one epoch takes approximately 150 minutes.
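
For budgeting, the figure above converts to wall-clock hours and GPU-hours as follows. This is a back-of-the-envelope sketch: training_cost is a hypothetical helper, and it assumes the time scales linearly with the number of epochs on the same hardware and data.

```python
def training_cost(epochs, minutes_per_epoch=150, num_gpus=8):
    """Return (wall-clock hours, GPU-hours) for the given number of epochs."""
    wall_hours = epochs * minutes_per_epoch / 60
    return wall_hours, wall_hours * num_gpus


if __name__ == "__main__":
    for epochs in (1, 3):
        wall, gpu = training_cost(epochs)
        print(f"{epochs} epoch(s): {wall:.1f} h wall-clock, {gpu:.0f} GPU-hours")
```

With the defaults, one epoch comes to 2.5 hours of wall-clock time, or 20 GPU-hours across the 8 GPUs.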