Training Framework
The training code uses ms-swift, a scalable lightweight infrastructure for fine-tuning large language models.
Model Configuration
MODEL_PATH Parameter
The MODEL_PATH in train.sh should point to the base model. Download the model from HuggingFace:
# Download the model using huggingface_hub
huggingface-cli download bolshyC/qwen3-0.6B-music --local-dir ./qwen3-0.6B-music
Then modify MODEL_PATH in train.sh to point to the local path:
MODEL_PATH="./qwen3-0.6B-music" # or absolute path
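Before launching training, it can help to confirm that the download actually landed where MODEL_PATH points. The following is a minimal sketch and not part of the repository: the helper name `missing_model_files` is hypothetical, and it assumes the model directory contains a `config.json`, as Hugging Face format checkpoints normally do. Adjust the expected file list to match the files actually shipped with the model.

```python
from pathlib import Path

# Assumption: a Hugging Face format checkpoint ships a config.json.
# Extend this tuple if your setup needs other files (e.g. tokenizer files).
EXPECTED_FILES = ("config.json",)


def missing_model_files(model_path, expected=EXPECTED_FILES):
    """Return the expected files that are absent from model_path."""
    root = Path(model_path)
    if not root.is_dir():
        # Nothing exists at this location, so every expected file is missing.
        return list(expected)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./qwen3-0.6B-music")
    if missing:
        print("MODEL_PATH looks incomplete, missing:", ", ".join(missing))
    else:
        print("MODEL_PATH looks usable")
```

An empty result only means the expected files are present; it does not verify that the weights themselves are intact.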
Dataset Configuration
--dataset Parameter
Note: The current script train.sh uses train_demo.jsonl (for demonstration purposes). For actual training, you need to use the full dataset.
Actual Training Data
For actual training, please use the following two files from the HuggingFace dataset:
- train_cn.jsonl: Chinese training data
- train_en.jsonl: English training data
Usage
- Download the dataset from HuggingFace:
# Using huggingface_hub to download
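# --repo-type dataset is needed because Muse_train is a dataset repository
huggingface-cli download bolshyC/Muse_train train_cn.jsonl --repo-type dataset --local-dir ./data
huggingface-cli download bolshyC/Muse_train train_en.jsonl --repo-type dataset --local-dir ./data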
- Modify the --dataset parameter in train.sh:
# If using Chinese data only
--dataset 'data/train_cn.jsonl'
# If using both Chinese and English data (comma-separated, no spaces)
--dataset 'data/train_cn.jsonl,data/train_en.jsonl'
Note: In ms-swift, multiple dataset files should be comma-separated without spaces.
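If the file list is assembled programmatically, a stray space is easy to introduce. The sketch below shows one way to build the value in the comma-separated, no-spaces form described above. The function name `build_dataset_arg` is hypothetical and is not part of ms-swift or this repository.

```python
def build_dataset_arg(paths):
    """Join dataset file paths into one comma-separated value with no spaces around commas."""
    # Strip surrounding whitespace and drop empty entries.
    cleaned = [str(p).strip() for p in paths if str(p).strip()]
    if not cleaned:
        raise ValueError("at least one dataset path is required")
    for p in cleaned:
        if "," in p:
            # A comma inside a path would be read as a separator.
            raise ValueError(f"path contains a comma and cannot be joined safely: {p!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    value = build_dataset_arg(["data/train_cn.jsonl", "data/train_en.jsonl"])
    print(f"--dataset '{value}'")
```

The printed line can be pasted into train.sh in place of the existing --dataset argument.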
Building Custom Training Data
If you want to build your own training dataset, you need to encode audio files into discrete tokens using MuCodec.
Audio Encoding
Use train/encode_audio.py to encode audio files into discrete tokens:
Prepare input data file: Create a JSONL file where each line contains a dictionary with an audio file path:
{"path": "path/to/audio1.wav"}
{"path": "path/to/audio2.mp3"}
Modify paths in encode_audio.py:
- Set DATA_PATH to your input JSONL file path
- Set SAVE_DIR to the directory where encoded tokens will be saved
Run encoding:
python train/encode_audio.py
The script will:
- Load audio files from the paths specified in the JSONL file
- Encode each audio file into discrete tokens using MuCodec
- Save the encoded tokens as .pt files in the SAVE_DIR directory
- Skip files that have already been encoded
Note: The audio files should be in WAV or MP3 format and will be automatically resampled to 48kHz if needed.
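For a large collection, the input JSONL file described above can be generated rather than written by hand. The following is a rough sketch under stated assumptions: the helper `write_manifest` and the example directory and file names are hypothetical, and only the `{"path": ...}` line format and the WAV/MP3 restriction come from this document.

```python
import json
from pathlib import Path

# The note above names WAV and MP3 as the supported input formats.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def write_manifest(audio_dir, manifest_path):
    """Write one {"path": ...} JSON object per audio file and return the count."""
    # Collect files first (recursively), sorted so the output order is stable.
    files = sorted(
        p for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    with open(manifest_path, "w", encoding="utf-8") as out:
        for p in files:
            out.write(json.dumps({"path": str(p)}, ensure_ascii=False) + "\n")
    return len(files)


# Example call (hypothetical locations):
#   write_manifest("my_audio", "audio_manifest.jsonl")
```

After generating the file, set DATA_PATH in encode_audio.py to the manifest location and run the script as described.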
Training Performance
Training Time
On 8× H200 GPUs, training one epoch takes approximately 150 minutes.
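To budget a multi-epoch run, the figure above converts to GPU-hours with simple arithmetic. This sketch assumes the per-epoch time stays roughly constant across epochs on the same 8-GPU setup; the function name is illustrative only.

```python
def gpu_hours(num_gpus, minutes_per_epoch, epochs=1):
    """Total GPU-hours = GPUs x wall-clock hours per epoch x epochs."""
    return num_gpus * (minutes_per_epoch / 60) * epochs


if __name__ == "__main__":
    # 8 GPUs x 2.5 hours = 20 GPU-hours for one epoch
    print(gpu_hours(8, 150))
```

Actual cost on other hardware or dataset sizes will differ, so treat the result as a rough planning number.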