## Training Framework

The training code uses [ms-swift](https://github.com/modelscope/ms-swift), a scalable lightweight infrastructure for fine-tuning large language models.

## Model Configuration

### `MODEL_PATH` Parameter

The `MODEL_PATH` variable in `train.sh` must point to the base model. Download it from [HuggingFace](https://huggingface.co/datasets/bolshyC/qwen3-0.6B-music):

```bash
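# Download the model with the huggingface_hub CLI (pip install -U huggingface_hub).
# The link above points to a dataset repository; huggingface-cli defaults to
# model repositories, hence --repo-type dataset.
huggingface-cli download bolshyC/qwen3-0.6B-music --repo-type dataset --local-dir ./qwen3-0.6B-music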
```

Then set `MODEL_PATH` in `train.sh` to the local path:

```bash
MODEL_PATH="./qwen3-0.6B-music"  # relative to the directory you launch train.sh from; an absolute path also works
```
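
Before launching training, it can help to confirm that `MODEL_PATH` actually points at the downloaded model. The sketch below uses a hypothetical helper, `check_model_dir`, which is not part of `train.sh`; it assumes the downloaded repository follows the usual HuggingFace layout with a `config.json` at its top level.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of train.sh): sanity-check MODEL_PATH.
# Assumption: the model directory has a config.json at its top level.
check_model_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "MODEL_PATH does not exist: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "No config.json in $dir; is MODEL_PATH the model root?" >&2
    return 1
  fi
  echo "OK: $dir"
}

# Usage:
# check_model_dir ./qwen3-0.6B-music && bash train.sh
```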

## Dataset Configuration

### `--dataset` Parameter

**Note:** As shipped, `train.sh` passes `train_demo.jsonl` to `--dataset`. This file is for demonstration only; actual training requires the full dataset described below.

### Actual Training Data

Use the following two files from the [HuggingFace dataset](https://huggingface.co/datasets/bolshyC/Muse_train):

- **`train_cn.jsonl`** - Chinese training data
- **`train_en.jsonl`** - English training data

### Usage

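1. Download the dataset from HuggingFace:
   ```bash
   # Using the huggingface_hub CLI. Muse_train is a dataset repository, so
   # --repo-type dataset is required (the CLI defaults to model repositories).
   huggingface-cli download bolshyC/Muse_train train_cn.jsonl --repo-type dataset --local-dir ./data
   huggingface-cli download bolshyC/Muse_train train_en.jsonl --repo-type dataset --local-dir ./data
   ```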

2. Modify the `--dataset` parameter in `train.sh`:
   ```bash
   # If using Chinese data only
   --dataset 'data/train_cn.jsonl'

   # If using both Chinese and English data (comma-separated, no spaces)
   --dataset 'data/train_cn.jsonl,data/train_en.jsonl'
   ```

**Note:** In ms-swift, multiple dataset files should be comma-separated without spaces.
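
When listing several files, the comma-joined value can be assembled in the shell rather than typed by hand. The sketch below uses a hypothetical helper, `join_datasets` (not part of ms-swift or `train.sh`), that follows the comma-separated, no-spaces convention from the note above and fails early if any file is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ms-swift or train.sh): join dataset file
# paths with commas, after checking that every file exists.
join_datasets() {
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "Missing dataset file: $f" >&2
      return 1
    fi
  done
  # "$*" joins the arguments with the first character of IFS,
  # so a local IFS=',' yields "a,b,c" with no spaces.
  local IFS=','
  echo "$*"
}

# Usage:
# DATASETS=$(join_datasets data/train_cn.jsonl data/train_en.jsonl) || exit 1
# ...then pass --dataset "$DATASETS" in train.sh
```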

## Building Custom Training Data

If you want to build your own training dataset, you need to encode audio files into discrete tokens using MuCodec.

### Audio Encoding

Use `train/encode_audio.py` to encode audio files into discrete tokens:

1. **Prepare input data file**: Create a JSONL file in which each line is a JSON object with a `path` field giving the audio file path:
   ```json
   {"path": "path/to/audio1.wav"}
   {"path": "path/to/audio2.mp3"}
   ```

2. **Modify paths in `encode_audio.py`**:
   - Set `DATA_PATH` to your input JSONL file path
   - Set `SAVE_DIR` to the directory where encoded tokens will be saved

3. **Run encoding**:
   ```bash
   python train/encode_audio.py
   ```
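
The input file from step 1 can be generated from a directory of audio files. The sketch below uses a hypothetical helper, `make_audio_list`; it writes each path without JSON escaping, so it assumes the paths contain no double quotes, backslashes, or newlines. If `encode_audio.py` is run from a different working directory, passing an absolute audio directory is likely the safer choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write one {"path": ...} line per WAV/MP3 file found
# (recursively, case-insensitive extension match) under a directory.
# Assumption: paths need no JSON escaping (no quotes, backslashes, newlines).
make_audio_list() {
  local audio_dir="$1" out_file="$2"
  find "$audio_dir" -type f \( -iname '*.wav' -o -iname '*.mp3' \) | sort |
    while IFS= read -r f; do
      printf '{"path": "%s"}\n' "$f"
    done > "$out_file"
}

# Usage:
# make_audio_list "$PWD/my_audio" ./audio_list.jsonl
# then set DATA_PATH in train/encode_audio.py to ./audio_list.jsonl
```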

When run, `encode_audio.py` will:
- Load audio files from the paths specified in the JSONL file
- Encode each audio file into discrete tokens using MuCodec
- Save the encoded tokens as `.pt` files in the `SAVE_DIR` directory
- Skip files that have already been encoded

**Note:** Input audio should be in WAV or MP3 format; it is automatically resampled to 48 kHz if its sample rate differs.
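
Because already-encoded files are skipped, an interrupted run can simply be restarted. To gauge progress, a sketch like the following compares the number of input lines with the number of `.pt` files written so far. `encoding_progress` is a hypothetical helper, and it assumes one `.pt` file per input line and that `SAVE_DIR` contains no unrelated `.pt` files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report "<encoded>/<total> encoded".
# Assumptions: one .pt file per input JSONL line, and SAVE_DIR holds
# only this encoder's output.
encoding_progress() {
  local data_path="$1" save_dir="$2"
  local total encoded
  # Count non-empty lines; grep exits 1 (but still prints 0) when none match.
  total=$(grep -c . "$data_path" || true)
  encoded=$(find "$save_dir" -type f -name '*.pt' | wc -l | tr -d ' ')
  echo "$encoded/$total encoded"
}

# Usage (use the same values as DATA_PATH and SAVE_DIR in encode_audio.py):
# encoding_progress ./audio_list.jsonl ./encoded_tokens
```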

## Training Performance

### Training Time

On 8× H200 GPUs, training one epoch takes approximately **150 minutes**.
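
For planning a multi-epoch run, the per-epoch figure scales linearly with the number of epochs: for example, 3 epochs take roughly 3 × 150 = 450 minutes, or 7 h 30 min. This assumes the same 8× H200 setup and `train.sh` settings; `estimate_training_time` is a hypothetical helper that only does this arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: scale minutes-per-epoch (default 150, the 8x H200
# figure above) by the epoch count and print the total as hours and minutes.
estimate_training_time() {
  local epochs="$1" minutes_per_epoch="${2:-150}"
  local total=$(( epochs * minutes_per_epoch ))
  printf '%dh %02dm\n' $(( total / 60 )) $(( total % 60 ))
}

# Usage:
# estimate_training_time 3   # prints 7h 30m
```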