# BitCPM Training Example

This project provides scripts for continue pretraining (CPT) and supervised fine-tuning (SFT) of **BitCPM-CANN-1B-unquantized**.

## File Description

CPT and SFT each have a pair of scripts (training script + launch script) and share DeepSpeed configuration files:

| File | Description |
| --- | --- |
| `run.sh` | Launch script for CPT with hyperparameter configuration |
| `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
| `train.py` | Continue pretrain script based on HuggingFace Trainer + DeepSpeed |
| `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
| `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
| `ds_config_z2.json` | DeepSpeed ZeRO-2 configuration (used by default) |
| `requirements.txt` | Python dependency list |

## Environment Setup

### Docker Image

Use the following Huawei NPU image on 910C:

```
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspeed-llm-2.3.0-a3-arm
```

Other Huawei NPU images may also work but have not been fully tested.

For GPU environments, there are no special image requirements — just install `requirements.txt` directly.

### Install Dependencies

After entering the container, install the Python dependencies:

```bash
pip install -r requirements.txt
```

## Continue Pretrain (CPT)

### Dataset

The test dataset used is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), stored in parquet format after downloading.

### Usage

Modify the path configuration in `run.sh`:

```bash
MODEL_PATH="/path/to/BitCPM-CANN-1B-unquantized/"
DATA_PATH="/path/to/c4-pro/data/your_file.parquet"
```

Then start training:

```bash
bash run.sh
```

## Supervised Fine-Tuning (SFT)

### Dataset

The test dataset used is [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), stored in parquet format after downloading.

### Usage

Modify the path configuration in `run_sft.sh`:

```bash
MODEL_PATH="/path/to/BitCPM-CANN-1B-unquantized/"
DATA_PATH="/path/to/ultrachat_200k/data/your_file.parquet"
```

Then start training:

```bash
bash run_sft.sh
```

## Training Results Reference

> **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when training on open-source datasets.

Below are the loss curves from smoke tests on GPU and NPU for both CPT and SFT tasks. The results are highly consistent across GPU and NPU, indicating that users can continue pre-training or fine-tuning on various compute devices:

|  | GPU | NPU |
| --- | --- | --- |
| **CPT** | ![GPU Pretrain Loss](gpu_pretrain_loss.png) | ![NPU Pretrain Loss](npu_pretrain_loss.png) |
| **SFT** | ![GPU SFT Loss](gpu_sft_loss.png) | ![NPU SFT Loss](npu_sft_loss.png) |

Training log CSV files (corresponding to the loss curves above):

| CSV File | Corresponding Loss Curve |
| --- | --- |
| [gpu_pretrain.csv](gpu_pretrain.csv) | GPU CPT |
| [npu_pretrain.csv](npu_pretrain.csv) | NPU CPT |
| [gpu_sft.csv](gpu_sft.csv) | GPU SFT |
| [npu_sft.csv](npu_sft.csv) | NPU SFT |

---

These scripts provide a convenient, ready-to-use toolkit for QAT-aware continued pre-training and fine-tuning of BitCPM-CANN models, so you can quickly adapt the model to your own data and tasks while preserving ternary quantization constraints.