---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
---
<div align="center">
<img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img> 
</div>

<p align="center">
<a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
<a href="https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf" target="_blank">Technical Report</a> 
</p>
<p align="center">
👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
</p>

## Overview

BitCPM-CANN-1B-unquantized is the **unquantized QAT (Quantization-Aware Training) checkpoint** of BitCPM-CANN-1B, designed for **continued pre-training and fine-tuning**. It preserves the full-precision latent weights together with the ternary fake quantizers defined in `modeling.py` (weights → {-1, 0, 1} with group-wise scaling, trained via a straight-through estimator, STE), so the model can keep learning under quantization constraints. For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).
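
As a rough illustration of what the forward pass of such a ternary fake quantizer computes, here is a minimal sketch on plain Python lists. The function name, the `group_size` argument, and the absmean scale rule are assumptions made for illustration only; the quantizer that actually applies is the one in `modeling.py`, and it may differ. During training, an STE treats the rounding step as the identity in the backward pass, so gradients still reach the full-precision latent weights.

```python
def ternary_fake_quant(weights, group_size=None, eps=1e-8):
    """Illustrative forward pass: snap each group of weights to {-s, 0, +s}.

    NOTE: the absmean scale used here is an assumption, not necessarily
    the rule implemented in modeling.py.
    """
    # group_size=None means "treat the whole list as a single group".
    size = len(weights) if group_size is None else group_size
    out = []
    for start in range(0, len(weights), size):
        group = weights[start:start + size]
        # Assumed per-group scale: mean absolute value of the group.
        scale = sum(abs(w) for w in group) / len(group) + eps
        for w in group:
            # Round to the nearest integer, then clamp to {-1, 0, 1}.
            q = max(-1, min(1, round(w / scale)))
            # Dequantize again, so the result is still an ordinary float.
            out.append(q * scale)
    return out


latent = [0.9, -0.05, -1.2, 0.4]
print(ternary_fake_quant(latent))
```

The output keeps the shape of the input but takes at most three distinct values per group, which is why the checkpoint can store latent weights in full precision while the forward pass already sees ternary values.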

> ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B).

## Continued Pre-training & Fine-tuning

The **only requirement** is that the forward pass must go through the bundled `modeling.py` (which contains the ternary fake quantizer). Load with `trust_remote_code=True` and do NOT replace or bypass the model's forward logic.

### Option 1: DeepSpeed (Recommended)

We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized/tree/main/example) directory (using the 1B model as an example):

- **Continued pre-training**: `example/run.sh` + `example/train.py`
- **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`

Quick start:

```bash
# Continued pre-training
cd example && bash run.sh

# Supervised fine-tuning
cd example && bash run_sft.sh
```

### Option 2: HuggingFace-compatible Frameworks

Any framework that supports HuggingFace model loading with custom code can be used, such as **LLaMA Factory**, **HuggingFace Trainer**, etc. The key is to ensure `trust_remote_code=True`:

```python
import torch  # needed for torch.bfloat16 below
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'openbmb/BitCPM-CANN-1B-unquantized'
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Use with your preferred framework (LLaMA Factory, HF Trainer, etc.)
# The ternary fake quantizer in modeling.py is applied automatically during forward pass.
```

## Post-Training Conversion

After training, use `qat-convert.py` to fuse the fake quantizer and produce inference-ready pseudo-quantized weights:

```bash
python qat-convert.py \
    --input_bin <path-to-finetuned-pytorch.bin> \
    --output <path-to-output-pseudo-quantized-pytorch.bin> \
    --quant_type ternary \
    --group_size -1
```

The converted model can be loaded for inference in the same way as [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B); no special quantization libraries are required.
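
Because the pseudo-quantized weights are ordinary floating-point numbers, a converted checkpoint can be spot-checked without any quantization tooling. The helper below is a minimal sketch under one assumption: the values passed in belong to a single quantization group that shares one scale `s`. How groups map onto the tensors written by `qat-convert.py` is not specified here, so slicing a weight into groups is left to the caller. The function name and tolerance are illustrative.

```python
def ternary_scale(values, tol=1e-6):
    """Return the shared scale s if every value is in {-s, 0, +s}, else None."""
    if not values:
        raise ValueError("expected at least one value")
    scale = max(abs(v) for v in values)
    if scale == 0:
        return 0.0  # an all-zero group is trivially ternary
    for v in values:
        is_zero = abs(v) <= tol * scale
        is_scale = abs(abs(v) - scale) <= tol * scale
        if not (is_zero or is_scale):
            return None
    return scale


print(ternary_scale([0.5, 0.0, -0.5, 0.5]))  # a fused, ternary group
print(ternary_scale([0.5, 0.2, -0.5]))       # not ternary: 0.2 is off-grid
```

A `None` result on a converted checkpoint would suggest either that the conversion step was skipped or that the values were not sliced along the real group boundaries.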

## Workflow

```
┌──────────────────────────────────┐
│  BitCPM-CANN-1B-unquantized      │   ← This model (QAT checkpoint + fake quantizer in modeling.py)
└───────────────┬──────────────────┘
                │
                ▼  Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
┌──────────────────────────────────┐
│   Fine-tuned checkpoint          │   ← Still contains un-fused QAT parameters
└───────────────┬──────────────────┘
                │
                ▼  python qat-convert.py --quant_type ternary --group_size -1
┌──────────────────────────────────┐
│  Pseudo-quantized model          │   ← Ready for inference (same format as BitCPM-CANN-1B)
└──────────────────────────────────┘
```
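
When the conversion step is scripted as the last stage of a training pipeline, the command can be assembled in Python. This is a minimal sketch: the helper name and the example file paths are hypothetical, while the flags are exactly the ones shown in the conversion command in this README.

```python
import sys


def build_convert_command(input_bin, output_bin):
    """Build the argv list for the qat-convert.py step shown in this README."""
    return [
        sys.executable, "qat-convert.py",
        "--input_bin", str(input_bin),
        "--output", str(output_bin),
        "--quant_type", "ternary",
        "--group_size", "-1",
    ]


# Hypothetical paths, for illustration only.
cmd = build_convert_command("finetuned/pytorch.bin", "converted/pytorch.bin")
print(" ".join(cmd[1:]))
```

Passing the list to `subprocess.run(cmd, check=True)` from the directory that contains `qat-convert.py` runs the conversion and raises an error if the script exits with a non-zero status.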

## BitCPM-CANN Model Family

| Model | HuggingFace (Inference) | HuggingFace (Fine-tuning) |
|-------|-------------------------|---------------------------|
| BitCPM-CANN-0.5B | [openbmb/BitCPM-CANN-0.5B](https://huggingface.co/openbmb/BitCPM-CANN-0.5B) | [openbmb/BitCPM-CANN-0.5B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-0.5B-unquantized) |
| BitCPM-CANN-1B | [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B) | [openbmb/BitCPM-CANN-1B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized) |
| BitCPM-CANN-3B | [openbmb/BitCPM-CANN-3B](https://huggingface.co/openbmb/BitCPM-CANN-3B) | [openbmb/BitCPM-CANN-3B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-3B-unquantized) |
| BitCPM-CANN-8B | [openbmb/BitCPM-CANN-8B](https://huggingface.co/openbmb/BitCPM-CANN-8B) | [openbmb/BitCPM-CANN-8B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-8B-unquantized) |

## Statement
- As a language model, BitCPM-CANN generates content by learning from a vast amount of text. 
- However, it does not possess the ability to comprehend or express personal opinions or value judgments. 
- Any content generated by BitCPM-CANN does not represent the viewpoints or positions of the model developers. 
- Therefore, when using content generated by BitCPM-CANN, users should take full responsibility for evaluating and verifying it on their own.

## LICENSE
- This repository and BitCPM-CANN models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License. 

## Citation
- Please cite our technical report if you find our work valuable.

```bibtex
@article{bitcpmcann,
  title={{BitCPM-CANN}: Native 1.58-Bit Large Language Model Training on Ascend NPU},
  author={BitCPM Team},
  year={2026}
}
```