guanwenyu1995 committed on
Commit
92f9cb1
·
verified ·
1 Parent(s): fcf820f

Update README naming from BitCPM4 to BitCPM
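
Downstream scripts or configs that still reference the old repository ids can be updated with a plain text substitution. The following is a minimal sketch, not part of the commit; the helper name `rename_repo_ids` is hypothetical, and whether the old `BitCPM4-CANN` ids continue to resolve on the Hub is not stated here.

```python
import re

OLD_PREFIX = "BitCPM4-CANN"  # family name before this commit
NEW_PREFIX = "BitCPM-CANN"   # family name after this commit


def rename_repo_ids(text: str) -> str:
    """Replace every whole-word occurrence of the old family prefix."""
    pattern = r"\b" + re.escape(OLD_PREFIX) + r"\b"
    return re.sub(pattern, NEW_PREFIX, text)


if __name__ == "__main__":
    print(rename_repo_ids("path = 'openbmb/BitCPM4-CANN-3B-unquantized'"))
```

The substitution only touches the hyphenated prefix, so identifiers such as the BibTeX key `bitcpm4cann` need a separate edit.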

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
@@ -20,9 +20,9 @@ library_name: transformers
20
 
21
  ## Overview
22
 
23
- BitCPM4-CANN-3B-unquantized is the **unquantized QAT (Quantization-Aware Training) checkpoint** of BitCPM4-CANN-3B, designed for **continued pre-training and fine-tuning**. It preserves full-precision latent weights with ternary fake quantizers (weights → {-1, 0, 1} with group-wise scaling, trained via STE) defined in `modeling.py`, enabling the model to keep learning under quantization constraints. For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).
24
 
25
- > ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM4-CANN-3B](https://huggingface.co/openbmb/BitCPM4-CANN-3B).
26
 
27
  ## Continued Pre-training & Fine-tuning
28
 
@@ -30,7 +30,7 @@ The **only requirement** is that the forward pass must go through the bundled `m
30
 
31
  ### Option 1: DeepSpeed (Recommended)
32
 
33
- We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM4-CANN-3B-unquantized/tree/main/example) directory (using the 1B model as an example):
34
 
35
  - **Continued pre-training**: `example/run.sh` + `example/train.py`
36
  - **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`
@@ -52,7 +52,7 @@ Any framework that supports HuggingFace model loading with custom code can be us
52
  ```python
53
  from transformers import AutoModelForCausalLM, AutoTokenizer
54
 
55
- path = 'openbmb/BitCPM4-CANN-3B-unquantized'
56
  tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
57
  model = AutoModelForCausalLM.from_pretrained(
58
  path,
@@ -76,13 +76,13 @@ python qat-convert.py \
76
  --group_size -1
77
  ```
78
 
79
- The converted model can be loaded for inference in the same way as [openbmb/BitCPM4-CANN-3B](https://huggingface.co/openbmb/BitCPM4-CANN-3B), with no special quantization libraries required.
80
 
81
  ## Workflow
82
 
83
  ```
84
  ┌─────────────────────────────────┐
85
- │ BitCPM4-CANN-3B-unquantized │ ← This model (QAT checkpoint + fake quantizer in modeling.py)
86
  └───────────────┬─────────────────┘
87
  │
88
  ▼ Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
@@ -92,33 +92,33 @@ The converted model can be loaded for inference in the same way as [openbmb/BitC
92
  │
93
  ▼ python qat-convert.py --quant_type ternary --group_size -1
94
  ┌─────────────────────────────────┐
95
- │ Pseudo-quantized model │ ← Ready for inference (same format as BitCPM4-CANN-3B)
96
  └─────────────────────────────────┘
97
  ```
98
 
99
- ## BitCPM4-CANN Model Family
100
 
101
  | Model | HuggingFace (Inference) | HuggingFace (Fine-tuning) |
102
  |-------|-------------------------|---------------------------|
103
- | BitCPM4-CANN-0.5B | [openbmb/BitCPM4-CANN-0.5B](https://huggingface.co/openbmb/BitCPM4-CANN-0.5B) | [openbmb/BitCPM4-CANN-0.5B-unquantized](https://huggingface.co/openbmb/BitCPM4-CANN-0.5B-unquantized) |
104
- | BitCPM4-CANN-1B | [openbmb/BitCPM4-CANN-1B](https://huggingface.co/openbmb/BitCPM4-CANN-1B) | [openbmb/BitCPM4-CANN-1B-unquantized](https://huggingface.co/openbmb/BitCPM4-CANN-1B-unquantized) |
105
- | BitCPM4-CANN-3B | [openbmb/BitCPM4-CANN-3B](https://huggingface.co/openbmb/BitCPM4-CANN-3B) | [openbmb/BitCPM4-CANN-3B-unquantized](https://huggingface.co/openbmb/BitCPM4-CANN-3B-unquantized) |
106
- | BitCPM4-CANN-8B | [openbmb/BitCPM4-CANN-8B](https://huggingface.co/openbmb/BitCPM4-CANN-8B) | [openbmb/BitCPM4-CANN-8B-unquantized](https://huggingface.co/openbmb/BitCPM4-CANN-8B-unquantized) |
107
 
108
  ## Statement
109
- - As a language model, BitCPM4-CANN generates content by learning from a vast amount of text.
110
  - However, it does not possess the ability to comprehend or express personal opinions or value judgments.
111
- - Any content generated by BitCPM4-CANN does not represent the viewpoints or positions of the model developers.
112
- - Therefore, when using content generated by BitCPM4-CANN, users should take full responsibility for evaluating and verifying it on their own.
113
 
114
  ## LICENSE
115
- - This repository and BitCPM4-CANN models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
116
 
117
  ## Citation
118
  - Please cite our technical report if you find our work valuable.
119
 
120
  ```bibtex
121
- @article{bitcpm4cann,
122
  title={{BitCPM-CANN}: Native 1.58-Bit Large Language Model Training on Ascend NPU},
123
  author={BitCPM Team},
124
  year={2026}
 
20
 
21
  ## Overview
22
 
23
+ BitCPM-CANN-3B-unquantized is the **unquantized QAT (Quantization-Aware Training) checkpoint** of BitCPM-CANN-3B, designed for **continued pre-training and fine-tuning**. It preserves full-precision latent weights with ternary fake quantizers (weights → {-1, 0, 1} with group-wise scaling, trained via STE) defined in `modeling.py`, enabling the model to keep learning under quantization constraints. For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).
24
 
25
+ > ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM-CANN-3B](https://huggingface.co/openbmb/BitCPM-CANN-3B).
26
 
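
To make the Overview's "ternary fake quantizers with group-wise scaling, trained via STE" more concrete, here is a minimal pure-Python sketch. It is not the code in `modeling.py`: the function names are hypothetical, and the per-group mean-absolute-value scale is an assumption (a common choice for 1.58-bit schemes), since the README does not state which scale rule is used.

```python
def ternary_fake_quant(weights, group_size):
    """Map each group of weights to scale * {-1, 0, 1}.

    The absmean scale is an assumption for illustration, not a
    confirmed detail of the bundled modeling.py.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive in this sketch")
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Per-group scale; fall back to 1.0 for an all-zero group.
        scale = sum(abs(w) for w in group) / len(group) or 1.0
        for w in group:
            q = max(-1, min(1, round(w / scale)))  # round, then clip to ternary
            out.append(q * scale)
    return out


def ste_weight_grad(inputs, grad_output):
    """Straight-through estimator for y = sum(quant(w_i) * x_i).

    The rounding step is treated as identity in the backward pass, so the
    gradient reaching each latent full-precision weight is grad_output * x_i.
    """
    return [grad_output * x for x in inputs]


if __name__ == "__main__":
    latent = [1.0, -0.25, -1.5, 0.25]
    print(ternary_fake_quant(latent, group_size=4))
    print(ste_weight_grad([2.0, 3.0, 4.0, 5.0], grad_output=0.5))
```

The forward pass sees only the ternary values, while the optimizer keeps updating the full-precision latent weights, which is why the checkpoint can continue training under the quantization constraint.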
27
  ## Continued Pre-training & Fine-tuning
28
 
 
30
 
31
  ### Option 1: DeepSpeed (Recommended)
32
 
33
+ We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM-CANN-3B-unquantized/tree/main/example) directory (using the 1B model as an example):
34
 
35
  - **Continued pre-training**: `example/run.sh` + `example/train.py`
36
  - **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`
 
52
  ```python
53
  from transformers import AutoModelForCausalLM, AutoTokenizer
54
 
55
+ path = 'openbmb/BitCPM-CANN-3B-unquantized'
56
  tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
57
  model = AutoModelForCausalLM.from_pretrained(
58
  path,
 
76
  --group_size -1
77
  ```
78
 
79
+ The converted model can be loaded for inference in the same way as [openbmb/BitCPM-CANN-3B](https://huggingface.co/openbmb/BitCPM-CANN-3B), with no special quantization libraries required.
80
 
81
  ## Workflow
82
 
83
  ```
84
  ┌─────────────────────────────────┐
85
+ │ BitCPM-CANN-3B-unquantized │ ← This model (QAT checkpoint + fake quantizer in modeling.py)
86
  └───────────────┬─────────────────┘
87
  │
88
  ▼ Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
 
92
  │
93
  ▼ python qat-convert.py --quant_type ternary --group_size -1
94
  ┌─────────────────────────────────┐
95
+ │ Pseudo-quantized model │ ← Ready for inference (same format as BitCPM-CANN-3B)
96
  └─────────────────────────────────┘
97
  ```
98
 
99
+ ## BitCPM-CANN Model Family
100
 
101
  | Model | HuggingFace (Inference) | HuggingFace (Fine-tuning) |
102
  |-------|-------------------------|---------------------------|
103
+ | BitCPM-CANN-0.5B | [openbmb/BitCPM-CANN-0.5B](https://huggingface.co/openbmb/BitCPM-CANN-0.5B) | [openbmb/BitCPM-CANN-0.5B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-0.5B-unquantized) |
104
+ | BitCPM-CANN-1B | [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B) | [openbmb/BitCPM-CANN-1B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized) |
105
+ | BitCPM-CANN-3B | [openbmb/BitCPM-CANN-3B](https://huggingface.co/openbmb/BitCPM-CANN-3B) | [openbmb/BitCPM-CANN-3B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-3B-unquantized) |
106
+ | BitCPM-CANN-8B | [openbmb/BitCPM-CANN-8B](https://huggingface.co/openbmb/BitCPM-CANN-8B) | [openbmb/BitCPM-CANN-8B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-8B-unquantized) |
107
 
108
  ## Statement
109
+ - As a language model, BitCPM-CANN generates content by learning from a vast amount of text.
110
  - However, it does not possess the ability to comprehend or express personal opinions or value judgments.
111
+ - Any content generated by BitCPM-CANN does not represent the viewpoints or positions of the model developers.
112
+ - Therefore, when using content generated by BitCPM-CANN, users should take full responsibility for evaluating and verifying it on their own.
113
 
114
  ## LICENSE
115
+ - This repository and BitCPM-CANN models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
116
 
117
  ## Citation
118
  - Please cite our technical report if you find our work valuable.
119
 
120
  ```bibtex
121
+ @article{bitcpmcann,
122
  title={{BitCPM-CANN}: Native 1.58-Bit Large Language Model Training on Ascend NPU},
123
  author={BitCPM Team},
124
  year={2026}