guanwenyu1995 committed on
Commit be98f84 · verified · 1 Parent(s): 596de3b

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -20,9 +20,9 @@ library_name: transformers
20
 
21
  ## Overview
22
 
23
- BitCPM4-CANN-1B-unquantized is the **unquantized QAT (Quantization-Aware Training) checkpoint** of BitCPM4-CANN-1B, designed for **continued pre-training and fine-tuning**. It preserves full-precision latent weights with ternary fake quantizers (weights → {-1, 0, 1} with group-wise scaling, trained via STE) defined in `modeling.py`, enabling the model to keep learning under quantization constraints. For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).
24
 
25
- > ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM4-CANN-1B](https://huggingface.co/openbmb/BitCPM4-CANN-1B).
26
 
27
  ## Continued Pre-training & Fine-tuning
28
 
@@ -30,7 +30,7 @@ The **only requirement** is that the forward pass must go through the bundled `m
30
 
31
  ### Option 1: DeepSpeed (Recommended)
32
 
33
- We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM4-CANN-1B-unquantized/tree/main/example) directory (using the 1B model as an example):
34
 
35
  - **Continued pre-training**: `example/run.sh` + `example/train.py`
36
  - **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`
@@ -52,7 +52,7 @@ Any framework that supports HuggingFace model loading with custom code can be us
52
  ```python
53
  from transformers import AutoModelForCausalLM, AutoTokenizer
54
 
55
- path = 'openbmb/BitCPM4-CANN-1B-unquantized'
56
  tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
57
  model = AutoModelForCausalLM.from_pretrained(
58
  path,
@@ -76,13 +76,13 @@ python qat-convert.py \
76
  --group_size -1
77
  ```
78
 
79
- The converted model can be loaded for inference in the same way as [openbmb/BitCPM4-CANN-1B](https://huggingface.co/openbmb/BitCPM4-CANN-1B), with no special quantization libraries required.
80
 
81
  ## Workflow
82
 
83
  ```
84
  ┌─────────────────────────────────┐
85
- │   BitCPM4-CANN-1B-unquantized   │ ← This model (QAT checkpoint + fake quantizer in modeling.py)
86
  └───────────────┬─────────────────┘
87
                  │
88
                  ▼ Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
@@ -92,7 +92,7 @@ The converted model can be loaded for inference in the same way as [openbmb/BitC
92
                  │
93
                  ▼ python qat-convert.py --quant_type ternary --group_size -1
94
  ┌─────────────────────────────────┐
95
- │     Pseudo-quantized model      │ ← Ready for inference (same format as BitCPM4-CANN-1B)
96
  └─────────────────────────────────┘
97
  ```
98
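For readers of this diff, the ternary fake quantizer mentioned in the Overview can be illustrated with a minimal pure-Python sketch. It is not the code in `modeling.py`: the absmean scaling rule, the function names `ternary_fake_quant` and `ste_backward`, and the reading of `group_size <= 0` as "one group spanning the whole list" are all assumptions made for illustration.

```python
from typing import List, Tuple


def ternary_fake_quant(
    weights: List[float], group_size: int = -1
) -> Tuple[List[float], List[int]]:
    """Fake-quantize weights to scale * {-1, 0, 1}, with one scale per group.

    Assumptions (not taken from the source README):
    - the scale of a group is the mean absolute value of its weights;
    - group_size <= 0 means a single group covering all weights.
    Returns the dequantized weights and the ternary codes.
    """
    n = len(weights)
    if n == 0:
        return [], []
    size = n if group_size <= 0 else group_size
    dequantized: List[float] = []
    codes: List[int] = []
    for start in range(0, n, size):
        group = weights[start:start + size]
        scale = sum(abs(w) for w in group) / len(group)
        scale = max(scale, 1e-8)  # guard against an all-zero group
        for w in group:
            code = max(-1, min(1, round(w / scale)))  # clamp to {-1, 0, 1}
            codes.append(code)
            dequantized.append(code * scale)
    return dequantized, codes


def ste_backward(grad_output: List[float]) -> List[float]:
    """Straight-through estimator: the backward pass treats the rounding
    step as the identity, so gradients reach the full-precision latent
    weights unchanged."""
    return list(grad_output)


if __name__ == "__main__":
    latent = [0.9, -0.1, 0.05, -1.2]
    deq, codes = ternary_fake_quant(latent, group_size=2)
    print(codes)
    print(deq)
```

The forward pass uses the dequantized values while the optimizer keeps updating the latent weights, which is why a checkpoint of this kind can continue training under the quantization constraint.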
 
 
20
 
21
  ## Overview
22
 
23
+ BitCPM4-CANN-8B-unquantized is the **unquantized QAT (Quantization-Aware Training) checkpoint** of BitCPM4-CANN-8B, designed for **continued pre-training and fine-tuning**. It preserves full-precision latent weights with ternary fake quantizers (weights → {-1, 0, 1} with group-wise scaling, trained via STE) defined in `modeling.py`, enabling the model to keep learning under quantization constraints. For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).
24
 
25
+ > ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM4-CANN-8B](https://huggingface.co/openbmb/BitCPM4-CANN-8B).
26
 
27
  ## Continued Pre-training & Fine-tuning
28
 
 
30
 
31
  ### Option 1: DeepSpeed (Recommended)
32
 
33
+ We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM4-CANN-8B-unquantized/tree/main/example) directory (using the 1B model as an example):
34
 
35
  - **Continued pre-training**: `example/run.sh` + `example/train.py`
36
  - **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`
 
52
  ```python
53
  from transformers import AutoModelForCausalLM, AutoTokenizer
54
 
55
+ path = 'openbmb/BitCPM4-CANN-8B-unquantized'
56
  tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
57
  model = AutoModelForCausalLM.from_pretrained(
58
  path,
 
76
  --group_size -1
77
  ```
78
 
79
+ The converted model can be loaded for inference in the same way as [openbmb/BitCPM4-CANN-8B](https://huggingface.co/openbmb/BitCPM4-CANN-8B), with no special quantization libraries required.
80
 
81
  ## Workflow
82
 
83
  ```
84
  ┌─────────────────────────────────┐
85
+ │   BitCPM4-CANN-8B-unquantized   │ ← This model (QAT checkpoint + fake quantizer in modeling.py)
86
  └───────────────┬─────────────────┘
87
                  │
88
                  ▼ Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
 
92
                  │
93
                  ▼ python qat-convert.py --quant_type ternary --group_size -1
94
  ┌─────────────────────────────────┐
95
+ │     Pseudo-quantized model      │ ← Ready for inference (same format as BitCPM4-CANN-8B)
96
  └─────────────────────────────────┘
97
  ```
98
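As a sanity check on the last step of the workflow, one might verify that a converted weight group really is "pseudo-quantized", meaning its values are ordinary floats that only take the values 0 and plus or minus one shared scale. The sketch below is a hypothetical helper, not part of `qat-convert.py`; the name `is_pseudo_ternary`, the tolerance, and the idea that the caller supplies the groups are assumptions, since the README does not state how `--group_size -1` partitions a tensor.

```python
from typing import Sequence


def is_pseudo_ternary(group: Sequence[float], tol: float = 1e-6) -> bool:
    """Return True if every value in the group is 0 or +/- one shared scale.

    Hypothetical helper for illustration: the caller decides what a
    "group" is (whole tensor, row, or fixed-size block).
    """
    # Collect the magnitudes of all entries that are not (numerically) zero.
    magnitudes = [abs(v) for v in group if abs(v) > tol]
    if not magnitudes:
        return True  # an all-zero group is trivially ternary
    scale = magnitudes[0]
    # Every nonzero magnitude must match the first one within tolerance.
    return all(abs(m - scale) <= tol * max(1.0, scale) for m in magnitudes)


if __name__ == "__main__":
    print(is_pseudo_ternary([0.5, 0.0, -0.5, 0.5]))
    print(is_pseudo_ternary([0.5, 0.25, -0.5]))
```

Because the converted weights are stored as regular floating-point values, a check like this needs nothing beyond the tensor contents, which matches the README's statement that no special quantization libraries are required.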