---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
---

GitHub Repo | Technical Report

👋 Join us on Discord and WeChat

## Overview

BitCPM-CANN-1B-unquantized is the **unquantized QAT (Quantization-Aware Training) checkpoint** of BitCPM-CANN-1B, designed for **continued pre-training and fine-tuning**. It preserves full-precision latent weights with ternary fake quantizers (weights → {-1, 0, 1} with group-wise scaling, trained via STE) defined in `modeling.py`, enabling the model to keep learning under quantization constraints.

For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).

> ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B).

## Continued Pre-training & Fine-tuning

The **only requirement** is that the forward pass must go through the bundled `modeling.py` (which contains the ternary fake quantizer). Load with `trust_remote_code=True` and do NOT replace or bypass the model's forward logic.

### Option 1: DeepSpeed (Recommended)

We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized/tree/main/example) directory (using the 1B model as an example):

- **Continued pre-training**: `example/run.sh` + `example/train.py`
- **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`

Quick start:

```bash
# Continued pre-training
cd example && bash run.sh

# Supervised fine-tuning
cd example && bash run_sft.sh
```

### Option 2: HuggingFace-compatible Frameworks

Any framework that supports HuggingFace model loading with custom code can be used, such as **LLaMA Factory**, **HuggingFace Trainer**, etc.
The key is to ensure `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'openbmb/BitCPM-CANN-1B-unquantized'
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Use with your preferred framework (LLaMA Factory, HF Trainer, etc.)
# The ternary fake quantizer in modeling.py is applied automatically during forward pass.
```

## Post-Training Conversion

After training, use `qat-convert.py` to fuse the fake quantizer and produce inference-ready pseudo-quantized weights:

```bash
python qat-convert.py \
    --input_bin \
    --output \
    --quant_type ternary \
    --group_size -1
```

The converted model can be loaded for inference in the same way as [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B). No special quantization libraries are required.

## Workflow

```
┌─────────────────────────────────┐
│  BitCPM-CANN-1B-unquantized     │  ← This model (QAT checkpoint + fake quantizer in modeling.py)
└───────────────┬─────────────────┘
                │
                ▼  Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
┌─────────────────────────────────┐
│  Fine-tuned checkpoint          │  ← Still contains un-fused QAT parameters
└───────────────┬─────────────────┘
                │
                ▼  python qat-convert.py --quant_type ternary --group_size -1
┌─────────────────────────────────┐
│  Pseudo-quantized model         │  ← Ready for inference (same format as BitCPM-CANN-1B)
└─────────────────────────────────┘
```

## BitCPM-CANN Model Family

| Model | HuggingFace (Inference) | HuggingFace (Fine-tuning) |
|-------|-------------------------|---------------------------|
| BitCPM-CANN-0.5B | [openbmb/BitCPM-CANN-0.5B](https://huggingface.co/openbmb/BitCPM-CANN-0.5B) | [openbmb/BitCPM-CANN-0.5B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-0.5B-unquantized) |
| BitCPM-CANN-1B | [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B) | [openbmb/BitCPM-CANN-1B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized) |
| BitCPM-CANN-3B | [openbmb/BitCPM-CANN-3B](https://huggingface.co/openbmb/BitCPM-CANN-3B) | [openbmb/BitCPM-CANN-3B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-3B-unquantized) |
| BitCPM-CANN-8B | [openbmb/BitCPM-CANN-8B](https://huggingface.co/openbmb/BitCPM-CANN-8B) | [openbmb/BitCPM-CANN-8B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-8B-unquantized) |

## Statement

- As a language model, BitCPM-CANN generates content by learning from a vast amount of text.
- However, it does not possess the ability to comprehend or express personal opinions or value judgments.
- Any content generated by BitCPM-CANN does not represent the viewpoints or positions of the model developers.
- Therefore, when using content generated by BitCPM-CANN, users should take full responsibility for evaluating and verifying it on their own.

## LICENSE

- This repository and BitCPM-CANN models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.

## Citation

- Please cite our technical report if you find our work valuable.

```bibtex
@article{bitcpmcann,
  title={{BitCPM-CANN}: Native 1.58-Bit Large Language Model Training on Ascend NPU},
  author={BitCPM Team},
  year={2026}
}
```
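
## Appendix: Ternary Fake Quantization Sketch

The Overview describes weights being mapped to {-1, 0, 1} with group-wise scaling. For intuition only, the following dependency-free sketch shows one common way such a forward mapping can be computed. It is not the code in `modeling.py`: the function names `ternary_quantize` and `fake_quantize`, the absmean scale, and the reading of `group_size=-1` as "one group covering the whole tensor" are all assumptions made for illustration.

```python
def ternary_quantize(group, eps=1e-8):
    """Map a list of floats to ternary codes plus one shared scale.

    Assumption: the scale is the mean absolute value of the group
    (absmean); the bundled modeling.py may use a different rule.
    """
    scale = max(sum(abs(w) for w in group) / len(group), eps)
    # Round to the nearest integer, then clip into {-1, 0, 1}.
    codes = [max(-1, min(1, round(w / scale))) for w in group]
    return codes, scale


def fake_quantize(weights, group_size=-1):
    """Return the dequantized weights a forward pass would see.

    Assumption: group_size=-1 means a single group over all weights.
    During QAT with a straight-through estimator (STE), the backward
    pass treats this rounding as the identity, so gradients still
    update the full-precision latent weights.
    """
    if not weights:
        return []
    step = len(weights) if group_size == -1 else group_size
    out = []
    for start in range(0, len(weights), step):
        codes, scale = ternary_quantize(weights[start:start + step])
        out.extend(c * scale for c in codes)
    return out


latent = [0.9, -0.05, -1.2, 0.4]
codes, scale = ternary_quantize(latent)
print(codes, scale)                         # ternary codes and shared scale
print(fake_quantize(latent))                # one scale for the whole list
print(fake_quantize(latent, group_size=2))  # one scale per pair of weights
```

Every value returned by `fake_quantize` is either zero or plus or minus the scale of its group, which is what lets a later conversion step store the weights in ternary form.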