OpenPangu-Embedded-1B — HiF8 W8A8 QAT Submission
IEEE ICME 2026 Low Bit-width Large Model Quantization Challenge
This repository contains the quantized model weights and source code for our HiF8 W8A8 Quantization-Aware Training (QAT) submission on OpenPangu-Embedded-1B.
Repository Structure
pangu_pretrain_submit/
├── model.safetensors                      # Quantized model weights (HiF8 W8A8, lr=1e-5)
├── config.json                            # Model configuration
├── tokenizer.*                            # Tokenizer files
│
└── code/                                  # Complete source code
    ├── REPRODUCE.md                       # Reproduction guide (summarized below)
    ├── pangu1b_hif8.yml                   # Training conda environment spec
    ├── pangu1b_eval.yml                   # Evaluation conda environment spec
    ├── Dockerfile                         # Container environment
    │
    ├── pangu_hif8_pretrain/
    │   ├── train.py                       # Main training script
    │   ├── hif8.py                        # HiF8 amax tracking & scale management
    │   ├── run_train_max_quant_lr1e5.sh   # ★ HiF8 QAT training (submission)
    │   └── run_train_bf16_lr1e5.sh        # BF16 baseline training
    │
    ├── HiFloat8/
    │   ├── hif8_cuda/                     # HiF8 CUDA quantization library
    │   └── hif8_npu/                      # HiF8 NPU quantization library
    │
    ├── evaluate_benchmarks/
    │   ├── run_eval.sh                    # Benchmark evaluation runner
    │   ├── compare_results.py             # Multi-model comparison table
    │   └── results/summary_*.txt          # Evaluation result summaries
    │
    └── data_visualize/
        ├── log_visualizer.py              # Training curve visualization
        └── *.log                          # Training loss logs
Method
We apply HiF8 W8A8 Quantization-Aware Training to OpenPangu-Embedded-1B with the following key improvements over the baseline:
| Parameter | Value | Rationale |
|---|---|---|
| amax algorithm | max (over history) | Prevents saturation by guaranteeing scale ≥ any historical peak |
| amax history length | 64 steps | Wider window for stable scale estimation |
| BF16 warmup steps | 500 | Stabilizes pretrained weights before introducing quantization noise |
| HiF8 max value (fwd/bwd) | 15 | Maps peak values to highest-precision tier (3 mantissa bits) |
| Learning rate | 1e-5 | Reduces catastrophic forgetting of pretrained commonsense knowledge |
| Global batch size | 1024 | |
| Training steps | 10,000 | |
| High-precision layers | 5 | First 5 layers kept in BF16 |
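For concreteness, below is a minimal sketch of the amax tracking and quantization gating that these settings describe. The names (AmaxTracker, quantize_this_layer) are illustrative only, not the actual API of hif8.py or train.py.

```python
# Illustrative sketch only -- not the repo's hif8.py API.
from collections import deque
import torch

class AmaxTracker:
    """Tracks per-tensor amax over a rolling window and derives the HiF8 scale."""
    def __init__(self, history_len: int = 64, hif8_max: float = 15.0):
        self.history = deque(maxlen=history_len)  # last 64 per-step amax values
        self.hif8_max = hif8_max                  # HiF8 max value used for fwd/bwd

    def scale(self, tensor: torch.Tensor) -> torch.Tensor:
        self.history.append(tensor.detach().abs().max())
        # "max" algorithm: take the largest amax in the window, so the scale
        # covers every historical peak and quantization cannot saturate.
        amax = torch.stack(list(self.history)).max().clamp(min=1e-12)
        return self.hif8_max / amax

def quantize_this_layer(step: int, layer_idx: int,
                        warmup_steps: int = 500, hp_layers: int = 5) -> bool:
    # Stay in BF16 during warmup and for the first 5 (high-precision) layers.
    return step >= warmup_steps and layer_idx >= hp_layers
```

Tying the scale to the historical maximum trades a little resolution for a hard guarantee against clipping, which is what the rationale column above refers to.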
Training Loss Comparison
The chart below compares training loss curves between the HiF8 QAT model and the BF16 baseline over 10,000 steps.
Conclusion: The HiF8 QAT training loss tracks the BF16 baseline extremely closely throughout training. The average absolute percentage error (APE) across all 10,000 steps is 0.11%, and 0.12% over the final 1,000 steps. This demonstrates that the max-algorithm amax scaling with a 64-step history window effectively eliminates quantization saturation, producing a training trajectory nearly identical to full-precision BF16.
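For reference, a minimal sketch of how the APE figures above can be computed from the two loss curves (parsing of the *.log files into per-step loss arrays is assumed to have happened already):

```python
# Sketch only: per-step absolute percentage error (APE) of HiF8 QAT vs. the BF16 baseline.
import numpy as np

def average_ape(loss_hif8: np.ndarray, loss_bf16: np.ndarray) -> float:
    ape = np.abs(loss_hif8 - loss_bf16) / loss_bf16   # per-step APE vs. BF16
    return float(ape.mean() * 100.0)                   # mean APE, in percent

# All 10,000 steps:    average_ape(hif8_losses, bf16_losses)
# Final 1,000 steps:   average_ape(hif8_losses[-1000:], bf16_losses[-1000:])
```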
Benchmark Results
Baseline: bf16_lr1e5 (BF16 training with lr=1e-5, same data/steps)
| Task | BF16 (lr=1e-5) | HiF8 QAT (lr=1e-5) | Relative Drop |
|---|---|---|---|
| MMLU (5-shot) | 43.36% | 43.17% | 0.43% ✓ |
| GSM8K (5-shot) | 1.59% | 1.29% | — (noise) |
| MATH500 (4-shot) | 0.50% | 0.46% | — (noise) |
| HellaSwag (10-shot) | 41.10% | 40.86% | 0.58% ✓ |
| ARC-Easy (25-shot) | 51.56% | 51.01% | 1.06% |
| ARC-Challenge (25-shot) | 36.77% | 36.69% | 0.22% ✓ |
The GSM8K and MATH500 drops are statistically insignificant: the scores are near zero to begin with and the absolute differences are at most 0.3 percentage points.
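The Relative Drop column reports the drop relative to the BF16 baseline; a minimal sketch of the metric is below (the committed implementation is compare_results.py).

```python
# Relative drop of the HiF8 QAT score with respect to the BF16 baseline, in percent
# (assumed to match what compare_results.py reports; shown here only for clarity).
def relative_drop(bf16_acc: float, hif8_acc: float) -> float:
    return (bf16_acc - hif8_acc) / bf16_acc * 100.0

# e.g. MMLU: relative_drop(43.36, 43.17) ≈ 0.44, agreeing with the table up to rounding of the raw scores
```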
Reproduction Instructions
1. Environment Setup
# Training environment
conda env create -f code/pangu1b_hif8.yml
conda activate pangu1b_hif8
# Evaluation environment
conda env create -f code/pangu1b_eval.yml
conda activate pangu1b_eval
pip install lm-eval==0.4.11 ray==2.55.1 "antlr4-python3-runtime==4.11" sympy math_verify
2. Prepare Model & Data
# Place base model weights at:
code/pangu_hif8_pretrain/models/openPangu-Embedded-1B/
# Dataset is loaded automatically from HuggingFace (FineWeb sample-10BT)
# or set --cache_dir to a local path in the training script
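A minimal sketch of the dataset loading that train.py is assumed to perform (the HuggingFace repo id and cache path below are illustrative):

```python
# Sketch only -- the actual loading lives in train.py.
from datasets import load_dataset

dataset = load_dataset(
    "HuggingFaceFW/fineweb",   # assumed repo id for the FineWeb dataset
    name="sample-10BT",        # the 10B-token sample referenced above
    split="train",
    cache_dir="./data_cache",  # optional local path, as the --cache_dir note describes
)
```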
3. Train BF16 Baseline
bash code/pangu_hif8_pretrain/run_train_bf16_lr1e5.sh
# Output: checkpoints/bf16_lr1e5/final/
# Runtime: ~21 hours on 8× NVIDIA H100 80GB
4. Train HiF8 QAT (this submission)
bash code/pangu_hif8_pretrain/run_train_max_quant_lr1e5.sh
# Output: checkpoints/max_quant_lr1e5/final/
# Runtime: ~21 hours on 8× NVIDIA H100 80GB
5. Evaluate
cd code/evaluate_benchmarks
# Evaluate a single model
bash run_eval.sh max_quant_lr1e5
# Generate full comparison table
python compare_results.py
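For a single task, a rough Python equivalent of what run_eval.sh is assumed to drive through the lm-eval harness (the checkpoint path, dtype, and batch size are illustrative, not the script's defaults):

```python
# Sketch only: one-task evaluation with the lm-eval 0.4.x Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=checkpoints/max_quant_lr1e5/final,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,   # matches the 5-shot MMLU setting reported above
    batch_size=8,
)
print(results["results"])
```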
Hardware
- GPUs: 8× NVIDIA H100 80GB
- Training time: ~21 hours per run
- Framework: PyTorch + torchrun (DDP)