# Reproduction Guide: HiF8 QAT for OpenPangu-Embedded-1B

IEEE ICME 2026 Low Bit-width Large Model Quantization Challenge submission.

## Environment Setup

### Training environment (pangu1b_hif8)

```bash
conda env create -f pangu1b_hif8.yml
conda activate pangu1b_hif8
```

### Evaluation environment (pangu1b_eval)

```bash
conda env create -f pangu1b_eval.yml
conda activate pangu1b_eval
pip install lm-eval==0.4.11 ray==2.55.1 "antlr4-python3-runtime==4.11" sympy math_verify
```

## Quantized Model

The submitted quantized checkpoint corresponds to **max_quant_lr1e5** (HiF8 W8A8 QAT, lr=1e-5).

Place the checkpoint at:

```
checkpoints/max_quant_lr1e5/final/
```

## Reproduce Training

### Step 1: BF16 baseline (lr=1e-5)

```bash
bash pangu_hif8_pretrain/run_train_bf16_lr1e5.sh
```

Output: `checkpoints/bf16_lr1e5/final/`

### Step 2: HiF8 QAT (max_quant_lr1e5, lr=1e-5)

```bash
bash pangu_hif8_pretrain/run_train_max_quant_lr1e5.sh
```

Output: `checkpoints/max_quant_lr1e5/final/`

Key quantization settings:

| Parameter | Value |
|-----------|-------|
| Quantization | W8A8 HiF8 |
| amax algorithm | `max` (over history window) |
| amax history length | 64 steps |
| BF16 warmup steps | 500 |
| HiF8 max value (fwd/bwd) | 15 |
| Learning rate | 1e-5 |
| Global batch size | 1024 |
| Max steps | 10000 |
| High-precision layers | 5 |

Training takes approximately **21 hours** on 8× NVIDIA H100 80GB.

## Reproduce Evaluation

```bash
cd evaluate_benchmarks

# Evaluate a single model
bash run_eval.sh max_quant_lr1e5

# Generate comparison table (baseline: bf16_lr1e5)
python compare_results.py
```

Benchmarks: MMLU (5-shot), GSM8K (5-shot), MATH500/minerva_math (4-shot), HellaSwag (10-shot), ARC-Easy (25-shot), ARC-Challenge (25-shot).

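
For readers unfamiliar with the `max` amax setting listed in the key quantization settings under Reproduce Training, the following is a minimal sketch of how a scaling factor might be derived from a 64-step amax history and the HiF8 max value of 15. It assumes a delayed-scaling scheme in which the scale is the format maximum divided by the largest absolute value seen in the history window; the class `AmaxHistory` and its methods are hypothetical names for illustration, and the training scripts in this repository may implement this differently.

```python
from collections import deque

HIF8_MAX = 15.0      # "HiF8 max value (fwd/bwd)" from the settings table
HISTORY_LEN = 64     # "amax history length" from the settings table


class AmaxHistory:
    """Hypothetical helper: keeps the most recent per-step absolute maxima."""

    def __init__(self, length=HISTORY_LEN):
        # deque with maxlen drops the oldest entry once the window is full
        self.history = deque(maxlen=length)

    def update(self, step_amax):
        """Record the absolute maximum observed in the current step."""
        self.history.append(float(step_amax))

    def scale(self, fmt_max=HIF8_MAX):
        """'max' algorithm: scale so the largest amax in the window maps to fmt_max."""
        amax = max(self.history, default=0.0)
        if amax <= 0.0:
            return 1.0  # assumed fallback when no usable statistics exist yet
        return fmt_max / amax


tracker = AmaxHistory()
for step_amax in [2.0, 7.5, 3.0]:
    tracker.update(step_amax)

print(tracker.scale())  # 15 / 7.5 = 2.0
```

Taking the maximum over the window, rather than only the most recent value, makes the scale less sensitive to a single step with unusually small activations, at the cost of reacting more slowly when the tensor range shrinks.
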
## Key Results (max_quant_lr1e5 vs bf16_lr1e5 baseline)

| Task | BF16 (lr=1e-5) | HiF8 QAT (lr=1e-5) | Relative drop |
|------|----------------|--------------------|---------------|
| MMLU (5-shot) | 43.36% | 43.17% | 0.43% |
| GSM8K (5-shot) | 1.59% | 1.29% | n/a (noise) |
| MATH500 (4-shot) | 0.50% | 0.46% | n/a (noise) |
| HellaSwag (10-shot) | 41.10% | 40.86% | 0.58% |
| ARC-Easy (25-shot) | 51.56% | 51.01% | 1.06% |
| ARC-Challenge (25-shot) | 36.77% | 36.69% | 0.22% |
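
The drop figures in the results table appear to be relative to the BF16 baseline, not absolute percentage-point differences. The sketch below recomputes them from the rounded scores shown in the table; `relative_drop` is a hypothetical helper, not a function from `compare_results.py`. Because the inputs here are already rounded to two decimals, the last digit can differ by 0.01 from the table for some tasks, presumably because the table was computed from unrounded scores.

```python
def relative_drop(baseline, quantized):
    """Accuracy drop expressed as a percentage of the baseline score."""
    return (baseline - quantized) / baseline * 100.0


# (BF16 score, HiF8 QAT score) in percent, copied from the results table.
# GSM8K and MATH500 are omitted because the table treats them as noise.
scores = {
    "MMLU": (43.36, 43.17),
    "HellaSwag": (41.10, 40.86),
    "ARC-Easy": (51.56, 51.01),
    "ARC-Challenge": (36.77, 36.69),
}

drops = {task: relative_drop(bf16, hif8) for task, (bf16, hif8) in scores.items()}

for task, drop in drops.items():
    print(f"{task}: {drop:.2f}%")
```
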