yycheng0122
/

pangu_pretrain_submit

Model card Files Files and versions

pangu_pretrain_submit / code /REPRODUCE.md

yycheng0122's picture

Upload folder using huggingface_hub

0d00bbe verified 17 days ago

|

history blame contribute delete

2.22 kB

	# Reproduction Guide — HiF8 QAT for OpenPangu-Embedded-1B

	IEEE ICME 2026 Low Bit-width Large Model Quantization Challenge submission.

	## Environment Setup

	### Training environment (pangu1b_hif8)
	```bash
	conda env create -f pangu1b_hif8.yml
	conda activate pangu1b_hif8
	```

	### Evaluation environment (pangu1b_eval)
	```bash
	conda env create -f pangu1b_eval.yml
	conda activate pangu1b_eval
	pip install lm-eval==0.4.11 ray==2.55.1 "antlr4-python3-runtime==4.11" sympy math_verify
	```

	## Quantized Model

	The submitted quantized checkpoint corresponds to max_quant_lr1e5 (HiF8 W8A8 QAT, lr=1e-5).

	Place the checkpoint at:
	```
	checkpoints/max_quant_lr1e5/final/
	```

	## Reproduce Training

	### Step 1 — BF16 baseline (lr=1e-5)
	```bash
	bash pangu_hif8_pretrain/run_train_bf16_lr1e5.sh
	```
	Output: `checkpoints/bf16_lr1e5/final/`

	### Step 2 — HiF8 QAT (max_quant_lr1e5, lr=1e-5)
	```bash
	bash pangu_hif8_pretrain/run_train_max_quant_lr1e5.sh
	```
	Output: `checkpoints/max_quant_lr1e5/final/`

	Key quantization settings:
	\| Parameter \| Value \|
	\|-----------\|-------\|
	\| Quantization \| W8A8 HiF8 \|
	\| amax algorithm \| `max` (over history window) \|
	\| amax history length \| 64 steps \|
	\| BF16 warmup steps \| 500 \|
	\| HiF8 max value (fwd/bwd) \| 15 \|
	\| Learning rate \| 1e-5 \|
	\| Global batch size \| 1024 \|
	\| Max steps \| 10000 \|
	\| High-precision layers \| 5 \|

	Training takes approximately 21 hours on 8× NVIDIA H100 80GB.

	## Reproduce Evaluation

	```bash
	cd evaluate_benchmarks

	# Evaluate a single model
	bash run_eval.sh max_quant_lr1e5

	# Generate comparison table (baseline: bf16_lr1e5)
	python compare_results.py
	```

	Benchmarks: MMLU (5-shot), GSM8K (5-shot), MATH500/minerva_math (4-shot),
	HellaSwag (10-shot), ARC-Easy (25-shot), ARC-Challenge (25-shot).

	## Key Results (max_quant_lr1e5 vs bf16_lr1e5 baseline)

	\| Task \| BF16 (lr=1e-5) \| HiF8 QAT (lr=1e-5) \| Drop \|
	\|------\|---------------\|---------------------\|------\|
	\| MMLU (5-shot) \| 43.36% \| 43.17% \| 0.43% \|
	\| GSM8K (5-shot) \| 1.59% \| 1.29% \| — (noise) \|
	\| MATH500 (4-shot) \| 0.50% \| 0.46% \| — (noise) \|
	\| HellaSwag (10-shot) \| 41.10% \| 40.86% \| 0.58% \|
	\| ARC-Easy (25-shot) \| 51.56% \| 51.01% \| 1.06% \|
	\| ARC-Challenge (25-shot) \| 36.77% \| 36.69% \| 0.22% \|