OpenPangu-Embedded-1B — HiF8 W8A8 QAT Submission
IEEE ICME 2026 Low Bit-width Large Model Quantization Challenge
This repository contains the quantized model weights and source code for our HiF8 W8A8 Quantization-Aware Training (QAT) submission on OpenPangu-Embedded-1B.
Repository Structure
pangu_pretrain_submit/
├── model.safetensors                      # Quantized model weights (HiF8 W8A8, lr=1e-5)
├── config.json                            # Model configuration
├── tokenizer.*                            # Tokenizer files
│
└── code/                                  # Complete source code
    ├── REPRODUCE.md                       # Reproduction guide (summarized below)
    ├── pangu1b_hif8.yml                   # Training conda environment spec
    ├── pangu1b_eval.yml                   # Evaluation conda environment spec
    ├── Dockerfile                         # Container environment
    │
    ├── pangu_hif8_pretrain/
    │   ├── train.py                       # Main training script
    │   ├── hif8.py                        # HiF8 amax tracking & scale management
    │   ├── run_train_max_quant_lr1e5.sh   # ★ HiF8 QAT training (submission)
    │   └── run_train_bf16_lr1e5.sh        # BF16 baseline training
    │
    ├── HiFloat8/
    │   ├── hif8_cuda/                     # HiF8 CUDA quantization library
    │   └── hif8_npu/                      # HiF8 NPU quantization library
    │
    ├── evaluate_benchmarks/
    │   ├── run_eval.sh                    # Benchmark evaluation runner
    │   ├── compare_results.py             # Multi-model comparison table
    │   └── results/summary_*.txt          # Evaluation result summaries
    │
    └── data_visualize/
        ├── log_visualizer.py              # Training curve visualization
        └── *.log                          # Training loss logs
Method
We apply HiF8 W8A8 Quantization-Aware Training to OpenPangu-Embedded-1B with the following key improvements over the baseline:
| Parameter | Value | Rationale |
|---|---|---|
| amax algorithm | max (over history) | Prevents saturation by guaranteeing scale ≥ any historical peak |
| amax history length | 64 steps | Wider window for stable scale estimation |
| BF16 warmup steps | 500 | Stabilizes pretrained weights before introducing quantization noise |
| HiF8 max value (fwd/bwd) | 15 | Maps peak values to highest-precision tier (3 mantissa bits) |
| Learning rate | 1e-5 | Reduces catastrophic forgetting of pretrained commonsense knowledge |
| Global batch size | 1024 | |
| Training steps | 10,000 | |
| High-precision layers | 5 | First 5 layers kept in BF16 |
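For concreteness, below is a minimal sketch of the amax tracking and quantization gating that these settings describe. The names (AmaxTracker, quantize_this_layer) are illustrative only, not the actual API of hif8.py or train.py.

```python
# Illustrative sketch only -- not the repo's hif8.py API.
from collections import deque
import torch

class AmaxTracker:
    """Tracks per-tensor amax over a rolling window and derives the HiF8 scale."""
    def __init__(self, history_len: int = 64, hif8_max: float = 15.0):
        self.history = deque(maxlen=history_len)  # last 64 per-step amax values
        self.hif8_max = hif8_max                  # HiF8 max value used for fwd/bwd

    def scale(self, tensor: torch.Tensor) -> torch.Tensor:
        self.history.append(tensor.detach().abs().max())
        # "max" algorithm: take the largest amax in the window, so the scale
        # covers every historical peak and quantization cannot saturate.
        amax = torch.stack(list(self.history)).max().clamp(min=1e-12)
        return self.hif8_max / amax

def quantize_this_layer(step: int, layer_idx: int,
                        warmup_steps: int = 500, hp_layers: int = 5) -> bool:
    # Stay in BF16 during warmup and for the first 5 (high-precision) layers.
    return step >= warmup_steps and layer_idx >= hp_layers
```

Tying the scale to the historical maximum trades a little resolution for a hard guarantee against clipping, which is what the rationale column above refers to.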
Training Loss Comparison
The chart below compares training loss curves between the HiF8 QAT model and the BF16 baseline over 10,000 steps.
Conclusion: The HiF8 QAT training loss tracks the BF16 baseline extremely closely throughout training. The average absolute percentage error (APE) across all 10,000 steps is 0.11%, and 0.12% over the final 1,000 steps. This demonstrates that the max-algorithm amax scaling with a 64-step history window effectively eliminates quantization saturation, producing a training trajectory nearly identical to full-precision BF16.
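For reference, a minimal sketch of how the APE figures above can be computed from the two loss curves (parsing of the *.log files into per-step loss arrays is assumed to have happened already):

```python
# Sketch only: per-step absolute percentage error (APE) of HiF8 QAT vs. the BF16 baseline.
import numpy as np

def average_ape(loss_hif8: np.ndarray, loss_bf16: np.ndarray) -> float:
    ape = np.abs(loss_hif8 - loss_bf16) / loss_bf16   # per-step APE vs. BF16
    return float(ape.mean() * 100.0)                   # mean APE, in percent

# All 10,000 steps:    average_ape(hif8_losses, bf16_losses)
# Final 1,000 steps:   average_ape(hif8_losses[-1000:], bf16_losses[-1000:])
```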
Benchmark Results
Baseline: bf16_lr1e5 (BF16 training with lr=1e-5, same data/steps)
| Task | BF16 (lr=1e-5) | HiF8 QAT (lr=1e-5) | Relative Drop |
|---|---|---|---|
| MMLU (5-shot) | 43.36% | 43.17% | 0.43% ✓ |
| GSM8K (5-shot) | 1.59% | 1.29% | — (noise) |
| MATH500 (4-shot) | 0.50% | 0.46% | — (noise) |
| HellaSwag (10-shot) | 41.10% | 40.86% | 0.58% ✓ |
| ARC-Easy (25-shot) | 51.56% | 51.01% | 1.06% |
| ARC-Challenge (25-shot) | 36.77% | 36.69% | 0.22% ✓ |
The GSM8K and MATH500 drops are statistically insignificant: the scores are near zero to begin with and the absolute differences are at most 0.3 percentage points.
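The Relative Drop column reports the drop relative to the BF16 baseline; a minimal sketch of the metric is below (the committed implementation is compare_results.py).

```python
# Relative drop of the HiF8 QAT score with respect to the BF16 baseline, in percent
# (assumed to match what compare_results.py reports; shown here only for clarity).
def relative_drop(bf16_acc: float, hif8_acc: float) -> float:
    return (bf16_acc - hif8_acc) / bf16_acc * 100.0

# e.g. MMLU: relative_drop(43.36, 43.17) ≈ 0.44, agreeing with the table up to rounding of the raw scores
```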
Reproduction Instructions
1. Environment Setup
# Training environment
conda env create -f code/pangu1b_hif8.yml
conda activate pangu1b_hif8
# Evaluation environment
conda env create -f code/pangu1b_eval.yml
conda activate pangu1b_eval
pip install lm-eval==0.4.11 ray==2.55.1 "antlr4-python3-runtime==4.11" sympy math_verify
2. Prepare Model & Data
# Place base model weights at:
code/pangu_hif8_pretrain/models/openPangu-Embedded-1B/
# Dataset is loaded automatically from HuggingFace (FineWeb sample-10BT)
# or set --cache_dir to a local path in the training script
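A minimal sketch of the dataset loading that train.py is assumed to perform (the HuggingFace repo id and cache path below are illustrative):

```python
# Sketch only -- the actual loading lives in train.py.
from datasets import load_dataset

dataset = load_dataset(
    "HuggingFaceFW/fineweb",   # assumed repo id for the FineWeb dataset
    name="sample-10BT",        # the 10B-token sample referenced above
    split="train",
    cache_dir="./data_cache",  # optional local path, as the --cache_dir note describes
)
```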
3. Train BF16 Baseline
bash code/pangu_hif8_pretrain/run_train_bf16_lr1e5.sh
# Output: checkpoints/bf16_lr1e5/final/
# Runtime: ~21 hours on 8× NVIDIA H100 80GB
4. Train HiF8 QAT (this submission)
bash code/pangu_hif8_pretrain/run_train_max_quant_lr1e5.sh
# Output: checkpoints/max_quant_lr1e5/final/
# Runtime: ~21 hours on 8× NVIDIA H100 80GB
5. Evaluate
cd code/evaluate_benchmarks
# Evaluate a single model
bash run_eval.sh max_quant_lr1e5
# Generate full comparison table
python compare_results.py
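For a single task, a rough Python equivalent of what run_eval.sh is assumed to drive through the lm-eval harness (the checkpoint path, dtype, and batch size are illustrative, not the script's defaults):

```python
# Sketch only: one-task evaluation with the lm-eval 0.4.x Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=checkpoints/max_quant_lr1e5/final,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,   # matches the 5-shot MMLU setting reported above
    batch_size=8,
)
print(results["results"])
```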
Hardware
- GPUs: 8× NVIDIA H100 80GB
- Training time: ~21 hours per run
- Framework: PyTorch + torchrun (DDP)