
OpenPangu-Embedded-1B — HiF8 W8A8 QAT Submission

IEEE ICME 2026 Low Bit-width Large Model Quantization Challenge

This repository contains the quantized model weights and source code for our HiF8 W8A8 Quantization-Aware Training (QAT) submission on OpenPangu-Embedded-1B.


Repository Structure

pangu_pretrain_submit/
├── model.safetensors                         # Quantized model weights (HiF8 W8A8, lr=1e-5)
├── config.json                               # Model configuration
├── tokenizer.*                               # Tokenizer files
│
└── code/                                     # Complete source code
    ├── REPRODUCE.md                          # Reproduction guide (summarized below)
    ├── pangu1b_hif8.yml                      # Training conda environment spec
    ├── pangu1b_eval.yml                      # Evaluation conda environment spec
    ├── Dockerfile                            # Container environment
    │
    ├── pangu_hif8_pretrain/
    │   ├── train.py                          # Main training script
    │   ├── hif8.py                           # HiF8 amax tracking & scale management
    │   ├── run_train_max_quant_lr1e5.sh      # ★ HiF8 QAT training (submission)
    │   └── run_train_bf16_lr1e5.sh           # BF16 baseline training
    │
    ├── HiFloat8/
    │   ├── hif8_cuda/                        # HiF8 CUDA quantization library
    │   └── hif8_npu/                         # HiF8 NPU quantization library
    │
    ├── evaluate_benchmarks/
    │   ├── run_eval.sh                       # Benchmark evaluation runner
    │   ├── compare_results.py               # Multi-model comparison table
    │   └── results/summary_*.txt            # Evaluation result summaries
    │
    └── data_visualize/
        ├── log_visualizer.py                 # Training curve visualization
        └── *.log                             # Training loss logs

Method

We apply HiF8 W8A8 Quantization-Aware Training to OpenPangu-Embedded-1B with the following key improvements over the baseline:

| Parameter | Value | Rationale |
|---|---|---|
| amax algorithm | max (over history) | Prevents saturation by guaranteeing the scale ≥ any historical peak |
| amax history length | 64 steps | Wider window for stable scale estimation |
| BF16 warmup steps | 500 | Stabilizes pretrained weights before introducing quantization noise |
| HiF8 max value (fwd/bwd) | 15 | Maps peak values to the highest-precision tier (3 mantissa bits) |
| Learning rate | 1e-5 | Reduces catastrophic forgetting of pretrained commonsense knowledge |
| Global batch size | 1024 | |
| Training steps | 10,000 | |
| High-precision layers | 5 | First 5 layers kept in BF16 |
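
The core of the recipe is the amax/scale handling. Below is a minimal PyTorch sketch of max-over-history amax tracking with a 64-step window and straight-through fake quantization against the HiF8 peak value of 15. The class and function names are illustrative only; the actual implementation lives in code/pangu_hif8_pretrain/hif8.py and the HiFloat8 CUDA/NPU kernels.

```python
# Illustrative sketch only; the real logic lives in code/pangu_hif8_pretrain/hif8.py
# and the HiFloat8/hif8_cuda, HiFloat8/hif8_npu kernels.
from collections import deque

import torch

HIF8_MAX = 15.0  # peak magnitude used for forward/backward scaling


class AmaxTracker:
    """Tracks per-tensor amax over a sliding history and derives a scale.

    With the 'max' algorithm the scale covers every peak seen in the last
    `history_len` steps, which is what prevents quantization saturation.
    """

    def __init__(self, history_len: int = 64):
        self.history = deque(maxlen=history_len)

    def update(self, x: torch.Tensor) -> None:
        self.history.append(x.detach().abs().max())

    def scale(self) -> torch.Tensor:
        amax = torch.stack(list(self.history)).max()   # max over the history window
        return amax.clamp(min=1e-12) / HIF8_MAX        # map the peak onto HiF8's max value


def fake_quant(x: torch.Tensor, tracker: AmaxTracker) -> torch.Tensor:
    """Simulated quantize/dequantize with a straight-through estimator.

    Plain rounding stands in for the tiered-mantissa HiF8 format here;
    only the scale handling is meant to be representative.
    """
    tracker.update(x)
    s = tracker.scale()
    q = torch.clamp(torch.round(x / s), -HIF8_MAX, HIF8_MAX)
    return x + (q * s - x).detach()   # forward uses q * s, gradient passes through x
```

The straight-through estimator keeps gradients flowing through the rounding step, which is why the quantized loss curve below can track the BF16 run so closely.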

Training Loss Comparison

The chart below compares training loss curves between the HiF8 QAT model and the BF16 baseline over 10,000 steps.

[Figure: training loss curves, HiF8 QAT vs. BF16 baseline, over 10,000 steps]

Conclusion: The HiF8 QAT training loss tracks the BF16 baseline extremely closely throughout training. The average absolute percentage error (APE) across all 10,000 steps is 0.11%, and 0.12% over the final 1,000 steps. This demonstrates that the max-algorithm amax scaling with a 64-step history window effectively eliminates quantization saturation, producing a training trajectory nearly identical to full-precision BF16.
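
For reference, a small script like the following can recompute the APE figures from the loss logs in code/data_visualize/; the log line format and file names here are assumptions, and the actual plotting is done by log_visualizer.py.

```python
# Recompute the average absolute percentage error (APE) between two loss logs.
# The regex and file names are assumptions about the log format in code/data_visualize/.
import re


def read_losses(path):
    """Parse 'step <n> ... loss <value>' pairs from a training log."""
    pat = re.compile(r"step\s*[:=]?\s*(\d+).*?loss\s*[:=]?\s*([0-9.]+)")
    losses = {}
    with open(path) as f:
        for line in f:
            m = pat.search(line)
            if m:
                losses[int(m.group(1))] = float(m.group(2))
    return losses


bf16 = read_losses("bf16_lr1e5.log")          # hypothetical file names
hif8 = read_losses("max_quant_lr1e5.log")

steps = sorted(set(bf16) & set(hif8))
ape = [abs(hif8[s] - bf16[s]) / bf16[s] for s in steps]

print(f"mean APE, all steps:  {100 * sum(ape) / len(ape):.2f}%")
print(f"mean APE, last 1,000: {100 * sum(ape[-1000:]) / len(ape[-1000:]):.2f}%")
```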


Benchmark Results

Baseline: bf16_lr1e5 (BF16 training with lr=1e-5, same data/steps)

| Task | BF16 (lr=1e-5) | HiF8 QAT (lr=1e-5) | Relative drop |
|---|---|---|---|
| MMLU (5-shot) | 43.36% | 43.17% | 0.43% |
| GSM8K (5-shot) | 1.59% | 1.29% | — (noise) |
| MATH500 (4-shot) | 0.50% | 0.46% | — (noise) |
| HellaSwag (10-shot) | 41.10% | 40.86% | 0.58% |
| ARC-Easy (25-shot) | 51.56% | 51.01% | 1.06% |
| ARC-Challenge (25-shot) | 36.77% | 36.69% | 0.22% |

Drop is the decrease relative to the BF16 baseline score.

The GSM8K and MATH500 drops are not statistically significant (absolute difference of at most 0.3 percentage points).


Reproduction Instructions

1. Environment Setup

# Training environment
conda env create -f code/pangu1b_hif8.yml
conda activate pangu1b_hif8

# Evaluation environment
conda env create -f code/pangu1b_eval.yml
conda activate pangu1b_eval
pip install lm-eval==0.4.11 ray==2.55.1 "antlr4-python3-runtime==4.11" sympy math_verify

2. Prepare Model & Data

# Place base model weights at:
code/pangu_hif8_pretrain/models/openPangu-Embedded-1B/

# Dataset is loaded automatically from HuggingFace (FineWeb sample-10BT)
# or set --cache_dir to a local path in the training script
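
The training script downloads the dataset on its own; if you prefer to pre-populate a local cache first and point --cache_dir at it, something like the following works with the datasets library (the cache path is just an example):

```python
# Optional: pre-download FineWeb sample-10BT into a local cache, then pass the
# same directory to the training script via --cache_dir. The path is an example.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",
    split="train",
    cache_dir="./hf_cache",
)
print(ds)  # inspect row count and columns
```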

3. Train BF16 Baseline

bash code/pangu_hif8_pretrain/run_train_bf16_lr1e5.sh
# Output: checkpoints/bf16_lr1e5/final/
# Runtime: ~21 hours on 8× NVIDIA H100 80GB

4. Train HiF8 QAT (this submission)

bash code/pangu_hif8_pretrain/run_train_max_quant_lr1e5.sh
# Output: checkpoints/max_quant_lr1e5/final/
# Runtime: ~21 hours on 8× NVIDIA H100 80GB
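
To make the warmup and high-precision-layer settings from the Method table concrete, here is a hedged sketch of how a QAT wrapper could gate quantization on both; the module names, layer indexing, and helper functions are assumptions, not the actual structure of train.py.

```python
# Sketch under stated assumptions; not the actual interface of train.py.
import torch.nn as nn

NUM_HIGH_PRECISION_LAYERS = 5   # first 5 transformer blocks stay in BF16
BF16_WARMUP_STEPS = 500         # quantization disabled for the first 500 steps


class QuantLinear(nn.Module):
    """Wraps nn.Linear and applies W8A8 fake quantization only once enabled."""

    def __init__(self, linear: nn.Linear, quantize_fn):
        super().__init__()
        self.linear = linear
        self.quantize_fn = quantize_fn   # e.g. a closure over the fake_quant sketch above
        self.enabled = False             # flipped on after the BF16 warmup

    def forward(self, x):
        if not self.enabled:
            return self.linear(x)                         # plain BF16 path
        w = self.quantize_fn(self.linear.weight)          # W8: weight fake quant
        a = self.quantize_fn(x)                           # A8: activation fake quant
        return nn.functional.linear(a, w, self.linear.bias)


def wrap_blocks(blocks, quantize_fn):
    """Swap nn.Linear for QuantLinear in every block except the first 5."""
    wrapped = []
    for i, block in enumerate(blocks):
        if i < NUM_HIGH_PRECISION_LAYERS:
            continue                                      # keep this block in BF16
        for name, mod in list(block.named_modules()):
            if isinstance(mod, nn.Linear):
                parent = block.get_submodule(name.rpartition(".")[0])
                q = QuantLinear(mod, quantize_fn)
                setattr(parent, name.rpartition(".")[2], q)
                wrapped.append(q)
    return wrapped


def maybe_end_warmup(step, wrapped):
    """Call once per optimizer step; enables quantization after the warmup."""
    if step == BF16_WARMUP_STEPS:
        for q in wrapped:
            q.enabled = True
```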

5. Evaluate

cd code/evaluate_benchmarks

# Evaluate a single model
bash run_eval.sh max_quant_lr1e5

# Generate full comparison table
python compare_results.py
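
run_eval.sh drives the full suite; for a quick single-benchmark sanity check you can also call the lm-eval Python API directly, as sketched below (the checkpoint path is an example, and the task/few-shot settings mirror the table above):

```python
# Quick single-task check with the lm-eval 0.4.x Python API.
# The checkpoint path is an example; point it at your trained model directory.
import lm_eval
from lm_eval.models.huggingface import HFLM

model = HFLM(
    pretrained="checkpoints/max_quant_lr1e5/final",
    dtype="bfloat16",
    batch_size=8,
    # trust_remote_code=True,  # add if the model's config requires custom code
)

results = lm_eval.simple_evaluate(
    model=model,
    tasks=["arc_challenge"],   # evaluated 25-shot in the table above
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```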

Hardware

  • GPUs: 8× NVIDIA H100 80GB
  • Training time: ~21 hours per run
  • Framework: PyTorch + torchrun (DDP)