WF-Champion: Calibrated Spectral Mixed-Precision LLM Quantization

📄 Champion Paper: Complete benchmark with 750-sample profiling
📄 v2 Paper: 2D Wavelet + Hessian + Golden Layer analysis
📄 Ternary Paper: 2-bit / ternary comprehensive comparison
📄 KV Paper: Spectral energy preservation in KV cache

Complete Benchmark (Qwen2.5-0.5B, A10G)

| Method | Bits | PPL | Δ PPL | Tok/s | TTFT | Mem | Size | R-1 vs FP16 |
|---|---|---|---|---|---|---|---|---|
| FP16 | 16.0 | 20.12 | - | 44.9 | 36ms | 1.00G | 0.99G | 1.000 |
| WF-Champion (NL) | 10.0 | 20.13 | +0.0% | 44.9 | 25ms | 1.00G | 0.72G | 0.978 |
| RTN 8-bit | 8.0 | 20.13 | +0.0% | 44.9 | 29ms | 1.00G | 0.63G | 0.858 |
| AutoRound 4-bit | 4.0 | 21.63 | +7.5% | 24.4 | 45ms | 1.04G | 0.45G | 0.434 |
| WF-Champion (Med) | 5.0 | 22.63 | +12.4% | 44.8 | 25ms | 1.00G | 0.50G | 0.501 |
| BnB NF4 | 4.0 | 22.83 | +13.4% | 22.2 | 83ms | 1.53G | 0.36G | 0.622 |
| RTN 4-bit | 4.0 | 23.95 | +19.0% | 45.3 | 25ms | 1.00G | 0.45G | 0.409 |
| RTN 3-bit | 3.0 | 72.40 | +260% | 45.1 | 25ms | 1.00G | 0.41G | 0.205 |

Key Findings

1. Near-Lossless: Matches FP16 Perplexity

WF-Champion near-lossless (10-bit avg) reaches PPL 20.13 (FP16: 20.12) with 97.8% ROUGE-1 similarity to FP16 outputs, at 1.4× compression (0.99G to 0.72G) and a 31% lower TTFT (36 ms to 25 ms).

2. Medium Beats RTN 4-bit

At ~5 effective bits, WF-Champion medium beats RTN 4-bit by 5.5% on PPL (22.63 vs 23.95) while delivering 2× the throughput of BitsAndBytes NF4 (44.8 vs 22.2 tok/s).
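The relative figures quoted in findings 1 and 2 can be re-derived from the benchmark table. The snippet below is only a convenience check; the dictionary layout and the helper name `pct_lower` are illustrative and are not taken from `champion_benchmark.py`.

```python
# Values copied from the benchmark table (Qwen2.5-0.5B, A10G).
fp16 = {"ppl": 20.12, "ttft_ms": 36, "size_gb": 0.99}
wf_nl = {"ppl": 20.13, "ttft_ms": 25, "size_gb": 0.72}
wf_med = {"ppl": 22.63, "tok_s": 44.8}
rtn4 = {"ppl": 23.95}
bnb_nf4 = {"tok_s": 22.2}


def pct_lower(new, ref):
    """Percentage by which `new` is lower than `ref`."""
    return 100.0 * (ref - new) / ref


# Finding 1: size ratio and TTFT reduction of the near-lossless tier vs FP16.
compression = fp16["size_gb"] / wf_nl["size_gb"]
ttft_gain = pct_lower(wf_nl["ttft_ms"], fp16["ttft_ms"])

# Finding 2: PPL gap to RTN 4-bit and throughput ratio to BnB NF4.
ppl_gain = pct_lower(wf_med["ppl"], rtn4["ppl"])
speedup = wf_med["tok_s"] / bnb_nf4["tok_s"]

print(f"compression: {compression:.1f}x, TTFT reduction: {ttft_gain:.0f}%")
print(f"PPL gain vs RTN 4-bit: {ppl_gain:.1f}%, throughput vs NF4: {speedup:.1f}x")
```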

3. 750-Sample Profiler Finds Non-Obvious Golden Layers

Beyond the positional layers at the start and end of the stack (0, 1, 22, 23), the profiler flags layers 4-5 as golden because of their high activation kurtosis (25-66×), which indicates outlier token patterns that require high precision.
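As a rough illustration of kurtosis-based layer selection, the sketch below computes Pearson kurtosis per layer and marks layers above a cutoff as golden. It is not the profiler from `champion_benchmark.py`: the function names, the `threshold` value, and the choice of the Pearson definition (about 3 for Gaussian data) are all assumptions, and the toy activations stand in for values that would be collected from real forward passes.

```python
def kurtosis(values):
    """Pearson kurtosis (fourth standardized moment); about 3 for Gaussian data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return 0.0  # constant input: treat as having no heavy tail
    return m4 / (m2 ** 2)


def pick_golden_layers(activations_by_layer, positional=(), threshold=20.0):
    """Return sorted layer indices that are positional or exceed the cutoff.

    `threshold` is a hypothetical cutoff chosen for this example only.
    """
    golden = set(positional)
    for idx, acts in activations_by_layer.items():
        if kurtosis(acts) > threshold:
            golden.add(idx)
    return sorted(golden)


# Toy data: layer 1 contains a single large outlier, layers 0 and 2 do not.
toy = {
    0: [-1.0, 1.0] * 50,
    1: [0.0] * 99 + [10.0],
    2: [-1.0, 1.0] * 50,
}
golden = pick_golden_layers(toy, positional=(0,))
print(golden)
```

A single outlier among otherwise small activations drives the fourth moment up sharply, which is why heavy-tailed layers stand out under this statistic.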

4. Zero Runtime Overhead

Unlike BnB NF4 (+131% TTFT vs FP16) or AutoRound (+25% TTFT vs FP16), WF-Champion adds no inference overhead: quantized weights are standard integer tensors.
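To make "standard integer tensors" concrete, here is a minimal sketch of symmetric round-to-nearest (RTN) quantization for one weight row, written with plain Python lists so it runs without dependencies. It illustrates the RTN baseline from the benchmark, not WF-Champion's spectral method, and the function names are hypothetical; a real implementation would operate on `torch` tensors.

```python
def quantize_row(row, bits):
    """Symmetric round-to-nearest: map floats to signed ints in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp after rounding so every value stays inside the integer range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from integers and the row scale."""
    return [v * scale for v in q]


def max_error(row, q, scale):
    """Largest absolute reconstruction error for the row."""
    return max(abs(a - b) for a, b in zip(row, dequantize_row(q, scale)))


row = [0.50, -0.25, 0.10, -1.00]
q8, s8 = quantize_row(row, bits=8)
q4, s4 = quantize_row(row, bits=4)
err8 = max_error(row, q8, s8)
err4 = max_error(row, q4, s4)
print(q8, q4)
print(f"max error: 8-bit {err8:.4f}, 4-bit {err4:.4f}")
```

The rounding error per weight is bounded by half the scale, so halving the bit width from 8 to 4 increases the worst-case error by roughly the ratio of the two scales.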

5. AutoRound Wins PPL-per-Bit, WF-Champion Wins Speed+Fidelity

| Metric | WF-Champion (Med) | AutoRound 4-bit |
|---|---|---|
| PPL | 22.63 | 21.63 |
| Tok/s | 44.8 (1.8×) | 24.4 |
| TTFT | 25ms (1.8×) | 45ms |
| R-1 | 0.501 | 0.434 |

Quality Tiers

| Tier | Avg Bits | Golden Layers | Compression | Use Case |
|---|---|---|---|---|
| Near-lossless | 10 | @16-bit | 1.4× | Maximum quality |
| High | 5 | @8-bit | 2.0× | Production deployment |
| Medium | 5 | @8-bit | 2.0× | Fast inference |
| Low | 3.5 | @8-bit | 2.3× | Maximum compression |
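The "Avg Bits" column can be read as a per-layer average over a mixed-precision assignment. The sketch below shows that arithmetic under several assumptions that are not stated in this document: 24 equally sized layers (indices 0 to 23), the six golden layers named in finding 3, and non-golden widths of 8, 4, and 2 bits. Those widths were chosen only because they reproduce the table's averages for three of the tiers; the actual per-layer assignment may differ, and embeddings are ignored.

```python
def average_bits(num_layers, golden_layers, golden_bits, default_bits):
    """Mean bit width across layers, assuming equal parameter counts per layer."""
    golden = set(golden_layers)
    total = sum(
        golden_bits if i in golden else default_bits for i in range(num_layers)
    )
    return total / num_layers


NUM_LAYERS = 24                  # assumption: layers 0..23
GOLDEN = [0, 1, 4, 5, 22, 23]    # layers named in finding 3

# Non-golden widths (8, 4, 2) are assumptions, not values from the source.
near_lossless = average_bits(NUM_LAYERS, GOLDEN, golden_bits=16, default_bits=8)
medium = average_bits(NUM_LAYERS, GOLDEN, golden_bits=8, default_bits=4)
low = average_bits(NUM_LAYERS, GOLDEN, golden_bits=8, default_bits=2)

print(near_lossless, medium, low)
```

Because only a quarter of the layers are golden in this setup, raising their precision costs relatively few average bits.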

Quick Start

pip install torch transformers accelerate datasets PyWavelets auto-round bitsandbytes rouge-score
python champion_benchmark.py  # Full benchmark (~30 min on A10G)

Files

| File | Description |
|---|---|
| champion_benchmark.py | WF-Champion: profiling + mixed precision + full benchmark |
| v2_benchmark.py | WFIQ-SR v2 + golden layer experiments |
| ternary_benchmark.py | 2-bit / ternary comparison |
| comprehensive_benchmark.py | KV compression + energy analysis |
| tuning_sweep.py | WaveletFourier hyperparameter sweep |
| results/ | All JSON results |