
# Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for Seed-OSS-36B-Instruct-unsloth as measured by the Magic Quant pipeline.

## Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.

| Suffix Example | Meaning |
| --- | --- |
| `BF16` | Pure full-precision family baseline (no quantization). |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: Attention Q (`aq_`), Attention K+V (`akv_`), FFN Down (`fd_`), Attention Output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrids with the MoE expert group (`moe_exp_`) and mixed IQ/Q schemes per tensor group. |

In general, anything after the base model name is a purely mechanical description of how the weights were transformed, not a new training run.
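
If you script against these filenames, the suffix splits mechanically. Below is a minimal Python sketch; the group-prefix list comes from the table above plus `fug_`, which appears in the result tables below and, by analogy with `fd_` (FFN Down), presumably denotes the FFN up/gate group. That reading, like the helper names, is an assumption rather than part of the pipeline.

```python
# Minimal sketch: split a hybrid variant suffix into base mode + per-group schemes.
# GROUPS comes from the table above; "fug" (FFN up/gate?) is inferred from the
# result tables below and is an assumption, as is this helper's name.
GROUPS = ("moe_rt", "moe_exp", "akv", "emb", "head", "aq", "fd", "ao", "fug")

def parse_variant(suffix: str) -> tuple[str, dict[str, str]]:
    base, *parts = suffix.split("-")   # first token is the base conversion mode
    schemes: dict[str, str] = {}
    for part in parts:                 # remaining tokens are "<group>_<SCHEME>"
        for group in GROUPS:
            if part.startswith(group + "_"):
                schemes[group] = part[len(group) + 1:]
                break
        else:
            raise ValueError(f"unknown tensor group in {part!r}")
    return base, schemes

base, schemes = parse_variant(
    "mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0"
)
print(base)     # mxfp4_moe
print(schemes)  # {'akv': 'Q8_0', 'ao': 'MXFP4', 'aq': 'Q8_0', ...}
```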


## Benchmark Methodology

All models were tested with a unified automated harness using llama.cpp tools.

Included tests:

- **Throughput:** llama-bench with descending GPU offload (`-ngl 35` → `0`) and automatic out-of-memory retry; the highest successful TPS is recorded. A sketch of the retry loop follows this list.
- **Perplexity:** three domains (general, code, math), each using an auto-generated corpus of ~32k tokens. Perplexity is computed with llama-perplexity at a 2048-token context, with the same GPU retry logic as above.
- **Precision loss:** each model is compared to its family BF16 baseline. Precision-loss % is computed for every PPL domain, plus an averaged score, and models are ranked by this metric.
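
The offload fallback can be expressed compactly. Below is a minimal Python sketch, assuming the llama.cpp tools are on PATH; `model.gguf` and `corpus.txt` are placeholder paths, and the real harness's step size and output parsing are not shown.

```python
# Minimal sketch of the descending-offload retry used for both tools.
import subprocess

def run_with_ngl_retry(cmd_template: list[str], start_ngl: int = 35) -> int | None:
    """Run a llama.cpp tool from full GPU offload down to CPU-only (-ngl 0)."""
    for ngl in range(start_ngl, -1, -1):
        result = subprocess.run(
            cmd_template + ["-ngl", str(ngl)],
            capture_output=True, text=True,
        )
        if result.returncode == 0:   # first run that survives (no OOM) wins
            print(result.stdout)     # tool output (t/s table or PPL estimate)
            return ngl
    return None                      # did not fit even fully on CPU

# Throughput: llama-bench reports tokens/second per configuration.
run_with_ngl_retry(["llama-bench", "-m", "model.gguf"])

# Perplexity: llama-perplexity at a 2048-token context over a ~32k-token corpus.
run_with_ngl_retry(["llama-perplexity", "-m", "model.gguf", "-f", "corpus.txt", "-c", "2048"])
```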


## Table - Overview of Results

All figures are relative to the BF16 baseline.

| model_name | size_reduction | tps_change |
| --- | --- | --- |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 41.04% | 54.44% |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.87% | 63.07% |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.87% | 62.89% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 58.40% | 111.41% |
| Q6_K | 58.98% | 99.91% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 58.98% | 103.31% |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 71.86% | 178.75% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 72.29% | 134.32% |
| MXFP4_MOE | 73.42% | 78.22% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 73.42% | 78.14% |
- All percentages are computed against the selected family BF16 baseline: size_reduction is the file-size saving, tps_change the throughput gain.

## Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
| --- | --- | --- | --- |
| BF16 | 67.35 | 11.48 | 0.0000% |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 39.71 | 17.73 | 0.0213% |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 35.78 | 18.72 | 0.0272% |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 35.78 | 18.70 | 0.0272% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 28.02 | 24.27 | 0.1768% |
| Q6_K | 27.63 | 22.95 | 0.2037% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 27.63 | 23.34 | 0.2037% |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 18.95 | 32.00 | 0.2709% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 18.66 | 26.90 | 0.7098% |
| MXFP4_MOE | 17.90 | 20.46 | 2.7338% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 17.90 | 20.45 | 2.7338% |
- avg_prec_loss is the mean absolute precision-loss % vs BF16 across the three PPL domains.
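
Concretely, the per-domain loss is the relative PPL change vs BF16 in percent, and avg_prec_loss is the mean over the three domains; the sketch below reproduces the MXFP4_MOE row from the PPL table that follows.

```python
# Reproduce avg_prec_loss for MXFP4_MOE from the PPL table below.
# loss_domain = |PPL_quant - PPL_bf16| / PPL_bf16 * 100
bf16  = {"general": 6.8872, "code": 1.4128, "math": 5.4442}
quant = {"general": 7.1007, "code": 1.4351, "math": 5.6360}  # MXFP4_MOE

loss = {d: abs(quant[d] - bf16[d]) / bf16[d] * 100 for d in bf16}
avg = sum(loss.values()) / len(loss)

print({d: round(v, 4) for d, v in loss.items()})
# {'general': 3.1, 'code': 1.5784, 'math': 3.523}
print(round(avg, 4))  # 2.7338 -> matches the avg_prec_loss column
```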

## Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
| --- | --- | --- | --- | --- | --- | --- |
| BF16 | 6.8872 | 0.1679 | 1.4128 | 0.0095 | 5.4442 | 0.1209 |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 6.8901 | 0.1680 | 1.4127 | 0.0095 | 5.4434 | 0.1208 |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.8866 | 0.1679 | 1.4130 | 0.0095 | 5.4474 | 0.1210 |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.8866 | 0.1679 | 1.4130 | 0.0095 | 5.4474 | 0.1210 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 6.8901 | 0.1682 | 1.4156 | 0.0096 | 5.4284 | 0.1203 |
| Q6_K | 6.9012 | 0.1685 | 1.4135 | 0.0095 | 5.4637 | 0.1218 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 6.9012 | 0.1685 | 1.4135 | 0.0095 | 5.4637 | 0.1218 |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 6.8712 | 0.1654 | 1.4162 | 0.0095 | 5.4627 | 0.1201 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 6.8452 | 0.1639 | 1.4140 | 0.0094 | 5.5223 | 0.1222 |
| MXFP4_MOE | 7.1007 | 0.1728 | 1.4351 | 0.0097 | 5.6360 | 0.1239 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 7.1007 | 0.1728 | 1.4351 | 0.0097 | 5.6360 | 0.1239 |
- gen = ppl_general, code = ppl_code, math = ppl_math; the *_er columns are the corresponding ± error estimates reported by llama-perplexity.

## Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
| --- | --- | --- | --- |
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 0.0421 | 0.0071 | 0.0147 |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0087 | 0.0142 | 0.0588 |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0087 | 0.0142 | 0.0588 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 0.0421 | 0.1982 | 0.2902 |
| Q6_K | 0.2033 | 0.0495 | 0.3582 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 0.2033 | 0.0495 | 0.3582 |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 0.2323 | 0.2407 | 0.3398 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 0.6098 | 0.0849 | 1.4346 |
| MXFP4_MOE | 3.1000 | 1.5784 | 3.5230 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 3.1000 | 1.5784 | 3.5230 |
- loss_* values are the absolute precision-loss % vs BF16, per domain.