# Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for `Seed-OSS-36B-Instruct-unsloth` as measured by the Magic Quant pipeline.

## Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.

| Suffix Example | Meaning |
| -------------- | ------- |
| `BF16` | Unquantized family baseline in BF16; every other variant is compared against it. |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: Attention Q (`aq_`), Attention K+V (`akv_`), FFN Down (`fd_`), Attention Output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrids with MoE expert group (`moe_exp_`) and mixed IQ / Q-schemes per tensor group. |

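In general, everything after the base model name is a purely mechanical description of **how** the weights were transformed; it does not indicate a new training run. The results tables also use a `fug_` group that the naming table does not define; by analogy with `fd_`, it presumably denotes the FFN Up + Gate tensors.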
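
Because the suffix is mechanical, it can be split back into its parts programmatically. The following is a minimal sketch, not part of the Magic Quant pipeline: `parse_variant` and `VariantName` are hypothetical names, and the prefix list is assembled from the naming table plus `fug_`, which appears in the results tables without a definition.

```python
from dataclasses import dataclass, field

# Tensor-group prefixes from the naming table. "fug_" is an assumption:
# it occurs in the results tables but is not defined in the naming table.
GROUP_PREFIXES = (
    "moe_exp_", "moe_rt_", "head_", "emb_",
    "akv_", "aq_", "ao_", "fd_", "fug_",
)


@dataclass
class VariantName:
    base: str                                   # e.g. "mxfp4_moe" or "Q6_K"
    groups: dict = field(default_factory=dict)  # e.g. {"emb": "Q4_K"}


def parse_variant(suffix: str) -> VariantName:
    """Split a variant suffix into its base mode and per-group schemes."""
    # Scheme names contain underscores but never hyphens, so "-" is a
    # safe separator between the base mode and each group entry.
    base, *parts = suffix.split("-")
    groups = {}
    for part in parts:
        for prefix in GROUP_PREFIXES:
            if part.startswith(prefix):
                groups[prefix[:-1]] = part[len(prefix):]
                break
        else:
            raise ValueError(f"unknown tensor-group prefix in {part!r}")
    return VariantName(base=base, groups=groups)
```

For example, `parse_variant("iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K")` yields the base `iq4_nl` with groups `emb`, `head`, and `moe_rt` all mapped to `Q4_K`, while a pure baseline such as `Q6_K` yields an empty group mapping.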

---

## Benchmark Methodology

All models were tested with a single automated harness built on `llama.cpp` tools.

**Included tests:**

- **Throughput:**
  `llama-bench` with descending GPU offload (`-ngl 35 → 0`) and automatic retry on out-of-memory (OOM) errors.
  The highest successful tokens-per-second (TPS) value is recorded.
- **Perplexity:**
  Three domains: **general**, **code**, **math**.
  Each uses an auto-generated corpus of ~**32k tokens**.
  Perplexity (PPL) is computed with `llama-perplexity` at a **2048-token** context.
  The same GPU retry logic as for throughput applies.
- **Precision loss:**
  Each model is compared to its **family BF16 baseline**.
  Precision-loss % is computed for each PPL domain, plus an averaged score.
  Models are ranked by this averaged score.
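
The descending-offload retry can be sketched as below. This is an illustration under stated assumptions, not the harness itself: `OutOfMemory`, `bench_command`, and `best_tps` are hypothetical names, the benchmark runner is injected as a callable so the logic can be exercised without a GPU, and the sketch assumes every offload level is tried and the maximum kept (the real harness may instead stop at the first level that fits).

```python
from typing import Callable, Iterable, Optional, Tuple


class OutOfMemory(RuntimeError):
    """Hypothetical signal that a benchmark run exhausted GPU memory."""


def bench_command(model_path: str, ngl: int) -> list:
    """Build a llama-bench invocation for one GPU offload level."""
    return ["llama-bench", "-m", model_path, "-ngl", str(ngl)]


def best_tps(
    run_bench: Callable[[int], float],
    ngl_values: Iterable[int] = range(35, -1, -1),
) -> Optional[Tuple[int, float]]:
    """Try offload levels from high to low and keep the best successful run.

    run_bench(ngl) is assumed to return tokens per second, or to raise
    OutOfMemory when the model does not fit at that offload level.
    Returns (ngl, tps) for the fastest successful run, or None if all failed.
    """
    best = None
    for ngl in ngl_values:
        try:
            tps = run_bench(ngl)
        except OutOfMemory:
            continue  # retry with fewer layers on the GPU
        if best is None or tps > best[1]:
            best = (ngl, tps)
    return best
```

In a real harness, `run_bench` would execute the list returned by `bench_command` (for example via `subprocess.run`) and extract the TPS figure from the tool's output; how an OOM is detected there is left open in this sketch.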

---

### Table - Overview of Results

Size reduction and TPS change of each variant relative to the BF16 baseline.

| model_name | size_reduction | tps_change |
| ---------- | -------------- | ---------- |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 41.04% | 54.44% |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.87% | 63.07% |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.87% | 62.89% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 58.40% | 111.41% |
| Q6_K | 58.98% | 99.91% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 58.98% | 103.31% |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 71.86% | 178.75% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 72.29% | 134.32% |
| MXFP4_MOE | 73.42% | 78.22% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 73.42% | 78.14% |

* All percentages are relative to the selected family BF16 baseline.
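
The two percentages can be reproduced from the file sizes and TPS figures in the File Size + TPS table. The formulas below are inferred from the reported numbers rather than taken from the pipeline's source, and the function names are hypothetical.

```python
def size_reduction_pct(baseline_gb: float, variant_gb: float) -> float:
    """Percentage by which the variant file is smaller than the baseline."""
    return (1.0 - variant_gb / baseline_gb) * 100.0


def tps_change_pct(baseline_tps: float, variant_tps: float) -> float:
    """Percentage throughput gain (positive) or loss (negative) vs the baseline."""
    return (variant_tps / baseline_tps - 1.0) * 100.0


# Values from the report: BF16 is 67.35 GB at 11.48 TPS, Q6_K is 27.63 GB at 22.95 TPS.
print(round(size_reduction_pct(67.35, 27.63), 2))  # 58.98
print(round(tps_change_pct(11.48, 22.95), 2))      # 99.91
```

Under these formulas a `tps_change` of about 100% means the variant runs roughly twice as fast as BF16 on the test hardware, not that it merely matches it.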

---

### Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
| ---------- | ------------ | --------- | ------------- |
| BF16 | 67.35 | 11.48 | 0.0000% |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 39.71 | 17.73 | 0.0213% |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 35.78 | 18.72 | 0.0272% |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 35.78 | 18.70 | 0.0272% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 28.02 | 24.27 | 0.1768% |
| Q6_K | 27.63 | 22.95 | 0.2037% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 27.63 | 23.34 | 0.2037% |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 18.95 | 32.00 | 0.2709% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 18.66 | 26.90 | 0.7098% |
| MXFP4_MOE | 17.90 | 20.46 | 2.7338% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 17.90 | 20.45 | 2.7338% |

* `avg_prec_loss` is the averaged absolute precision-loss % vs BF16.

---

### Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
| ---------- | --- | ------ | ---- | ------- | ---- | ------- |
| BF16 | 6.8872 | 0.1679 | 1.4128 | 0.0095 | 5.4442 | 0.1209 |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 6.8901 | 0.1680 | 1.4127 | 0.0095 | 5.4434 | 0.1208 |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.8866 | 0.1679 | 1.4130 | 0.0095 | 5.4474 | 0.1210 |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.8866 | 0.1679 | 1.4130 | 0.0095 | 5.4474 | 0.1210 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 6.8901 | 0.1682 | 1.4156 | 0.0096 | 5.4284 | 0.1203 |
| Q6_K | 6.9012 | 0.1685 | 1.4135 | 0.0095 | 5.4637 | 0.1218 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 6.9012 | 0.1685 | 1.4135 | 0.0095 | 5.4637 | 0.1218 |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 6.8712 | 0.1654 | 1.4162 | 0.0095 | 5.4627 | 0.1201 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 6.8452 | 0.1639 | 1.4140 | 0.0094 | 5.5223 | 0.1222 |
| MXFP4_MOE | 7.1007 | 0.1728 | 1.4351 | 0.0097 | 5.6360 | 0.1239 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 7.1007 | 0.1728 | 1.4351 | 0.0097 | 5.6360 | 0.1239 |

* gen = ppl_general, code = ppl_code, math = ppl_math; each `*_er` column is the error estimate (±) for the adjacent perplexity value.

---

### Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
| ---------- | ------------ | --------- | --------- |
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 0.0421 | 0.0071 | 0.0147 |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0087 | 0.0142 | 0.0588 |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0087 | 0.0142 | 0.0588 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 0.0421 | 0.1982 | 0.2902 |
| Q6_K | 0.2033 | 0.0495 | 0.3582 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 0.2033 | 0.0495 | 0.3582 |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 0.2323 | 0.2407 | 0.3398 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 0.6098 | 0.0849 | 1.4346 |
| MXFP4_MOE | 3.1000 | 1.5784 | 3.5230 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 3.1000 | 1.5784 | 3.5230 |

* loss_* values are absolute precision-loss % vs BF16 per domain.
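
The per-domain losses and their average can be recomputed from the PPL table. The sketch below assumes the loss is the absolute relative PPL difference from BF16, which matches the reported values; the function names are hypothetical and not taken from the pipeline.

```python
def precision_loss_pct(baseline_ppl: float, variant_ppl: float) -> float:
    """Absolute relative perplexity difference vs the baseline, in percent."""
    return abs(variant_ppl - baseline_ppl) / baseline_ppl * 100.0


def precision_losses(baseline: dict, variant: dict) -> dict:
    """Per-domain loss plus the unweighted mean under the key 'avg'."""
    losses = {
        domain: precision_loss_pct(baseline[domain], variant[domain])
        for domain in baseline
    }
    avg = sum(losses.values()) / len(losses)  # computed before 'avg' is added
    losses["avg"] = avg
    return losses


# PPL values from the report for BF16 and the pure MXFP4_MOE baseline.
bf16 = {"general": 6.8872, "code": 1.4128, "math": 5.4442}
mxfp4_moe = {"general": 7.1007, "code": 1.4351, "math": 5.6360}

for name, value in precision_losses(bf16, mxfp4_moe).items():
    print(f"{name}: {value:.4f}%")
```

Because the absolute value is taken, a variant whose perplexity is lower than BF16 (for example a `gen` PPL of 6.8712 against 6.8872) still registers a positive loss; the metric measures deviation from the baseline, not direction.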