# Hybrid Naming Scheme & Benchmark Synopsis
This report summarizes baseline and hybrid quantization results for `Seed-OSS-36B-Instruct-unsloth` as measured by the Magic Quant pipeline.
## Naming Scheme
Model variants follow a structured suffix convention that encodes both the base conversion mode and per-tensor quantization schemes.
| Suffix Example | Meaning |
| -------------- | ------- |
| `BF16` | Pure full-precision family baseline (no quantization). |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: Attention Q (`aq_`), Attention K+V (`akv_`), FFN Down (`fd_`), FFN Up/Gate (`fug_`, used in the results tables), Attention Output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrids with MoE expert group (`moe_exp_`) and mixed IQ / Q-schemes per tensor group. |
In general, anything after the base model name is a purely mechanical description of **how** the weights were transformed, not a new training run.
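
For scripting over these filenames, here is a minimal parsing sketch, assuming the group keys from the table above (`parse_variant` and `GROUPS` are illustrative names, not part of the actual pipeline):

```python
# Illustrative parser for the suffix convention above; not the pipeline's code.
GROUPS = ("moe_rt", "moe_exp", "emb", "head", "aq", "akv", "fd", "fug", "ao")

def parse_variant(name: str) -> tuple[str, dict[str, str]]:
    """Split a variant name into (base conversion mode, {group: scheme})."""
    base, *parts = name.split("-")
    overrides = {}
    for part in parts:
        for group in GROUPS:
            if part.startswith(group + "_"):
                overrides[group] = part[len(group) + 1:]
                break
    return base, overrides

# parse_variant("iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K")
# -> ("iq4_nl", {"emb": "Q4_K", "head": "Q4_K", "moe_rt": "Q4_K"})
```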
---
## Benchmark Methodology
All models were tested with a unified automated harness using `llama.cpp` tools.
**Included tests:**
- **Throughput:**
  `llama-bench` with descending GPU offload (`-ngl 35 → 0`) and automatic retry on out-of-memory failures.
  The highest successful TPS is recorded (a sketch of this loop follows the list).
- **Perplexity:**
Three domains: **general**, **code**, **math**.
Each uses an auto-generated corpus of ~**32k tokens**.
Perplexity is computed with `llama-perplexity` at **2048-token** context.
Same GPU retry logic as above.
- **Precision loss:**
Each model is compared to its **family BF16 baseline**.
Precision-loss % is computed for all PPL domains, plus an averaged score.
Models are ranked by this metric.
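
A minimal sketch of this harness logic, assuming the `llama-bench` / `llama-perplexity` binaries are on `PATH`; the function names and output parsing are illustrative, not the actual Magic Quant code:

```python
import re
import subprocess

def _run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)

def bench_tps(model: str, start_ngl: int = 35) -> float | None:
    """Descending GPU offload with automatic OOM retry: -ngl 35 -> 0."""
    for ngl in range(start_ngl, -1, -1):
        proc = _run(["llama-bench", "-m", model, "-ngl", str(ngl)])
        if proc.returncode == 0:
            # llama-bench reports t/s figures as "123.45 ± 6.78";
            # take the best figure from the first offload level that runs
            hits = re.findall(r"(\d+\.\d+)\s*±", proc.stdout)
            return max(map(float, hits)) if hits else None
    return None  # failed even with -ngl 0 (pure CPU)

def perplexity(model: str, corpus: str, ngl: int) -> float | None:
    """One llama-perplexity run at 2048-token context over one corpus."""
    proc = _run(["llama-perplexity", "-m", model, "-f", corpus,
                 "-c", "2048", "-ngl", str(ngl)])
    # the tool's final line looks like: "Final estimate: PPL = 6.8872 +/- 0.1679"
    m = re.search(r"PPL = (\d+\.\d+)", proc.stdout + proc.stderr)
    return float(m.group(1)) if m else None
```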
---
### Table - Overview of Results
All percentages are relative to the BF16 baseline: `size_reduction` is how much smaller the file is, and `tps_change` is the throughput gain.
| model_name | size_reduction | tps_change |
| ---------- | -------------- | ---------- |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 41.04% | 54.44% |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.87% | 63.07% |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 46.87% | 62.89% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 58.40% | 111.41% |
| Q6_K | 58.98% | 99.91% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 58.98% | 103.31% |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 71.86% | 178.75% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 72.29% | 134.32% |
| MXFP4_MOE | 73.42% | 78.22% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 73.42% | 78.14% |
* All percentages are computed against the selected family BF16 baseline.
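
For concreteness, both columns are consistent with simple relative changes against the `BF16` row of the next table; a worked check using the `Q6_K` row (formula inferred from the data, which it reproduces exactly):

```python
bf16_gb, bf16_tps = 67.35, 11.48   # BF16 row, next table
q6k_gb,  q6k_tps  = 27.63, 22.95   # Q6_K row, next table

size_reduction = (bf16_gb - q6k_gb) / bf16_gb * 100     # 58.98%
tps_change     = (q6k_tps - bf16_tps) / bf16_tps * 100  # 99.91%
```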
---
### Table - File Size + TPS + Avg Precision Loss
| model_name | file_size_gb | bench_tps | avg_prec_loss |
| ---------- | ------------ | --------- | ------------- |
| BF16 | 67.35 | 11.48 | 0.0000% |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 39.71 | 17.73 | 0.0213% |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 35.78 | 18.72 | 0.0272% |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 35.78 | 18.70 | 0.0272% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 28.02 | 24.27 | 0.1768% |
| Q6_K | 27.63 | 22.95 | 0.2037% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 27.63 | 23.34 | 0.2037% |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 18.95 | 32.00 | 0.2709% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 18.66 | 26.90 | 0.7098% |
| MXFP4_MOE | 17.90 | 20.46 | 2.7338% |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 17.90 | 20.45 | 2.7338% |
* `avg_prec_loss` is the mean of the three per-domain absolute precision-loss percentages vs BF16 (see the precision-loss table below).
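
The loss math appears to be the relative PPL change per domain, averaged across domains; a worked check with the `BF16` and `Q6_K` rows from the PPL table below reproduces both the per-domain and averaged figures (formula inferred from the data):

```python
bf16 = {"gen": 6.8872, "code": 1.4128, "math": 5.4442}  # BF16 PPLs below
q6k  = {"gen": 6.9012, "code": 1.4135, "math": 5.4637}  # Q6_K PPLs below

loss = {d: abs(q6k[d] - bf16[d]) / bf16[d] * 100 for d in bf16}
# -> gen 0.2033%, code 0.0495%, math 0.3582% (matches the loss table)

avg_prec_loss = sum(loss.values()) / len(loss)  # 0.2037%
```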
---
### Table - PPL Columns
| model_name | gen | gen_er | code | code_er | math | math_er |
| ---------- | --- | ------ | ---- | ------- | ---- | ------- |
| BF16 | 6.8872 | 0.1679 | 1.4128 | 0.0095 | 5.4442 | 0.1209 |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 6.8901 | 0.1680 | 1.4127 | 0.0095 | 5.4434 | 0.1208 |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.8866 | 0.1679 | 1.4130 | 0.0095 | 5.4474 | 0.1210 |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 6.8866 | 0.1679 | 1.4130 | 0.0095 | 5.4474 | 0.1210 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 6.8901 | 0.1682 | 1.4156 | 0.0096 | 5.4284 | 0.1203 |
| Q6_K | 6.9012 | 0.1685 | 1.4135 | 0.0095 | 5.4637 | 0.1218 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 6.9012 | 0.1685 | 1.4135 | 0.0095 | 5.4637 | 0.1218 |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 6.8712 | 0.1654 | 1.4162 | 0.0095 | 5.4627 | 0.1201 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 6.8452 | 0.1639 | 1.4140 | 0.0094 | 5.5223 | 0.1222 |
| MXFP4_MOE | 7.1007 | 0.1728 | 1.4351 | 0.0097 | 5.6360 | 0.1239 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 7.1007 | 0.1728 | 1.4351 | 0.0097 | 5.6360 | 0.1239 |
* gen = ppl_general, code = ppl_code, math = ppl_math; the `*_er` columns are the corresponding ± error estimates reported by `llama-perplexity`.
---
### Table - Precision Loss Columns
| model_name | loss_general | loss_code | loss_math |
| ---------- | ------------ | --------- | --------- |
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_BF16-ao_Q5_K-aq_Q8_0-emb_Q5_K-fd_Q8_0-fug_Q8_0-head_BF16 | 0.0421 | 0.0071 | 0.0147 |
| mxfp4_moe-akv_Q8_0-ao_MXFP4-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0087 | 0.0142 | 0.0588 |
| mxfp4_moe-akv_Q8_0-ao_IQ4_NL-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0-head_Q8_0 | 0.0087 | 0.0142 | 0.0588 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q8_0-emb_BF16-fd_IQ4_NL-fug_Q6_K-head_Q8_0 | 0.0421 | 0.1982 | 0.2902 |
| Q6_K | 0.2033 | 0.0495 | 0.3582 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K-head_Q6_K | 0.2033 | 0.0495 | 0.3582 |
| mxfp4_moe-akv_IQ4_NL-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_IQ4_NL-fug_IQ4_NL-head_IQ4_NL | 0.2323 | 0.2407 | 0.3398 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_MXFP4-fd_MXFP4-fug_IQ4_NL-head_IQ4_NL | 0.6098 | 0.0849 | 1.4346 |
| MXFP4_MOE | 3.1000 | 1.5784 | 3.5230 |
| mxfp4_moe-akv_MXFP4-ao_MXFP4-aq_MXFP4-emb_MXFP4-fd_MXFP4-fug_MXFP4-head_MXFP4 | 3.1000 | 1.5784 | 3.5230 |
* loss_* values are absolute precision-loss % vs BF16 per domain.