# Solar-Open-100B-MXFP4
W4A4 (MXFP4) quantized version of upstage/Solar-Open-100B, with both weights and activations quantized to the OCP MXFP4 format.

Solar Open is a dense-MoE hybrid large language model developed by Upstage, with 100B total parameters in a shared-expert plus routed-expert architecture.
## Quantization Details
| Property | Value |
|---|---|
| Base Model | upstage/Solar-Open-100B (100B params, dense-MoE hybrid) |
| Precision | W4A4 (MXFP4 Weight + MXFP4 Activation) |
| Weight Quantization | OCP MXFP4 (E2M1), Static, group_size=32, E8M0 shared scales |
| Activation Quantization | OCP MXFP4 (E2M1), Dynamic, group_size=32, E8M0 shared scales |
| Quantization Tool | quanto + AMD Quark 0.11.1 (file-to-file) |
| Algorithm | RTN (Round-To-Nearest) |
| Original Size | 192 GB |
| Quantized Size | 63.4 GB |
| Compression Ratio | 3.0x |
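To make the scheme above concrete, here is a minimal NumPy sketch of RTN quantization into MXFP4 (group_size=32, E2M1 element values, E8M0 power-of-two shared scale, per the OCP MX spec). It is illustrative only, not the Quark kernel; the function name and error-reporting are ours:

```python
import numpy as np

# Representable E2M1 (FP4) magnitudes per the OCP MX spec.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_rtn(block: np.ndarray) -> tuple[float, np.ndarray]:
    """Quantize one group of 32 values with RTN; return the E8M0
    shared scale and the dequantized values for error inspection."""
    assert block.size == 32, "MXFP4 uses group_size=32"
    amax = np.abs(block).max()
    if amax == 0.0:
        return 1.0, np.zeros_like(block)
    # E8M0 shared scale: 2^(floor(log2(amax)) - emax), with emax=2
    # for E2M1, so the largest element lands near the FP4 max of 6.0.
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = np.clip(np.abs(block) / scale, 0.0, 6.0)
    # Round-to-nearest onto the E2M1 grid, then restore signs.
    idx = np.abs(scaled[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    deq = np.sign(block) * E2M1_GRID[idx] * scale
    return scale, deq

x = np.random.randn(32).astype(np.float32)
scale, x_hat = mxfp4_rtn(x)
print(f"scale=2^{int(np.log2(scale))}, max abs error={np.abs(x - x_hat).max():.4f}")
```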
## Benchmark Results
Evaluated using vLLM (ROCm) with 5-shot prompting on an AMD MI355 GPU (gfx950).
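The card does not state the exact evaluation harness. Assuming lm-evaluation-harness (which ships a vLLM backend), the 5-shot runs could be approximated as follows; the task names and backend arguments here are assumptions, not the card's recorded command:

```python
import lm_eval

# Hypothetical reproduction of the 5-shot runs via the vLLM backend.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=haanjack/Solar-Open-100B-MXFP4,"
        "tensor_parallel_size=1,max_model_len=4096,trust_remote_code=True"
    ),
    tasks=["mmlu", "kmmlu"],
    num_fewshot=5,
)
print(results["results"])
```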
### MMLU (5-shot)
| Category | Baseline (BF16) | MXFP4 | Delta |
|---|---|---|---|
| Overall | 77.58% | 76.78% | -0.80% |
| Humanities | 71.09% | 70.48% | -0.61% |
| Social Sciences | 86.90% | 86.32% | -0.58% |
| STEM | 74.44% | 73.04% | -1.40% |
| Other | 81.36% | 80.69% | -0.67% |
### KMMLU (5-shot, Korean)
| Category | Baseline (BF16) | MXFP4 | Delta |
|---|---|---|---|
| Overall | 57.38% | 56.45% | -0.93% |
| Applied Science | 53.28% | 52.41% | -0.87% |
| HUMSS | 66.43% | 66.00% | -0.43% |
| STEM | 58.13% | 57.29% | -0.84% |
| Other | 56.64% | 55.19% | -1.45% |
## Excluded Layers
This model uses the attn-excl strategy: all attention projections and all shared-expert projections are excluded from quantization, leaving only the MoE routed-expert weights quantized to MXFP4.
- Self-Attention (all 48 layers): q_proj, k_proj, v_proj, o_proj
- Shared Expert MLP (all 48 layers): gate_proj, up_proj, down_proj
- MoE Router Gates: All mlp.gate layers
- Standard exclusions: lm_head, embed_tokens, all norm layers
Total: 340 excluded layers. This approach keeps overall accuracy degradation under 1% relative to BF16 on both MMLU and KMMLU while retaining 3.0x compression.
## Hardware
- Quantized on: AMD MI355 (gfx950), 288 GB VRAM
- Tested with: vLLM v0.18.2 (ROCm), TP=1
- Compatible with: vLLM builds with Quark quantization support (`quant_method: "quark"`)
## Usage
### vLLM
```bash
vllm serve haanjack/Solar-Open-100B-MXFP4 \
  --trust-remote-code \
  --tensor-parallel-size 1 \
  --max-model-len 4096
```
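The server exposes vLLM's OpenAI-compatible API (port 8000 by default), so any OpenAI client can query it. A minimal sketch, assuming the default local endpoint:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; no real API key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="haanjack/Solar-Open-100B-MXFP4",
    messages=[{"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```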
## Quantization Reproduction
```python
from quanto import UnifiedQuantizer, UnifiedConfig

NUM_LAYERS = 48

# Standard exclusions: output head, embeddings, norms, and MoE router gates.
exclude = ["lm_head", "*embed*", "*norm*", "*.gate"]

# attn-excl strategy: keep every attention and shared-expert projection
# unquantized, so only the routed-expert weights go to MXFP4.
for i in range(NUM_LAYERS):
    for proj in ["q_proj", "k_proj", "v_proj", "o_proj"]:
        exclude.append(f"model.layers.{i}.self_attn.{proj}")
    for proj in ["gate_proj", "up_proj", "down_proj"]:
        exclude.append(f"model.layers.{i}.mlp.shared_experts.{proj}")

config = UnifiedConfig(
    model_path="upstage/Solar-Open-100B",
    output_dir="./Solar-Open-100B-MXFP4",
    precision="mxfp4",
    sensitivity_analysis=False,
    exclude_layers=exclude,
    trust_remote_code=True,
)
UnifiedQuantizer(config).run()
```
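As a sanity check, the exclusion list built above contains 4 wildcard patterns plus 48 × 4 attention projections and 48 × 3 shared-expert projections, so `len(exclude) == 340`, matching the excluded-layer total reported in the Excluded Layers section.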
## Credits
- Base Model: Upstage — Solar Open
- Quantization: quanto with AMD Quark
- Hardware: AMD MI355 (gfx950), 288 GB VRAM
## License
This model inherits the Apache 2.0 License from the base model.