# Dimensional Matrixing: B1p KnowledgeCorridor
Non-uniform quantization of Qwen3.5-27B using per-layer precision allocation based on attention head behavior.
| Metric | Uniform Q4 | B1p (this model) | Delta |
|---|---|---|---|
| Avg Weight Bits | 4.00 | 3.19 | -20% |
| Model Size | 14.0 GB | 15.2 GB | -- |
| MMLU-Pro | 52.2% | 49.4% | -2.8 |
| ARC-Challenge | 90.4% | 92.3% | +1.9 |
| GSM8K | 90.9% | 90.4% | -0.5 |
Beats uniform Q4 on reasoning (ARC-Challenge). Matches it on math (GSM8K). 20% fewer weight bits on average, trading away 2.8 points of MMLU-Pro.
## What is Dimensional Matrixing?
Qwen3.5-27B has 64 layers: 48 GatedDeltaNet (linear attention, no KV cache) and 16 standard attention layers. Uniform quantization treats them all the same; Dimensional Matrixing (DM) doesn't.
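The interleaving pattern isn't stated explicitly in this card; one reading of the precision map (standard attention at layer indices 3, 7, ..., 63, i.e. every 4th layer) is sketched below. Treat the layout as an assumption, not a confirmed detail of the architecture:

```python
# Assumed hybrid layout: every 4th layer (index ≡ 3 mod 4) is standard
# attention; the rest are GatedDeltaNet. Inferred from the precision map.
attention_layers = [i for i in range(64) if i % 4 == 3]
gdn_layers = [i for i in range(64) if i % 4 != 3]
print(len(attention_layers), len(gdn_layers))  # 16 48
```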
We profile every layer along three axes:
- Weight sensitivity -- per-layer quantization sweep at 2/3/4/6 bits
- KV cache sensitivity -- per-layer KV perturbation tests
- Attention head classification -- sink heads, retrieval heads, mixing heads
The result: a precision map that allocates 2-bit to layers that don't care and 8-bit to layers that do.
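The allocation step can be sketched as a simple mapping from risk score to bit-width. The thresholds below are illustrative only, chosen to reproduce the extreme cases quoted in this card (risk 24.6 → 8-bit keys, risk 0.7 → 2-bit keys); the released map in `dm_precision_map.json` was fit per-layer and includes hand-tuned exceptions:

```python
# Hypothetical sketch: thresholds are illustrative, not the released map.
def kv_key_bits(risk: float) -> int:
    """Map a layer's measured risk score to a key-cache bit-width."""
    if risk < 2.0:    # pure mixers: safe to compress hard
        return 2
    if risk < 5.0:
        return 3
    if risk < 10.0:   # transition zone
        return 6
    return 8          # sink-heavy layers keep near-full precision

# Layer 15 (risk 24.6) and layer 55 (risk 0.7) from the precision map:
plan = {layer: kv_key_bits(r) for layer, r in [(15, 24.6), (55, 0.7)]}
print(plan)  # {15: 8, 55: 2}
```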
### Key insight: Sink heads demand precision
Layer 15 has 15 out of 16 heads dedicated to position-0 attention (risk=24.6). It gets 8-bit keys. Layer 55 has zero sink heads (risk=0.7). It gets 2-bit keys. The 4x spread in precision across layers is driven by measurable head behavior.
### K/V Asymmetry
Keys participate in every Q·Kᵀ dot product, so key error distorts the attention scores at every query position before the softmax. Values are gated by attention weights, so value error is attenuated before it reaches the output. B1p therefore allocates 5.69-bit keys vs 3.63-bit values on average.
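The asymmetry is visible in a toy single-head attention. This is a generic illustration of the mechanism, not the card's profiling code: value noise enters after the softmax through a convex combination (attention rows sum to 1), so each output row's error is bounded by the worst per-token value error, while key noise shifts the attention pattern itself:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 16
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(d)) @ V

base = attend(Q, K, V)
noise = 0.1 * rng.standard_normal((T, d))

# Value noise passes through a convex combination (rows of the attention
# matrix sum to 1), so each output row's error is bounded by the largest
# per-row value error.
err_v = attend(Q, K, V + noise) - base

# Key noise perturbs every Q.K logit *before* the softmax, so one noisy
# key reshapes the attention weights at every query position.
err_k = attend(Q, K + noise, V) - base
```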
### Knowledge Corridor
GDN layers flanking high-risk attention layers need protection. Promoting 4 GDN layers around L15 from 2-bit to 4-bit boosted ARC by +1.2 points.
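The corridor idea reduces to a small promotion pass over the per-layer plan. A minimal sketch, assuming a flat `{layer: bits}` map and a corridor radius of 2; the function name and radius are illustrative, not from the released code:

```python
def protect_corridor(weight_bits, risky_layers, radius=2, floor=4):
    """Promote GDN layers flanking high-risk attention layers to at
    least `floor` bits. weight_bits maps layer index -> weight bits."""
    out = dict(weight_bits)
    for center in risky_layers:
        for layer in range(center - radius, center + radius + 1):
            if layer in out and layer != center:
                out[layer] = max(out[layer], floor)
    return out

# The 4 GDN layers around attention layer 15 go from 2-bit to 4-bit:
bits = {13: 2, 14: 2, 16: 2, 17: 2}
print(protect_corridor(bits, risky_layers=[15]))  # {13: 4, 14: 4, 16: 4, 17: 4}
```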
## Precision Map
| Layer | Wt | K | V | Risk | Sinks | Role |
|---|---|---|---|---|---|---|
| 3 | 3 | 2 | 2 | 0.8 | 0 | Pure mixer (safe to compress) |
| 7 | 3 | 2 | 2 | 1.1 | 0 | Pure mixer |
| 11 | 3 | 6 | 4 | 7.2 | 4 | Transition zone |
| 15 | 4 | 8 | 4 | 24.6 | 15 | CRITICAL: model's primary sink nexus |
| 19 | 3 | 8 | 4 | 13.6 | 8 | High sink concentration |
| 23 | 3 | 8 | 4 | 19.4 | 12 | Secondary sink nexus |
| 27 | 3 | 8 | 4 | 17.1 | 10 | High sink concentration |
| 31 | 3 | 8 | 4 | 13.8 | 8 | High sink concentration |
| 35 | 4 | 8 | 6 | 11.9 | 7 | High risk (V needs extra precision) |
| 39 | 3 | 8 | 6 | 10.0 | 6 | High risk (V needs extra precision) |
| 43 | 4 | 4 | 3 | 7.0 | 4 | Transition zone |
| 47 | 4 | 8 | 4 | 23.6 | 15 | CRITICAL: twin sink nexus with L15 |
| 51 | 4 | 6 | 4 | 7.5 | 4 | Transition zone |
| 55 | 4 | 2 | 2 | 0.7 | 0 | Pure mixer (most compressible) |
| 59 | 4 | 2 | 2 | 1.2 | 0 | Pure mixer |
| 63 | 4 | 3 | 3 | 2.9 | 0 | Final layer, specialized behavior |
48 GDN layers: 7 at 2-bit, 38 at 3-bit, 19 at 4-bit weights. No KV cache.
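As a sanity check, averaging the K and V columns of the 16 attention-layer rows above recovers (up to rounding) the 5.69-bit-key / 3.63-bit-value averages quoted earlier:

```python
# K/V bit columns from the 16 attention-layer rows, in layer order.
k_bits = [2, 2, 6, 8, 8, 8, 8, 8, 8, 8, 4, 8, 6, 2, 2, 3]
v_bits = [2, 2, 4, 4, 4, 4, 4, 4, 6, 6, 3, 4, 4, 2, 2, 3]
print(sum(k_bits) / 16, sum(v_bits) / 16)  # 5.6875 3.625
```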
## Usage
### MLX (Apple Silicon)
```python
from mlx_lm import load, generate

# Downloads and loads the quantized weights plus tokenizer
model, tokenizer = load("Funkylazer/dm-qwen3.5-27b-b1p-knowledgecorridor")

response = generate(
    model,
    tokenizer,
    prompt="Explain quantum entanglement.",
    max_tokens=512,
)
print(response)
```
## Hardware Requirements
- Minimum: 16 GB unified memory (Apple Silicon) or 16 GB VRAM
- Comfortable: 24 GB+
- Tested on: Mac mini M4 Pro, 64 GB unified memory
## Files
- `model-0000{1-4}-of-00004.safetensors` -- Quantized model weights (15.2 GB total)
- `dm_precision_map.json` -- Full per-layer precision map with risk scores, sink counts, entropy
- `config.json` -- MLX-compatible model configuration
- `tokenizer.json` + `tokenizer_config.json` -- Tokenizer files
- `chat_template.jinja` -- Chat template
## Branch History
This model is the result of 16 iterative branches (B1a-B1p), each testing a specific quantization hypothesis. Selected branches:
| Branch | Hypothesis | MMLU-Pro | ARC | GSM8K |
|---|---|---|---|---|
| B1e | Fix knowledge-critical GDN layers | 51.4% | 85.5% | 91.9% |
| B1k | Ultra-aggressive compression | 47.0% | 50.2% | 81.9% |
| B1o | Composite merge | 49.6% | 91.1% | 91.3% |
| B1p | Knowledge corridor protection | 49.4% | 92.3% | 90.4% |
## Community Testing Needed
We validated on 3 benchmarks on Apple Silicon. This model needs evaluation on:
- Code generation (HumanEval, MBPP)
- Long-context retrieval (NIAH at 32K-128K)
- Instruction following (MT-Bench, AlpacaEval)
- CUDA GPU performance
- Inference speed vs uniform Q4
Please open an issue on GitHub with results.
## Paper & Code
- Paper: `paper/sections/` in the code repository
- Interactive Site: funkylazer.github.io/dimensional-matrixing
- Code: github.com/funkylazer/dimensional-matrixing
## Citation
```bibtex
@misc{zhivelev2026dm,
  title={Dimensional Matrixing: Non-Uniform Quantization for Hybrid Attention-SSM Architectures},
  author={Zhivelev, Leon},
  year={2026},
  url={https://github.com/funkylazer/dimensional-matrixing}
}
```
## License
Apache 2.0. Base model (Qwen3.5-27B) is Apache 2.0 by Alibaba/Qwen.