DeepSeek-V4-Flash-5bit-mlx

osmapi/DeepSeek-V4-Flash-5bit-mlx is an Apple-Silicon MLX quantization of deepseek-ai/DeepSeek-V4-Flash.

No fine-tuning, distillation, or retraining was applied. The official mixed FP4/FP8 source weights were converted locally, the MTP head was dropped because it is not used for normal decode, and router/mHC/control tensors were preserved rather than aggressively quantized.

Model Details

| Property | Value |
| --- | --- |
| Base model | `deepseek-ai/DeepSeek-V4-Flash` |
| Architecture | DeepSeek-V4 Flash MoE, 284B total / 13B active, 1M context |
| Local profile | `MLX-Affine-Q5` |
| Bundle size | 195.67 GB |
| Layout | Pre-stacked MLX `switch_mlp` layout |
| MTP head | Dropped |
| Validation | Safetensors header/index validation, metadata validation |

Quantization Recipe

| Tensor class | Codec | Bits / handling |
| --- | --- | --- |
| Linear/Embedding/SwitchLinear weights | MLX affine | 5-bit, group size 64 |
| Routed experts | MLX affine | pre-stacked `switch_mlp` tensors with `.weight`, `.scales`, `.biases` |
| Norms, router gate, mHC, sinks, APE, integer routing tables | passthrough | source precision preserved |
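
As a rough sanity check on the bundle size, the recipe implies an effective width above 5 bits per weight, because every group of 64 weights also stores one scale and one bias. The sketch below assumes 16-bit scales and biases (an assumption, not stated in this card) and treats all 284B parameters as quantized, which is a simplification: passthrough tensors stay at source precision and the MTP head is dropped. The function names are illustrative only.

```python
def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage cost per weight for group-wise affine quantization.

    scale_bits and bias_bits are assumptions (16-bit floats), not values
    taken from this model card.
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Convert a parameter count and bit width into decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


bpw = affine_bits_per_weight(bits=5, group_size=64)
size_gb = estimate_size_gb(284e9, bpw)
print(f"{bpw} bits/weight, about {size_gb:.2f} GB")
```

Under these assumptions the estimate is 5.5 bits per weight and about 195.25 GB, which is in the same range as the 195.67 GB bundle size listed above.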

This is a normal MLX affine quantization, not JANGTQ/TurboQuant. Quantized tensors use the standard MLX triplet layout:

.weight
.scales
.biases
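
To illustrate what the triplet holds, here is a simplified pure-Python sketch of group-wise affine quantization: each group of weights is mapped to integers in the range 0 to 2^bits - 1, and a per-group scale and bias are kept so the values can be reconstructed as q * scale + bias. This is not MLX's actual kernel (MLX packs the integers into 32-bit words and may pick the scale and bias differently); `quantize_group` and `dequantize_group` are hypothetical helper names.

```python
def quantize_group(values, bits=5):
    """Quantize one group of floats to integers plus a (scale, bias) pair."""
    levels = 2**bits - 1                 # 31 for 5-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    bias = lo                            # the smallest value maps to integer 0
    q = [min(levels, max(0, round((v - bias) / scale))) for v in values]
    return q, scale, bias


def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats from the stored triplet."""
    return [qi * scale + bias for qi in q]


# One group of 64 evenly spaced weights in [-1, 1].
group = [-1.0 + 2.0 * i / 63 for i in range(64)]
q, scale, bias = quantize_group(group, bits=5)
restored = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(f"scale={scale:.5f} bias={bias} max abs error={max_err:.5f}")
```

In this simplified scheme the reconstruction error of any weight is bounded by half a quantization step, that is, scale / 2.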

Use with MLX

pip install -U mlx-lm

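from mlx_lm import load, generate

# Load the quantized weights and tokenizer (downloads on first use).
model, tokenizer = load("osmapi/DeepSeek-V4-Flash-5bit-mlx")
prompt = "Write a short note about MLX quantization."
# verbose=True already prints the generated text, so no extra print() is needed.
text = generate(model, tokenizer, prompt=prompt, verbose=True)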

For the official DeepSeek-V4 chat/message format, see the included encoding/ folder, which comes from the upstream repository.

Files

model-*.safetensors: standard MLX affine shards
model.safetensors.index.json: shard index
config.json, jang_config.json: MLX metadata
encoding/: upstream DeepSeek-V4 prompt encoding reference
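
A downloaded copy can be spot-checked without loading any tensor data. The sketch below is a minimal example, not the validation that was run for this upload: it reads model.safetensors.index.json, parses each referenced shard's safetensors header (an 8-byte little-endian length followed by a JSON object), and checks that every indexed tensor is present and that each `.scales` entry has a matching `.weight` and `.biases`. `check_bundle` and `read_safetensors_header` are hypothetical helper names.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def check_bundle(model_dir):
    """Return a list of problems found; an empty list means the checks passed."""
    model_dir = Path(model_dir)
    index = json.loads((model_dir / "model.safetensors.index.json").read_text())
    weight_map = index["weight_map"]
    problems = []

    # Parse each referenced shard header once.
    headers = {}
    for shard in sorted(set(weight_map.values())):
        shard_path = model_dir / shard
        if shard_path.is_file():
            headers[shard] = read_safetensors_header(shard_path)
        else:
            problems.append(f"missing shard: {shard}")

    for name, shard in weight_map.items():
        # Every indexed tensor must appear in the header of its shard.
        if shard in headers and name not in headers[shard]:
            problems.append(f"{name} not found in {shard}")
        # Quantized tensors should come as a .weight/.scales/.biases triplet.
        if name.endswith(".scales"):
            stem = name[: -len(".scales")]
            for suffix in (".weight", ".biases"):
                if stem + suffix not in weight_map:
                    problems.append(f"{stem} is missing {suffix}")
    return problems
```

Calling `check_bundle` with the path to the local model folder returns an empty list when the index, the shard headers, and the triplet naming agree.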

Notes

The repository name follows the common MLX naming convention used by mlx-community/*-4bit and *-8bit uploads and by the local osmapi/*-6bit-mlx style. The README keeps the explicit recipe and validation structure used by larger DeepSeek-V4 quantized uploads.

License

MIT, following the upstream DeepSeek-V4-Flash release.
