DeepSeek-V4-Flash-6bit-mlx

osmapi/DeepSeek-V4-Flash-6bit-mlx is a 6-bit MLX quantization of deepseek-ai/DeepSeek-V4-Flash for Apple silicon.

No fine-tuning, distillation, or retraining was applied. The official mixed FP4/FP8 source weights were converted locally. The MTP (multi-token prediction) head was dropped because normal decoding does not use it. The router, mHC, and other control tensors were kept at source precision instead of being quantized.

Model Details

| Property | Value |
| --- | --- |
| Base model | deepseek-ai/DeepSeek-V4-Flash |
| Architecture | DeepSeek-V4 Flash MoE, 284B total / 13B active parameters, 1M-token context |
| Local profile | MLX-Affine-Q6 |
| Bundle size | 231.21 GB |
| Layout | Pre-stacked MLX switch_mlp layout |
| MTP head | Dropped |
| Validation | Safetensors header/index validation, metadata validation |
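The bundle size is roughly what you would expect from 6-bit weights that also store one scale and one bias for each group of 64 weights. The arithmetic below is a back-of-envelope sketch built on several assumptions that the card does not state. It assumes all 284B parameters are quantized this way, that scales and biases are 16-bit floats, and that "GB" means 10^9 bytes. It ignores the passthrough tensors and the dropped MTP head. The function names are illustrative only.

```python
# Back-of-envelope size estimate for an MLX-style affine quantization.
# Assumptions (not from the model card): every parameter is quantized,
# scales and biases are 16-bit, passthrough tensors and the MTP head are ignored.

def bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    # Each group of `group_size` weights carries one scale and one bias.
    return bits + 2 * scale_bits / group_size

def estimated_size_gb(num_params: float, bits: int, group_size: int) -> float:
    total_bits = num_params * bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9  # bits -> bytes -> decimal gigabytes

bpw = bits_per_weight(6, 64)
size = estimated_size_gb(284e9, bits=6, group_size=64)
print(f"{bpw} bits/weight, ~{size:.2f} GB")  # 6.5 bits/weight, ~230.75 GB
```

The estimate of about 230.75 GB comes within about 0.5 GB of the listed 231.21 GB. This sketch does not try to explain the remaining difference.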

Quantization Recipe

| Tensor class | Codec | Bits / handling |
| --- | --- | --- |
| Linear/Embedding/SwitchLinear weights | MLX affine | 6-bit, group size 64 |
| Routed experts | MLX affine | Pre-stacked switch_mlp tensors with .weight, .scales, .biases |
| Norms, router gate, mHC, sinks, APE, integer routing tables | Passthrough | Source precision preserved |
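The "MLX affine, 6-bit, group size 64" entry can be illustrated with a NumPy sketch of group-wise affine quantization. Each group of 64 consecutive input weights gets one scale and one bias. Each weight is then stored as an integer code q in [0, 63], so that w ≈ scale * q + bias. This shows the general scheme only, using a simple min/max choice of scale and bias. It is not MLX's kernel, which may choose scales and round differently. It also keeps the codes as unpacked uint8 for readability, whereas MLX packs them into uint32 words. The function names are hypothetical.

```python
import numpy as np

def affine_quantize(w: np.ndarray, bits: int = 6, group_size: int = 64):
    """Group-wise affine (asymmetric) quantization along the last axis (illustrative)."""
    rows, cols = w.shape
    assert cols % group_size == 0, "input dim must be a multiple of group_size"
    groups = w.reshape(rows, cols // group_size, group_size)
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    levels = 2**bits - 1                          # 63 steps for 6 bits
    scales = (w_max - w_min) / levels
    scales = np.where(scales == 0, 1.0, scales)   # constant groups: avoid divide-by-zero
    biases = w_min                                # code 0 maps back to the group minimum
    q = np.clip(np.round((groups - biases) / scales), 0, levels).astype(np.uint8)
    return q, scales.squeeze(-1), biases.squeeze(-1)

def affine_dequantize(q: np.ndarray, scales: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """Reconstruct w ≈ scale * q + bias and flatten the groups back into rows."""
    w = q.astype(np.float32) * scales[..., None] + biases[..., None]
    return w.reshape(q.shape[0], -1)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 256)).astype(np.float32)
q, scales, biases = affine_quantize(w)
print(q.shape, scales.shape)  # (8, 4, 64) (8, 4)
w_hat = affine_dequantize(q, scales, biases)
print(float(np.abs(w - w_hat).max()))  # at most about half of the largest group scale
```

Because the codes are rounded to the nearest step, the reconstruction error for each weight is bounded by half of its group's scale. Smaller groups give tighter bounds, but they also store more scales and biases.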

This is a plain MLX affine quantization, not JANGTQ/TurboQuant, so quantized tensors use the standard MLX triplet layout:

  • .weight
  • .scales
  • .biases
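The triplet layout can be sketched as shape arithmetic. This assumes that MLX packs the integer codes contiguously into uint32 words along the input dimension and stores one scale and one bias per group. Under that assumption, an (out_dims, in_dims) weight becomes three tensors. The packed .weight has shape (out_dims, in_dims * bits / 32), and .scales and .biases each have shape (out_dims, in_dims / group_size). Pre-stacked switch_mlp expert tensors carry an extra leading expert axis. The helper name and the expert dimensions in the example are hypothetical, because the card does not list the model's actual tensor shapes.

```python
def quantized_shapes(shape, bits=6, group_size=64):
    """Return (weight, scales, biases) shapes for an MLX-style quantized tensor.

    `shape` is the unquantized (..., out_dims, in_dims) shape. Any leading
    dims, such as a stacked expert axis, are carried through unchanged.
    """
    *lead, out_dims, in_dims = shape
    if (in_dims * bits) % 32 != 0:
        raise ValueError("packed row must fill whole uint32 words")
    if in_dims % group_size != 0:
        raise ValueError("in_dims must be a multiple of group_size")
    weight = (*lead, out_dims, in_dims * bits // 32)   # packed uint32 codes
    groups = (*lead, out_dims, in_dims // group_size)  # one scale/bias per group
    return weight, groups, groups

# Dense projection (hypothetical size).
print(quantized_shapes((4096, 4096)))
# ((4096, 768), (4096, 64), (4096, 64))

# Pre-stacked experts: (num_experts, out_dims, in_dims), hypothetical sizes.
print(quantized_shapes((256, 2048, 4096)))
# ((256, 2048, 768), (256, 2048, 64), (256, 2048, 64))
```

Stacking the experts into a single tensor lets one quantized matmul select experts by index. Without stacking, the model would need a separate set of tensors for each expert.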

Use with MLX

pip install -U mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("osmapi/DeepSeek-V4-Flash-6bit-mlx")
prompt = "Write a short note about MLX quantization."
# verbose=True streams the generated text and prints generation stats,
# so a separate print(text) would show the output twice.
text = generate(model, tokenizer, prompt=prompt, verbose=True)

For the official DeepSeek-V4 chat/message format, see the encoding/ folder, which is included from the upstream repository.

Files

  • model-*.safetensors: standard MLX affine shards
  • model.safetensors.index.json: shard index
  • config.json, jang_config.json: MLX metadata
  • encoding/: upstream DeepSeek-V4 prompt encoding reference
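The card mentions safetensors header and index validation but does not say how it was done. The sketch below shows one way to run similar checks on a local download. It is not the author's validation script, and the function names are hypothetical. It relies on the safetensors file format, where each file begins with an 8-byte little-endian header length followed by a JSON header. It also relies on the usual model.safetensors.index.json layout, in which weight_map maps each tensor name to a shard file.

```python
import json
import struct
from pathlib import Path

def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without loading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header  # {tensor_name: {"dtype", "shape", "data_offsets"}}

def check_bundle(model_dir):
    """Return a list of problems found in the shard index and the quantized triplets."""
    model_dir = Path(model_dir)
    index = json.loads((model_dir / "model.safetensors.index.json").read_text())
    weight_map = index["weight_map"]  # tensor name -> shard file name
    problems = []

    # 1. Every shard named in the index must exist and contain the tensors mapped to it.
    for shard in sorted(set(weight_map.values())):
        path = model_dir / shard
        if not path.exists():
            problems.append(f"missing shard: {shard}")
            continue
        names = set(read_safetensors_header(path))
        for name, mapped in weight_map.items():
            if mapped == shard and name not in names:
                problems.append(f"{name} not found in {shard}")

    # 2. Every .scales tensor needs matching .weight and .biases (the MLX triplet).
    for name in weight_map:
        if name.endswith(".scales"):
            prefix = name[: -len(".scales")]
            for suffix in (".weight", ".biases"):
                if prefix + suffix not in weight_map:
                    problems.append(f"incomplete triplet: {prefix}{suffix}")
    return problems
```

For example, check_bundle("./DeepSeek-V4-Flash-6bit-mlx") returns an empty list when every indexed tensor is present in its shard and every .scales tensor has a matching .weight and .biases. The path here is a placeholder for wherever the model was downloaded. Only the shard headers are read, so the check stays fast even for a bundle of more than 200 GB.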

Notes

The repository name follows the bit-width naming convention of mlx-community/*-4bit and *-8bit uploads and the existing osmapi/*-6bit-mlx style. This README keeps the explicit recipe and validation sections used by larger DeepSeek-V4 quantization uploads.

License

MIT, following the upstream DeepSeek-V4-Flash release.
