DeepSeek-V4-Flash-TQ-Q2.3-MLX
osmapi/DeepSeek-V4-Flash-TQ-Q2.3-MLX is an Apple-Silicon MLX TurboQuant/JANGTQ quantization of deepseek-ai/DeepSeek-V4-Flash.
No fine-tuning, distillation, or retraining was applied. The official mixed FP4/FP8 source weights were converted locally, the MTP head was dropped because it is not used for normal decode, and router/mHC/control tensors were preserved rather than aggressively quantized.
Model Details
| Property | Value |
|---|---|
| Base model | deepseek-ai/DeepSeek-V4-Flash |
| Architecture | DeepSeek-V4 Flash MoE, 284B total / 13B active, 1M context |
| Local profile | JANGTQ-Q2.3 |
| Bundle size | 88.03 GB |
| Layout | Pre-stacked MLX switch_mlp layout |
| MTP head | Dropped |
| Validation | Safetensors header/index validation, metadata validation |
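The header validation listed in the table can be approximated with the standard library. The sketch below relies only on the public safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor data). The function names `read_header` and `header_problems` are hypothetical, and this is not the validation script used for this upload.

```python
import json
import struct
from pathlib import Path


def read_header(path):
    """Return (header_length, header_dict) without loading any tensor data."""
    with open(path, "rb") as f:
        raw = f.read(8)
        if len(raw) != 8:
            raise ValueError(f"{path}: file too short for a safetensors header")
        # The first 8 bytes hold the JSON header length as a little-endian u64.
        (header_len,) = struct.unpack("<Q", raw)
        header = json.loads(f.read(header_len))
    return header_len, header


def header_problems(path):
    """List tensor names whose byte range falls outside the data section."""
    header_len, header = read_header(path)
    data_size = Path(path).stat().st_size - 8 - header_len
    problems = []
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        if not (0 <= begin <= end <= data_size):
            problems.append(name)
    return problems
```

An empty list from `header_problems(shard)` means every declared tensor fits inside the file, which catches truncated downloads cheaply.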
Required Sidecar
This is a JANGTQ/TurboQuant bundle and requires jangtq_runtime.safetensors from this repository. The sidecar stores the deterministic codebooks and Hadamard rotation signs used to decode the .tq_packed expert weights. If it is missing, re-download the full repository or fetch that file explicitly:
hf download osmapi/DeepSeek-V4-Flash-TQ-Q2.3-MLX jangtq_runtime.safetensors --local-dir <your-model-dir>
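Before loading, it may help to confirm that the sidecar is actually present in the local model directory. The following is a minimal sketch using only the standard library. The helper name `has_sidecar` is hypothetical and is not part of `jang-tools`.

```python
from pathlib import Path

# File name taken from this card; the helper itself is illustrative only.
SIDECAR_NAME = "jangtq_runtime.safetensors"


def has_sidecar(model_dir):
    """Return True if the TurboQuant sidecar exists and is non-empty."""
    sidecar = Path(model_dir) / SIDECAR_NAME
    return sidecar.is_file() and sidecar.stat().st_size > 0
```

If `has_sidecar("<your-model-dir>")` returns `False`, fetch the file with the `hf download` command above before attempting to load the model.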
Quantization Recipe
| Tensor class | Codec | Bits / handling |
|---|---|---|
| Routed experts | TurboQuant MXTQ | 110 routed layer/projection groups at 2-bit MXTQ and 19 at 4-bit MXTQ |
| Routed effective bits | MXTQ | 2.2946 bits |
| Attention, shared experts, compressor, indexer, embed, lm head | MLX affine | 8-bit, group size 32 |
| Norms, router, mHC, sinks, integer routing tables | passthrough | source precision preserved |
The fractional 2.3-bit target is reached by mixing power-of-two lanes (2-bit and 4-bit routed groups) rather than by a native fractional width, because the current JANGTQ vectorized packer is stable on 2/4/8-bit lanes for DeepSeek-V4 expert dimensions.
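The effective bit width in the recipe table can be reproduced from the group counts. This sketch assumes every routed layer/projection group holds the same number of weights, so a plain group-count average applies. That assumption is not stated on this card, but it matches the reported figure.

```python
def effective_bits(groups_by_bits):
    """Average bits per weight, assuming all groups are the same size."""
    total_groups = sum(groups_by_bits.values())
    total_bits = sum(bits * count for bits, count in groups_by_bits.items())
    return total_bits / total_groups


# Group counts from the recipe table: 110 groups at 2-bit, 19 at 4-bit.
routed = {2: 110, 4: 19}
print(round(effective_bits(routed), 4))  # 2.2946
```

The result is (110 × 2 + 19 × 4) / 129 = 296 / 129, which covers the routed experts only; the 8-bit affine tensors and passthrough tensors are not included in it.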
Use
Install the JANG loader/runtime and MLX LM:
pip install mlx-lm jang-tools
Example:
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate
model, tokenizer = load_jangtq_model("osmapi/DeepSeek-V4-Flash-TQ-Q2.3-MLX")
prompt = "Write a short note about MLX quantization."
text = generate(model, tokenizer, prompt=prompt, verbose=True)
print(text)
Files
- model-*.safetensors: pre-stacked JANGTQ/MLX shards
- model.safetensors.index.json: shard index
- jangtq_runtime.safetensors: required TurboQuant runtime sidecar
- config.json, jang_config.json: MLX/JANGTQ metadata
- encoding/: upstream DeepSeek-V4 prompt encoding reference
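A quick way to confirm that a download is complete is to compare the shard index against the files on disk. The sketch below assumes the index follows the usual Hugging Face layout with a top-level `weight_map` mapping tensor names to shard file names. The helper name `missing_shards` is hypothetical.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard file names referenced by the index but absent on disk."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # weight_map maps each tensor name to the shard that stores it.
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (model_dir / name).is_file()]
```

An empty list means every shard named in the index is present. It does not check the sidecar, which is not expected to appear in `weight_map`.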
Notes
This card follows the same broad shape as the other osmapi DeepSeek-V4-Flash MLX uploads: a sidecar warning, an explicit recipe table, and minimal reproducible loading instructions. Q2.3 is an aggressive size-first TurboQuant profile, so treat it as experimental until evaluated on your target prompts.
License
MIT, following the upstream DeepSeek-V4-Flash release.