mlx-community
/

Nemotron-Mini-4B-Instruct-4bit-mlx

+---
+language:
+- en
+license: other
+license_name: nvidia-open-model-license
+license_link: >-
+  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
+tags:
+- mlx
+- llm
+- nemotron
+- apple-silicon
+base_model: nvidia/Nemotron-Mini-4B-Instruct
+---
+# Nemotron-Mini-4B-Instruct-4bit-mlx
+This model was converted from [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
+to [MLX](https://github.com/ml-explore/mlx) format for use on Apple Silicon.
+**Quantization:** 4-bit default affine quantization (~4.5 bpw)
+## Usage
+```python
+from mlx_lm import load, generate
+model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-4bit-mlx")
+prompt = (
+    "<extra_id_0>System\n"
+    "You are a helpful, honest AI assistant.\n\n"
+    "<extra_id_1>User\n"
+    "Who are you?\n"
+    "<extra_id_1>Assistant\n"
+)
+print(generate(model, tokenizer, prompt, max_tokens=256))
+```
+## Benchmark (Apple Silicon, single prompt, 23 tokens)
+| Variant | tok/s |
+|---|---|
+| bf16 (this) | 2.47 |
+| 4-bit default | 4.37 |
+| mxfp4-q4 | 4.56 |
+| nvfp4-q4 | 9.69 |
+| mixed-3-6 | 9.72 |
+## Original model
+See [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
+for the original model card, license, and usage terms.