---
language:
- en
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
tags:
- mlx
- llm
- nemotron
- apple-silicon
base_model: nvidia/Nemotron-Mini-4B-Instruct
---
# Nemotron-Mini-4B-Instruct-4bit-mlx
This model was converted from [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
to [MLX](https://github.com/ml-explore/mlx) format for use on Apple Silicon.
**Quantization:** 4-bit affine quantization with MLX's default settings (~4.5 bits per weight)
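The ~4.5 figure comes from group-wise storage. Assume MLX's defaults of 64-weight groups, with one 16-bit scale and one 16-bit bias stored per group. Under that assumption, each weight costs 4 + 32/64 = 4.5 bits. The sketch below checks that arithmetic and shows a simplified affine round-trip for a single group (`w ≈ scale * q + bias`). The helper names `bits_per_weight`, `quantize_group` and `dequantize_group` are hypothetical, and this is not MLX's actual quantization kernel.

```python
def bits_per_weight(bits: int = 4, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for group-wise affine quantization.

    Assumption: each group of `group_size` weights stores `bits`-bit codes
    plus one scale and one bias, so their cost is spread over the group.
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, float]:
    """Simplified affine quantization of one group: w ~= scale * q + bias."""
    levels = (1 << bits) - 1              # 15 for 4-bit codes 0..15
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    bias = lo                             # the group minimum maps to code 0
    codes = [min(levels, max(0, round((w - bias) / scale))) for w in weights]
    return codes, scale, bias


def dequantize_group(codes: list[int], scale: float, bias: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [scale * q + bias for q in codes]


print(bits_per_weight())  # 4.5

group = [-0.31, -0.12, 0.0, 0.07, 0.25, 0.4]
codes, scale, bias = quantize_group(group)
approx = dequantize_group(codes, scale, bias)
# Round-to-nearest keeps each reconstruction error within half a step (scale / 2).
print(codes, max(abs(a - w) for a, w in zip(approx, group)))
```

Smaller groups lower the error but raise the overhead. For example, `bits_per_weight(group_size=32)` gives 5.0 bits per weight.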
## Usage
```python
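from mlx_lm import load, generate

# Download (on first use) and load the 4-bit MLX weights and tokenizer.
model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-4bit-mlx")

# Nemotron-Mini prompt format: system turn, user turn, then an open assistant turn.
prompt = (
    "<extra_id_0>System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "<extra_id_1>User\n"
    "Who are you?\n"
    "<extra_id_1>Assistant\n"
)

# Generate up to 256 new tokens and print the decoded completion.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))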
```
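If you send more than one question, you can wrap the single-turn template shown above in a small function so the special tags are not retyped by hand. This is a minimal sketch. The name `build_prompt` is hypothetical, and the function covers only the one-system, one-user layout used in the example above. Multi-turn or tool-use formats are not covered.

```python
DEFAULT_SYSTEM = "You are a helpful, honest AI assistant."


def build_prompt(user_message: str, system_message: str = DEFAULT_SYSTEM) -> str:
    """Assemble a single-turn Nemotron-Mini prompt (hypothetical helper).

    Reproduces the layout from the usage example: a system block followed by
    a blank line, a user turn, and an open assistant turn ending in a newline.
    """
    return (
        f"<extra_id_0>System\n{system_message}\n\n"
        f"<extra_id_1>User\n{user_message}\n"
        "<extra_id_1>Assistant\n"
    )


print(build_prompt("Who are you?"))
```

The returned string can be passed as `prompt` to `generate` in the same way as the hand-written prompt.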
## Benchmark (Apple Silicon, single prompt, 23 tokens)
| Variant | Generation speed (tok/s) |
|---|---|
| bf16 | 2.47 |
| 4-bit default (this) | 4.37 |
| mxfp4-q4 | 4.56 |
| nvfp4-q4 | 9.69 |
| mixed-3-6 | 9.72 |
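Dividing each row by the bf16 figure gives relative speedups. The snippet below only recomputes the table's own numbers, which are hard-coded as assumptions copied from it. Because the table comes from a single prompt, the ratios carry the same caveat.

```python
# Throughput values copied from the benchmark table above (tokens per second).
throughput = {
    "bf16": 2.47,
    "4-bit default": 4.37,
    "mxfp4-q4": 4.56,
    "nvfp4-q4": 9.69,
    "mixed-3-6": 9.72,
}


def speedups(results: dict[str, float], baseline: str = "bf16") -> dict[str, float]:
    """Return each variant's throughput divided by the baseline's throughput."""
    base = results[baseline]
    return {name: tps / base for name, tps in results.items()}


for name, ratio in speedups(throughput).items():
    print(f"{name:>14}: {ratio:.2f}x")
```

With these numbers, the 4-bit default variant runs at roughly 1.77x the bf16 speed.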
## Original model
See [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
for the original model card, license, and usage terms.