To download mlx-community/Nemotron-Mini-4B-Instruct-nvfp4-4bit-mlx for local use with MLX:

```shell
# Download the model from the Hub (quote the extra so zsh does not expand the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Nemotron-Mini-4B-Instruct-nvfp4-4bit-mlx mlx-community/Nemotron-Mini-4B-Instruct-nvfp4-4bit-mlx
```
Nemotron-Mini-4B-Instruct-nvfp4-4bit-mlx
This model was converted from nvidia/Nemotron-Mini-4B-Instruct to MLX format for use on Apple Silicon.
Quantization: 4-bit NVFP4
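NVFP4 stores each weight as a 4-bit FP4 (E2M1) value and shares one scale factor across each block of 16 weights. As a rough illustration of that idea, here is a simplified pure-Python sketch; it is not MLX's implementation. The helper names are hypothetical, the block scale is kept as a Python float (real NVFP4 encodes it as FP8 E4M3, and NVIDIA's description adds a per-tensor FP32 scale on top), and ties round to the smaller magnitude.

```python
# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive weights


def quantize_block(block):
    """Quantize one block to (scale, list of signed E2M1 values)."""
    amax = max(abs(v) for v in block)
    # Map the block's largest magnitude onto the largest E2M1 value (6.0).
    # Simplification: real NVFP4 stores this scale as FP8 E4M3, not a float.
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for v in block:
        # Pick the nearest representable magnitude (ties go to the smaller one).
        mag = min(E2M1_VALUES, key=lambda q: abs(abs(v) / scale - q))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def quantize_nvfp4_like(values):
    """Hypothetical helper: split a flat list into 16-element blocks and quantize each."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize(blocks):
    """Recover approximate weights: each code times its block's scale."""
    return [scale * q for scale, codes in blocks for q in codes]


# Toy data: 32 deterministic "weights" between -11/7 and 11/7, i.e. two blocks.
weights = [((i * 37) % 23 - 11) / 7 for i in range(32)]
blocks = quantize_nvfp4_like(weights)
restored = dequantize(blocks)
print(len(blocks), max(abs(w - r) for w, r in zip(weights, restored)))
```

Because the widest gap between adjacent E2M1 values is 2.0 (between 4 and 6), the rounding error for any weight is at most one block scale. For comparison, MXFP4 (the `mxfp4-q4` row in the benchmark) uses the same E2M1 element format but 32-element blocks with power-of-two (E8M0) scales; NVFP4's smaller blocks and finer-grained scales are intended to track local weight ranges more closely.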
Usage
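```python
from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-nvfp4-4bit-mlx")

# Nemotron-Mini prompt format: a System turn, a User turn, then an open Assistant turn.
prompt = (
    "<extra_id_0>System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "<extra_id_1>User\n"
    "Who are you?\n"
    "<extra_id_1>Assistant\n"
)

# Generate up to 256 new tokens and print the completion.
print(generate(model, tokenizer, prompt, max_tokens=256))
```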
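To reuse the prompt layout from the example with other messages, a small helper can assemble it. This is a hypothetical convenience function (`build_nemotron_prompt` is not part of mlx_lm) that reproduces the single-turn layout shown above exactly; the multi-turn layout is not documented in this card, so check the original nvidia/Nemotron-Mini-4B-Instruct card before extending it.

```python
def build_nemotron_prompt(system: str, user: str) -> str:
    """Hypothetical helper: build a single-turn prompt in the layout used above."""
    return (
        f"<extra_id_0>System\n{system}\n\n"  # system turn, followed by a blank line
        f"<extra_id_1>User\n{user}\n"        # user turn
        "<extra_id_1>Assistant\n"            # open assistant turn for the model to complete
    )


prompt = build_nemotron_prompt("You are a helpful, honest AI assistant.", "Who are you?")
print(prompt)
```

The resulting string can be passed to `generate(model, tokenizer, prompt, max_tokens=256)` exactly as in the usage example.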
Benchmark (Apple Silicon, single prompt, 23 tokens)
| Variant | tok/s |
|---|---|
| bf16 | 2.47 |
| 4-bit default | 4.37 |
| mxfp4-q4 | 4.56 |
| nvfp4-q4 (this model) | 9.69 |
| mixed-3-6 | 9.72 |
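To read the benchmark as relative speedups, divide each row by the bf16 baseline. A small sketch using only the numbers in the table; the dictionary keys simply mirror the row labels:

```python
# Throughput figures copied from the benchmark table (tok/s).
results = {
    "bf16": 2.47,
    "4-bit default": 4.37,
    "mxfp4-q4": 4.56,
    "nvfp4-q4": 9.69,
    "mixed-3-6": 9.72,
}

# Speedup of each variant relative to the bf16 baseline.
baseline = results["bf16"]
speedups = {name: tps / baseline for name, tps in results.items()}

for name, s in speedups.items():
    print(f"{name:>14}: {s:.2f}x")
```

With these figures, nvfp4-q4 runs at roughly 3.9x the bf16 rate, while mxfp4-q4 and the default 4-bit variant land near 1.8x.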
Original model
See nvidia/Nemotron-Mini-4B-Instruct for the original model card, license, and usage terms.