---
language:
- en
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
tags:
- mlx
- llm
- nemotron
- apple-silicon
base_model: nvidia/Nemotron-Mini-4B-Instruct
---

# Nemotron-Mini-4B-Instruct-4bit-mlx

This model was converted from [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
to [MLX](https://github.com/ml-explore/mlx) format for use on Apple Silicon.

**Quantization:** 4-bit default affine quantization (~4.5 bpw)

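
The ~4.5 bpw figure is consistent with group-wise affine quantization, where each group of weights stores a scale and a bias next to the 4-bit values. The sketch below works through that arithmetic. The group size of 64 and the 16-bit scale and bias are assumptions (this card does not state them), and `bits_per_weight` is a hypothetical helper, not a library function.

```python
def bits_per_weight(bits: int, group_size: int,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for group-wise affine quantization."""
    # Each group of `group_size` weights shares one scale and one bias,
    # so their storage cost is spread across the whole group.
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


# Assumed settings: 4-bit weights, groups of 64, 16-bit scale and bias.
bpw = bits_per_weight(bits=4, group_size=64)
print(f"{bpw:.1f} bits per weight")
```

Under these assumptions the per-group metadata adds half a bit per weight on top of the 4 bits stored for each value.
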
## Usage

```python
from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-4bit-mlx")

# Prompt written out by hand in the model's System/User/Assistant layout.
prompt = (
    "<extra_id_0>System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "<extra_id_1>User\n"
    "Who are you?\n"
    "<extra_id_1>Assistant\n"
)

# Generate up to 256 new tokens and print the completion.
print(generate(model, tokenizer, prompt, max_tokens=256))
```

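
The hand-written prompt in the usage example follows a fixed layout: a system block, a user turn, and an open assistant turn. A minimal sketch of a helper that produces the same string for arbitrary text is shown below. `build_prompt` is a hypothetical name introduced here for illustration, and it only covers a single user turn.

```python
def build_prompt(system: str, user: str) -> str:
    """Format one system message and one user turn in the layout used above."""
    return (
        f"<extra_id_0>System\n{system}\n\n"
        f"<extra_id_1>User\n{user}\n"
        "<extra_id_1>Assistant\n"
    )


prompt = build_prompt("You are a helpful, honest AI assistant.", "Who are you?")
print(prompt)
```

The returned string can be passed as the `prompt` argument to `generate` in place of the literal.
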
## Benchmark (Apple Silicon, single prompt, 23 tokens)

| Variant | tok/s |
|---|---|
| bf16 | 2.47 |
| 4-bit default (this) | 4.37 |
| mxfp4-q4 | 4.56 |
| nvfp4-q4 | 9.69 |
| mixed-3-6 | 9.72 |

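
To compare the variants more directly, the throughput figures can be expressed as speedups over the bf16 row. The sketch below does only that arithmetic on the numbers from the table. Choosing bf16 as the baseline is an assumption made here for illustration, and a single 23-token prompt is a small sample, so the ratios are indicative at best.

```python
# Throughput figures (tokens per second) copied from the benchmark table.
tok_per_s = {
    "bf16": 2.47,
    "4-bit default": 4.37,
    "mxfp4-q4": 4.56,
    "nvfp4-q4": 9.69,
    "mixed-3-6": 9.72,
}

# Speedup of each variant relative to the bf16 row.
baseline = tok_per_s["bf16"]
speedups = {name: rate / baseline for name, rate in tok_per_s.items()}

for name, factor in speedups.items():
    print(f"{name:<14} {factor:.2f}x")
```
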
## Original model

See [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
for the original model card, license, and usage terms.