---
language:
- en
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
tags:
- mlx
- llm
- nemotron
- apple-silicon
base_model: nvidia/Nemotron-Mini-4B-Instruct
---

# Nemotron-Mini-4B-Instruct-4bit-mlx

This model was converted from [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct) to [MLX](https://github.com/ml-explore/mlx) format for use on Apple Silicon.

**Quantization:** 4-bit default affine quantization (~4.5 bpw)

## Usage

```python
from mlx_lm import load, generate

# Load the quantized weights and the tokenizer.
model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-4bit-mlx")

# Build the prompt by hand: system turn, user turn, then the assistant header.
prompt = (
    "System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "User\n"
    "Who are you?\n"
    "Assistant\n"
)

print(generate(model, tokenizer, prompt, max_tokens=256))
```

## Benchmark (Apple Silicon, single prompt, 23 tokens)

| Variant | Throughput (tok/s) |
|---|---|
| bf16 | 2.47 |
| 4-bit default (this) | 4.37 |
| mxfp4-q4 | 4.56 |
| nvfp4-q4 | 9.69 |
| mixed-3-6 | 9.72 |

## Original model

See [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct) for the original model card, license, and usage terms.
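
## Notes on the figures

The "~4.5 bpw" figure and the benchmark numbers can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and does not load the model. It assumes that the default affine quantization uses groups of 64 weights with one 16-bit scale and one 16-bit bias per group; this card does not state those values, so treat them as assumptions. The helper names `bits_per_weight` and `speedup_vs` are hypothetical, and the throughput values are copied from the benchmark table.

```python
def bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Effective bits per weight for group-wise affine quantization.

    Each weight costs `bits`, and every group of `group_size` weights
    shares one scale and one bias (both assumed to be 16-bit here).
    """
    return bits + (scale_bits + bias_bits) / group_size


# Throughput in tokens per second, copied from the benchmark table.
TOK_PER_S = {
    "bf16": 2.47,
    "4-bit default": 4.37,
    "mxfp4-q4": 4.56,
    "nvfp4-q4": 9.69,
    "mixed-3-6": 9.72,
}


def speedup_vs(baseline, table):
    """Return each variant's throughput relative to `baseline`, rounded to 2 places."""
    base = table[baseline]
    return {name: round(rate / base, 2) for name, rate in table.items()}


if __name__ == "__main__":
    # Assumed settings: 4-bit weights, group size 64.
    print(f"effective bpw: {bits_per_weight(4, 64)}")
    for name, ratio in speedup_vs("bf16", TOK_PER_S).items():
        print(f"{name}: {ratio}x vs bf16")
```

Under these assumptions the per-group overhead is 32 / 64 = 0.5 bits, which matches the quoted ~4.5 bpw. The speedup ratios come from a single 23-token prompt, so they are best read as rough indications, not stable measurements.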