---
language:
  - en
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
tags:
  - mlx
  - llm
  - nemotron
  - apple-silicon
base_model: nvidia/Nemotron-Mini-4B-Instruct
---

Nemotron-Mini-4B-Instruct-4bit-mlx

This model was converted from nvidia/Nemotron-Mini-4B-Instruct to MLX format for use on Apple Silicon.

Quantization: 4-bit default affine quantization (~4.5 bpw)
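
The ~4.5 bpw figure can be reproduced with a short calculation. This is a minimal sketch that assumes the default affine scheme stores one 16-bit scale and one 16-bit bias for every group of 64 weights. The group size, the overhead, and the helper name `bits_per_weight` are illustrative assumptions, not values stated in this card.

```python
def bits_per_weight(bits: int = 4, group_size: int = 64, overhead_bits: int = 32) -> float:
    """Effective storage cost per weight for group-wise affine quantization.

    Assumption: each group of `group_size` weights carries `overhead_bits`
    of metadata (here a 16-bit scale plus a 16-bit bias).
    """
    return bits + overhead_bits / group_size


effective_bpw = bits_per_weight()
print(effective_bpw)  # 4.5
```

Under these assumptions the per-group metadata adds 0.5 bits to each 4-bit weight.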

Usage

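from mlx_lm import load, generate

# Download (or reuse the cached copy of) the quantized weights and tokenizer.
model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-4bit-mlx")

# Single-turn prompt: a system turn, a user turn, then an open
# assistant turn for the model to complete.
prompt = (
    "<extra_id_0>System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "<extra_id_1>User\n"
    "Who are you?\n"
    "<extra_id_1>Assistant\n"
)

print(generate(model, tokenizer, prompt, max_tokens=256))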
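
To avoid retyping the control tokens for every question, the single-turn prompt layout used above can be wrapped in a small helper. This is only a sketch: `build_prompt` is a hypothetical name, and it covers just the one-system-turn, one-user-turn layout shown in this card.

```python
def build_prompt(
    user_message: str,
    system_message: str = "You are a helpful, honest AI assistant.",
) -> str:
    """Assemble a single-turn prompt in the layout shown in the Usage section."""
    return (
        f"<extra_id_0>System\n{system_message}\n\n"
        f"<extra_id_1>User\n{user_message}\n"
        "<extra_id_1>Assistant\n"
    )


prompt = build_prompt("Who are you?")
```

The resulting string can be passed to `generate` in place of the hand-written prompt.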

Benchmark (Apple Silicon, single prompt, 23 tokens)

Variant	tok/s
bf16	2.47
4-bit default (this)	4.37
mxfp4-q4	4.56
nvfp4-q4	9.69
mixed-3-6	9.72
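
For a quick comparison, the throughput figures can be expressed relative to the bf16 baseline. The snippet below is a small sketch that only re-uses the numbers from the table; the variable names are illustrative, and a single 23-token prompt is a small sample, so the ratios are rough.

```python
# Tokens per second, copied from the benchmark table.
tok_per_s = {
    "bf16": 2.47,
    "4-bit default": 4.37,
    "mxfp4-q4": 4.56,
    "nvfp4-q4": 9.69,
    "mixed-3-6": 9.72,
}

baseline = tok_per_s["bf16"]

# Speedup of each variant relative to bf16.
speedups = {name: rate / baseline for name, rate in tok_per_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```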

Original model

See nvidia/Nemotron-Mini-4B-Instruct for the original model card, license, and usage terms.