---
language:
- en
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
tags:
- mlx
- llm
- nemotron
- apple-silicon
base_model: nvidia/Nemotron-Mini-4B-Instruct
---

# Nemotron-Mini-4B-Instruct-4bit-mlx

This model was converted from [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
to [MLX](https://github.com/ml-explore/mlx) format for use on Apple Silicon.

**Quantization:** 4-bit affine quantization with the MLX default settings (~4.5 bits per weight, bpw)
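
The ~4.5 bpw figure is higher than 4 because group-wise affine quantization stores a scale and a bias for every group of weights. A quick sanity check, assuming a group size of 64 (the `mlx_lm.convert` default) and 16-bit scales and biases; `bits_per_weight` is a hypothetical helper, not part of `mlx-lm`:

```python
def bits_per_weight(bits: int = 4, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage cost per weight for group-wise affine quantization.

    Assumes one scale and one bias per group of `group_size` weights.
    """
    # Per-group metadata (scale + bias) amortized over every weight in the group
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


print(bits_per_weight())  # → 4.5
```

This counts only the quantized layers; any layers kept in higher precision would raise the overall average slightly.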

## Usage

Install the `mlx-lm` package (`pip install mlx-lm`), then load the model and prompt it using the Nemotron-Mini chat format:

```python
from mlx_lm import load, generate

# Download (if needed) and load the 4-bit weights and tokenizer
model, tokenizer = load("mlx-community/Nemotron-Mini-4B-Instruct-4bit-mlx")

# Nemotron-Mini prompt format: a System turn, a User turn,
# and an open Assistant turn for the model to complete
prompt = (
    "<extra_id_0>System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "<extra_id_1>User\n"
    "Who are you?\n"
    "<extra_id_1>Assistant\n"
)

print(generate(model, tokenizer, prompt, max_tokens=256))
```
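
The prompt string in the usage example follows a simple turn structure. The hypothetical helper `build_nemotron_prompt` below (not part of `mlx-lm`) sketches how to assemble it from a system message and a list of turns. It reproduces the exact single-turn string used above; treating earlier Assistant replies the same way as User turns in multi-turn chats is an assumption, so check the original model card before relying on it.

```python
def build_nemotron_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Assemble a Nemotron-Mini style prompt (hypothetical helper).

    `turns` is a list of (role, text) pairs with role "User" or "Assistant".
    The result ends with an open Assistant turn for the model to complete.
    """
    parts = [f"<extra_id_0>System\n{system}\n\n"]
    for role, text in turns:
        if role not in ("User", "Assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        # Each turn: role marker line, then the message text (assumed multi-turn layout)
        parts.append(f"<extra_id_1>{role}\n{text}\n")
    parts.append("<extra_id_1>Assistant\n")  # generation starts after this marker
    return "".join(parts)


prompt = build_nemotron_prompt(
    "You are a helpful, honest AI assistant.",
    [("User", "Who are you?")],
)
print(prompt == (
    "<extra_id_0>System\n"
    "You are a helpful, honest AI assistant.\n\n"
    "<extra_id_1>User\n"
    "Who are you?\n"
    "<extra_id_1>Assistant\n"
))  # → True
```

The returned string can be passed as `prompt` to `generate` in the same way as the hand-written one.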

## Benchmark (Apple Silicon, single prompt, 23 tokens)

| Variant | Throughput (tok/s) |
|---|---|
| bf16 | 2.47 |
| 4-bit default (this model) | 4.37 |
| mxfp4-q4 | 4.56 |
| nvfp4-q4 | 9.69 |
| mixed-3-6 | 9.72 |
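
To read the table as relative speedups, divide each throughput by the bf16 baseline. A minimal sketch using only the numbers from the table:

```python
# Throughput figures copied from the benchmark table (tokens per second)
throughput = {
    "bf16": 2.47,
    "4-bit default": 4.37,
    "mxfp4-q4": 4.56,
    "nvfp4-q4": 9.69,
    "mixed-3-6": 9.72,
}

baseline = throughput["bf16"]
for variant, tps in throughput.items():
    # Speedup relative to the unquantized bf16 weights
    print(f"{variant:>14}: {tps / baseline:.2f}x")
```

On this run the 4-bit default variant is about 1.77x faster than bf16, and the two fastest variants are about 3.9x faster. A single short prompt is a noisy measurement, so small gaps (such as 4-bit default vs. mxfp4-q4) should be read with caution.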

## Original model

See [nvidia/Nemotron-Mini-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Mini-4B-Instruct)
for the original model card, license, and usage terms.