fabriziosalmi committed · verified · Commit 0330d66 · Parent(s): b3d6d01

Update README.md

Files changed (1): README.md (+63 −7)
README.md CHANGED
Previous front matter (README.md before this commit):

```yaml
---
library_name: mlx
tags:
- agent
- code
- mlx
license: mit
datasets:
- ricdomolm/mini-coder-trajs-400k
base_model: ricdomolm/mini-coder-1.7b
pipeline_tag: text-generation
---
```

Updated README.md:
---
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
- mlx
- quantized
- 4-bit
- code-generation
---

# Mini-Coder 1.7B - MLX 4-bit

This is the [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b) model quantized to **4-bit MLX format** for native, fast execution on Apple Silicon devices (M1/M2/M3/M4 chips).
The conversion aims for the best trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum. In my tests, the model reached about 86 tokens/s on a MacBook Pro M4 (16 GB) running in LM Studio.
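
To put the memory claim in perspective, here is a back-of-the-envelope estimate of the weight footprint. It is illustrative arithmetic only: it assumes a nominal 1.7B parameters and MLX's default 4-bit scheme with group size 64, where each group of 64 weights stores an fp16 scale and bias (about 0.5 extra bits per weight).

```python
# Rough weight-memory estimate for a 1.7B-parameter model.
# Assumption: 4-bit quantization with group size 64 stores an fp16
# scale and bias per group, i.e. 2 * 16 / 64 = 0.5 extra bits per weight.
PARAMS = 1.7e9

def weight_gib(bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given bits-per-weight."""
    return PARAMS * bits_per_weight / 8 / 1024**3

fp16_gib = weight_gib(16.0)     # unquantized fp16 weights
q4_gib = weight_gib(4.0 + 0.5)  # 4-bit weights + group scales/biases

print(f"fp16: {fp16_gib:.2f} GiB, 4-bit: {q4_gib:.2f} GiB")
# prints "fp16: 3.17 GiB, 4-bit: 0.89 GiB"
```

Activations, KV cache, and runtime overhead add to this at inference time, but the quantized weights alone fit comfortably within 16 GB of unified memory.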

## 💻 How to use it with MLX

You can load and run this model directly in Python using the official `mlx-lm` library.

### 1. Installation

If you haven't already, install the necessary package:

```bash
pip install mlx-lm
```

### 2. Execution (Inference)

Here is a quick Python script to generate code:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the model and tokenizer from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"
model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the model ships a chat template, apply it:
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Keep the temperature low for more deterministic code generation.
# (On older mlx-lm versions, pass temp=0.2 to generate() directly instead of a sampler.)
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    sampler=make_sampler(temp=0.2),
)
```
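
Since the model returns plain text, generated code usually arrives wrapped in Markdown fences. A small post-processing helper (a hypothetical utility sketched here, not part of `mlx-lm`) can pull out just the code:

```python
import re

def extract_code_blocks(response: str, lang: str = "python") -> list[str]:
    """Return the contents of all fenced code blocks for `lang` in a model response."""
    pattern = rf"```{lang}\n(.*?)```"
    return [block.strip() for block in re.findall(pattern, response, re.DOTALL)]

sample = (
    "Sure, here is the function:\n"
    "```python\n"
    "def fib(n):\n"
    "    a, b = 0, 1\n"
    "    for _ in range(n):\n"
    "        a, b = b, a + b\n"
    "    return a\n"
    "```\n"
)
blocks = extract_code_blocks(sample)
print(blocks[0].splitlines()[0])  # prints "def fib(n):"
```

This keeps the generation call itself simple while still letting you feed the extracted snippet straight into a file or an `exec`-style sandbox.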

## ⚙️ Quantization Details

* **Framework:** MLX
* **Bits:** 4
* **Base Model:** ricdomolm/mini-coder-1.7b
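
A quantization like this can typically be reproduced with the `mlx_lm.convert` utility. This is a sketch, not the author's confirmed command: it assumes a recent `mlx-lm` release and the default group size of 64, and exact flag names can vary between versions.

```shell
# Quantize the base model to 4-bit MLX format (sketch; flags per recent mlx-lm).
python -m mlx_lm.convert \
  --hf-path ricdomolm/mini-coder-1.7b \
  -q --q-bits 4 --q-group-size 64 \
  --mlx-path ./mini-coder-1.7b-mlx-4bit
```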