Update README.md
---
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
- mlx
- quantized
- 4-bit
- code-generation
---

# Mini-Coder 1.7B - MLX 4-bit

This is the [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b) model quantized to **4-bit MLX format** for fast, native execution on Apple Silicon (M1/M2/M3/M4).

The conversion aims for the best trade-off between inference speed and the quality of the generated code while keeping the unified-memory footprint to a minimum. I get about 86 tokens/s running this model in LM Studio on a MacBook Pro M4 with 16 GB.

## 💻 How to use it with MLX

You can load and run this model directly in Python using the official `mlx-lm` library.

### 1. Installation

If you haven't already, install the necessary package:

```bash
pip install mlx-lm
```
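
The generation API has changed across `mlx-lm` releases (see the note in the script below), so it is worth checking which version you have installed:

```bash
pip show mlx-lm
```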

### 2. Execution (Inference)

Here is a quick Python script to generate code:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the model from the Hugging Face Hub (downloads on first use)
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"
model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the model ships a chat template, apply it:
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Keep the temperature low for better code generation. Recent mlx-lm
# releases take a sampler object here; older ones accepted temp= directly.
sampler = make_sampler(temp=0.2)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    sampler=sampler,
)
```
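
If you prefer not to write any Python, the same install also provides a command-line entry point. A minimal sketch, assuming a recent `mlx-lm` release where the `mlx_lm.generate` command and these flags are available:

```bash
mlx_lm.generate --model fabriziosalmi/mini-coder-1.7b-mlx-4bit \
  --prompt "Write a Python function to calculate the Fibonacci sequence." \
  --max-tokens 512 --temp 0.2
```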

## ⚙️ Quantization Details

* **Framework:** MLX
* **Bits:** 4
* **Base Model:** ricdomolm/mini-coder-1.7b
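
For reference, a 4-bit conversion like this one can typically be reproduced with the `mlx_lm.convert` utility. A sketch under the assumption of a recent `mlx-lm` release (flag names have shifted between versions); the output path is just an example:

```bash
mlx_lm.convert --hf-path ricdomolm/mini-coder-1.7b \
  --mlx-path mini-coder-1.7b-mlx-4bit \
  -q --q-bits 4
```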