---
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
- mlx
- quantized
- 4-bit
- code-generation
---
# Mini-Coder 1.7B - MLX 4-bit
This is the [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b) model quantized to **4-bit MLX format** for native, ultra-fast execution on Apple Silicon devices (M1/M2/M3/M4 chips).
The conversion aims for the best trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum. I measured about 86 tokens/second with this model in LM Studio on a MacBook Pro M4 with 16 GB of RAM.
## How to use it with MLX
You can load and run this model directly in Python using the official `mlx-lm` library.
### 1. Installation
If you haven't already, install the necessary package:
```bash
pip install mlx-lm
```
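For a quick sanity check without writing any Python, `mlx-lm` also ships a small command-line generator. This is a sketch assuming a recent `mlx-lm` release; flag names may differ slightly between versions:
```bash
python -m mlx_lm.generate \
  --model fabriziosalmi/mini-coder-1.7b-mlx-4bit \
  --prompt "Write a Python function that reverses a string." \
  --max-tokens 256 \
  --temp 0.2
```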
### 2. Execution (Inference)
Here is a quick Python script to generate code:
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the quantized model from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"
model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the tokenizer ships a chat template, apply it
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Keep the temperature low for more deterministic code generation.
# Recent mlx-lm releases take sampling options via a sampler object;
# older releases accepted temp= directly on generate().
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    sampler=make_sampler(temp=0.2),
)
```
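For longer completions you can stream tokens as they are produced instead of waiting for the full response. A minimal sketch using `mlx-lm`'s `stream_generate`, assuming a recent release where each yielded chunk exposes a `.text` field:
```python
from mlx_lm import load, stream_generate

model, tokenizer = load("fabriziosalmi/mini-coder-1.7b-mlx-4bit")

prompt = "Write a Python function to check whether a number is prime."

# Print each chunk of generated text as soon as it is available
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```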
## Quantization Details
* **Framework:** MLX
* **Bits:** 4
* **Base Model:** [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b)
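
If you want to reproduce the conversion yourself, a command along these lines should work. Treat it as a sketch: the exact flag names can vary between `mlx-lm` releases.
```bash
python -m mlx_lm.convert \
  --hf-path ricdomolm/mini-coder-1.7b \
  -q \
  --q-bits 4
```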