---
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
- mlx
- quantized
- 4-bit
- code-generation
---

# Mini-Coder 1.7B - MLX 4-bit

This is the [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b) model quantized to **4-bit MLX format** for fast, native execution on Apple Silicon devices (M1/M2/M3/M4 chips).

The conversion aims for a good trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum.

I measured about 86 tokens/s with this model in LM Studio on a MacBook Pro M4 with 16 GB of unified memory.

## How to use it with MLX

You can load and run this model directly in Python using the official `mlx-lm` library.

### 1. Installation

If you haven't already, install the necessary package:

```bash
pip install mlx-lm
```

### 2. Inference

Here is a quick Python script to generate code:

```python
from mlx_lm import load, generate

# Load the model and tokenizer from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"
model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the model ships a chat template, apply it to format the prompt
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    temp=0.2  # keep the temperature low for better code generation
)
print(response)
```

## Quantization Details

* **Framework:** MLX
* **Bits:** 4
* **Base Model:** ricdomolm/mini-coder-1.7b
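
If you want to reproduce a similar 4-bit conversion yourself, the sketch below uses the `convert` helper shipped with `mlx-lm`. The keyword names (`mlx_path`, `quantize`, `q_bits`) follow recent mlx-lm releases and may differ slightly in older versions, and the output directory is just an example.

```python
from mlx_lm import convert

# Download the base model and write a 4-bit quantized MLX copy.
# Argument names follow recent mlx-lm releases; the output path is only an example.
convert(
    "ricdomolm/mini-coder-1.7b",            # source model on the Hugging Face Hub
    mlx_path="./mini-coder-1.7b-mlx-4bit",  # local output directory (example)
    quantize=True,
    q_bits=4,
)
```

The same conversion can also be run from the command line via the `mlx_lm.convert` entry point (e.g. with the `--hf-path`, `-q`, and `--q-bits` options).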