Mini-Coder 1.7B - MLX 4-bit

This is the ricdomolm/mini-coder-1.7b model quantized to 4-bit MLX format for native, fast execution on Apple Silicon devices (M1/M2/M3/M4 chips).

The conversion aims for a good trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum. I measured about 86 tokens/s with this model in LM Studio on a MacBook Pro (M4, 16 GB).

How to use it with MLX

You can load and run this model directly in Python using the official mlx-lm library.

1. Installation

If you haven't already, install the necessary package:

pip install mlx-lm
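If you just want to try the model from the terminal, mlx-lm also installs a small CLI. A minimal invocation sketch (flag names are from recent mlx-lm releases and may differ in older ones; the model is downloaded on first run):

```shell
# One-off generation from the command line
mlx_lm.generate \
  --model fabriziosalmi/mini-coder-1.7b-mlx-4bit \
  --prompt "Write a Python function to calculate the Fibonacci sequence." \
  --max-tokens 512 \
  --temp 0.2
```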

2. Execution (Inference)

Here is a quick Python script to generate code:

from mlx_lm import load, generate

# Load the model from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"

model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the model uses a specific chat template, apply it:
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# In recent mlx-lm releases, sampling options such as temperature are passed
# via a sampler object (older versions accepted a temp= keyword on generate())
from mlx_lm.sample_utils import make_sampler

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    sampler=make_sampler(temp=0.2),  # keep the temperature low for better code generation
)
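The low temperature matters for code generation: it sharpens the next-token distribution so the model commits to its most likely token instead of sampling creatively. A minimal, dependency-free sketch of temperature scaling (an illustration, not mlx-lm's implementation):

```python
import math

def softmax(logits, temp=1.0):
    """Temperature-scaled softmax: logits are divided by temp before normalizing."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
p_default = softmax(logits, temp=1.0)
p_code = softmax(logits, temp=0.2)

# Lower temperature concentrates probability mass on the top token,
# which makes sampled code more deterministic.
print(max(p_default))  # ≈ 0.63
print(max(p_code))     # ≈ 0.99
```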

Quantization Details

  • Framework: MLX
  • Bits: 4
  • Base Model: ricdomolm/mini-coder-1.7b
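For intuition, 4-bit quantization stores each weight as one of 16 integer levels plus a small amount of per-group metadata (MLX quantizes weights in small groups, typically 32 or 64 elements). The sketch below shows a generic affine scheme, not MLX's exact kernel:

```python
def quantize_group(weights, bits=4):
    """Affine quantization of one weight group onto 2**bits integer levels."""
    qmax = (1 << bits) - 1                     # 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo                        # small ints + per-group metadata

def dequantize_group(q, scale, lo):
    """Reconstruct approximate floats from the stored integers."""
    return [x * scale + lo for x in q]

group = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66, -0.21, 0.33]
q, scale, lo = quantize_group(group)
restored = dequantize_group(q, scale, lo)

# The round-trip error is bounded by half a quantization step (scale / 2).
errors = [abs(a - b) for a, b in zip(group, restored)]
```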