Mini-Coder 1.7B - MLX 4-bit

This is the ricdomolm/mini-coder-1.7b model quantized to 4-bit MLX format for native, fast execution on Apple Silicon devices (M1/M2/M3/M4 chips).

The conversion aims for a good trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum. I measured about 86 tokens/s with this model in LM Studio on a MacBook Pro (M4, 16 GB).

How to use it with MLX

You can load and run this model directly in Python using the official mlx-lm library.

1. Installation

If you haven't already, install the necessary package:

pip install mlx-lm
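If you just want to try the model from the terminal, mlx-lm also installs a small CLI. A minimal invocation sketch (flag names are from recent mlx-lm releases and may differ in older ones; the model is downloaded on first run):

```shell
# One-off generation from the command line
mlx_lm.generate \
  --model fabriziosalmi/mini-coder-1.7b-mlx-4bit \
  --prompt "Write a Python function to calculate the Fibonacci sequence." \
  --max-tokens 512 \
  --temp 0.2
```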

2. Execution (Inference)

Here is a quick Python script to generate code:

from mlx_lm import load, generate

# Load the model from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"

model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the model uses a specific chat template, apply it:
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# In recent mlx-lm releases, sampling options such as temperature are passed
# via a sampler object (older versions accepted a temp= keyword on generate())
from mlx_lm.sample_utils import make_sampler

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    sampler=make_sampler(temp=0.2),  # keep the temperature low for better code generation
)
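The low temperature matters for code generation: it sharpens the next-token distribution so the model commits to its most likely token instead of sampling creatively. A minimal, dependency-free sketch of temperature scaling (an illustration, not mlx-lm's implementation):

```python
import math

def softmax(logits, temp=1.0):
    """Temperature-scaled softmax: logits are divided by temp before normalizing."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
p_default = softmax(logits, temp=1.0)
p_code = softmax(logits, temp=0.2)

# Lower temperature concentrates probability mass on the top token,
# which makes sampled code more deterministic.
print(max(p_default))  # ≈ 0.63
print(max(p_code))     # ≈ 0.99
```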

Quantization Details

  • Framework: MLX
  • Bits: 4
  • Base Model: ricdomolm/mini-coder-1.7b
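For intuition, 4-bit quantization stores each weight as one of 16 integer levels plus a small amount of per-group metadata (MLX quantizes weights in small groups, typically 32 or 64 elements). The sketch below shows a generic affine scheme, not MLX's exact kernel:

```python
def quantize_group(weights, bits=4):
    """Affine quantization of one weight group onto 2**bits integer levels."""
    qmax = (1 << bits) - 1                     # 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo                        # small ints + per-group metadata

def dequantize_group(q, scale, lo):
    """Reconstruct approximate floats from the stored integers."""
    return [x * scale + lo for x in q]

group = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66, -0.21, 0.33]
q, scale, lo = quantize_group(group)
restored = dequantize_group(q, scale, lo)

# The round-trip error is bounded by half a quantization step (scale / 2).
errors = [abs(a - b) for a, b in zip(group, restored)]
```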