metadata
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
  - mlx
  - quantized
  - 4-bit
  - code-generation

Mini-Coder 1.7B - MLX 4-bit

This is ricdomolm/mini-coder-1.7b quantized to the 4-bit MLX format for fast, native execution on Apple Silicon (M1/M2/M3/M4 chips).

The 4-bit conversion aims for a good trade-off between inference speed and the quality of the generated code, while keeping the unified memory footprint small. In LM Studio on a MacBook Pro M4 with 16 GB of memory, this model ran at 86 tokens per second.

How to use it with MLX

You can load and run this model directly in Python using the official mlx-lm library.

1. Installation

If you haven't already, install the necessary package:

pip install mlx-lm
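
The mlx-lm Python API has changed between releases, so it helps to know which version pip installed before running the examples. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of mlx-lm:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


mlx_lm_version = installed_version("mlx-lm")
if mlx_lm_version is None:
    print("mlx-lm is not installed; run: pip install mlx-lm")
else:
    print(f"mlx-lm {mlx_lm_version} is installed")
```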

2. Execution (Inference)

Here is a quick Python script to generate code:

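from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the quantized model and its tokenizer from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"

model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the tokenizer ships a chat template, wrap the prompt in it
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Keep the temperature low for more deterministic code generation.
# Recent mlx-lm releases take sampling settings through a sampler object;
# older releases accepted temp=0.2 directly as a keyword argument to generate().
sampler = make_sampler(temp=0.2)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,  # print the generated text and generation statistics
    sampler=sampler,
)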
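
When generating for several prompts, it is cheaper to load the model once and reuse it. The sketch below is one possible way to structure that; `build_prompt` and `generate_many` are hypothetical helper names introduced here for illustration. The mlx-lm import is kept inside the function so the prompt helper can be used without loading MLX.

```python
def build_prompt(tokenizer, user_text):
    """Wrap user_text in the tokenizer's chat template if it has one, else return it unchanged."""
    if getattr(tokenizer, "chat_template", None) is not None:
        messages = [{"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    return user_text


def generate_many(prompts, model_path="fabriziosalmi/mini-coder-1.7b-mlx-4bit", max_tokens=512):
    """Load the model once, then return one completion per prompt."""
    # Imported lazily so that build_prompt stays usable without MLX installed
    from mlx_lm import load, generate

    model, tokenizer = load(model_path)
    return [
        generate(model, tokenizer, prompt=build_prompt(tokenizer, p), max_tokens=max_tokens)
        for p in prompts
    ]
```

Calling `generate_many(["Write a Python function that reverses a string.", "Write a binary search in Python."])` on an Apple Silicon machine downloads the weights on first use and returns a list of two strings.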

Quantization Details

  • Framework: MLX
  • Bits: 4
  • Base Model: ricdomolm/mini-coder-1.7b
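
As a rough guide to the memory footprint, the weight storage can be estimated from the bit width. The sketch below assumes group-wise quantization with a group size of 64 and a 16-bit scale and 16-bit bias per group; these values are assumptions, not settings stated in this card. It also takes the nominal 1.7 billion parameters from the model name and treats every weight as quantized, so the result is an approximation.

```python
def quantized_size_gb(n_params, bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Approximate weight storage in GB (1 GB = 1e9 bytes) for group-wise quantization."""
    # Each group of weights shares one scale and one bias, which adds a
    # per-weight overhead of (scale_bits + bias_bits) / group_size bits.
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 1.7e9  # nominal parameter count, taken from the model name (assumption)

print(f"4-bit, group size 64: {quantized_size_gb(N_PARAMS):.2f} GB")
print(f"16-bit baseline:      {N_PARAMS * 16 / 8 / 1e9:.2f} GB")
```

Under these assumptions each weight costs about 4.5 bits instead of 16, which is a bit under 1 GB for the weights. Memory use at runtime is higher, since the KV cache and activations come on top.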