---
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
- mlx
- quantized
- 4-bit
- code-generation
---

# Mini-Coder 1.7B - MLX 4-bit

This is the [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b) model quantized to **4-bit MLX format** for fast, native execution on Apple Silicon devices (M1/M2/M3/M4 chips).

The conversion aims for a good trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum.

I measured about 86 tokens/s with this model in LM Studio on a MacBook Pro M4 with 16 GB of unified memory.

## How to use it with MLX

You can load and run this model directly in Python using the official `mlx-lm` library.

### 1. Installation

If you haven't already, install the necessary package:

```bash
pip install mlx-lm
```

### 2. Inference

Here is a quick Python script to generate code:

```python
from mlx_lm import load, generate

# Load the model and tokenizer from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"
model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the model ships a chat template, apply it to format the prompt
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    temp=0.2  # keep the temperature low for better code generation
)
print(response)
```

## Quantization Details

* **Framework:** MLX
* **Bits:** 4
* **Base Model:** ricdomolm/mini-coder-1.7b
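
If you want to reproduce a similar 4-bit conversion yourself, the sketch below uses the `convert` helper shipped with `mlx-lm`. The keyword names (`mlx_path`, `quantize`, `q_bits`) follow recent mlx-lm releases and may differ slightly in older versions, and the output directory is just an example.

```python
from mlx_lm import convert

# Download the base model and write a 4-bit quantized MLX copy.
# Argument names follow recent mlx-lm releases; the output path is only an example.
convert(
    "ricdomolm/mini-coder-1.7b",            # source model on the Hugging Face Hub
    mlx_path="./mini-coder-1.7b-mlx-4bit",  # local output directory (example)
    quantize=True,
    q_bits=4,
)
```

The same conversion can also be run from the command line via the `mlx_lm.convert` entry point (e.g. with the `--hf-path`, `-q`, and `--q-bits` options).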