---
base_model: ricdomolm/mini-coder-1.7b
library_name: mlx
tags:
- mlx
- quantized
- 4-bit
- code-generation
---
# Mini-Coder 1.7B - MLX 4-bit
This is the [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b) model quantized to **4-bit MLX format** for native, ultra-fast execution on Apple Silicon devices (M1/M2/M3/M4 chips).
The conversion aims for the best trade-off between inference speed and the quality of the generated code, while keeping the unified-memory footprint to a minimum. I measured about 86 tokens/second with this model in LM Studio on a MacBook Pro M4 with 16 GB of RAM.
## How to use it with MLX
You can load and run this model directly in Python using the official `mlx-lm` library.
### 1. Installation
If you haven't already, install the necessary package:
```bash
pip install mlx-lm
```
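For a quick sanity check without writing any Python, `mlx-lm` also ships a small command-line generator. This is a sketch assuming a recent `mlx-lm` release; flag names may differ slightly between versions:
```bash
python -m mlx_lm.generate \
  --model fabriziosalmi/mini-coder-1.7b-mlx-4bit \
  --prompt "Write a Python function that reverses a string." \
  --max-tokens 256 \
  --temp 0.2
```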
### 2. Execution (Inference)
Here is a quick Python script to generate code:
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the quantized model from the Hugging Face Hub
model_path = "fabriziosalmi/mini-coder-1.7b-mlx-4bit"
model, tokenizer = load(model_path)

prompt = "Write a Python function to calculate the Fibonacci sequence."

# If the tokenizer ships a chat template, apply it
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Keep the temperature low for more deterministic code generation.
# Recent mlx-lm releases take sampling options via a sampler object;
# older releases accepted temp= directly on generate().
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
    sampler=make_sampler(temp=0.2),
)
```
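For longer completions you can stream tokens as they are produced instead of waiting for the full response. A minimal sketch using `mlx-lm`'s `stream_generate`, assuming a recent release where each yielded chunk exposes a `.text` field:
```python
from mlx_lm import load, stream_generate

model, tokenizer = load("fabriziosalmi/mini-coder-1.7b-mlx-4bit")

prompt = "Write a Python function to check whether a number is prime."

# Print each chunk of generated text as soon as it is available
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
    print(chunk.text, end="", flush=True)
print()
```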
## Quantization Details
* **Framework:** MLX
* **Bits:** 4
* **Base Model:** [ricdomolm/mini-coder-1.7b](https://huggingface.co/ricdomolm/mini-coder-1.7b)
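
If you want to reproduce the conversion yourself, a command along these lines should work. Treat it as a sketch: the exact flag names can vary between `mlx-lm` releases.
```bash
python -m mlx_lm.convert \
  --hf-path ricdomolm/mini-coder-1.7b \
  -q \
  --q-bits 4
```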