# Qwen3-4B-Instruct-MLX-4bit

This model is a 4-bit quantized version of `rautaditya/Qwen3-4B-Instruct-2507-heretic-1` for Apple Silicon, built with MLX.
## Features
- **4-bit Quantization**: Affine mode with group size 64 for efficient inference.
- **MLX Support**: Runs natively on Apple Silicon Macs (M-series chips).
- **Chat Ready**: Optimized for instruction following and conversational use.
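As background on what "affine mode with group size 64" means: each run of 64 weights is mapped to 4-bit integers through a per-group scale and offset, then dequantized back at inference time. The following NumPy sketch is illustrative only — it is not MLX's actual quantization kernel, and the function names are made up for this example:

```python
import numpy as np

def quantize_affine_4bit(w, group_size=64):
    """Toy affine quantizer: map each group of weights to integers in [0, 15]."""
    groups = w.reshape(-1, group_size)
    wmin = groups.min(axis=1, keepdims=True)
    wmax = groups.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0  # 4 bits -> 16 levels
    q = np.round((groups - wmin) / scale).astype(np.uint8)
    return q, scale, wmin

def dequantize(q, scale, wmin):
    """Reconstruct approximate float weights from quantized values."""
    return q * scale + wmin

np.random.seed(0)
weights = np.random.randn(256).astype(np.float32)
q, scale, wmin = quantize_affine_4bit(weights)
recon = dequantize(q, scale, wmin).reshape(-1)
# Reconstruction error is bounded by half the per-group step size.
err = np.abs(weights - recon).max()
```

Smaller group sizes track local weight ranges more tightly (lower error) at the cost of storing more scale/offset metadata; 64 is a common middle ground.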
## Usage

### Installation
Ensure you have `mlx-lm` installed:

```bash
pip install mlx-lm
```
### Inference

You can run the model from the `mlx-lm` command line or from Python.

#### Example Python Script
```python
from mlx_lm import load, generate

model, tokenizer = load("rautaditya/Qwen3-4B-Instruct-MLX-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What are some good features of MLX?"}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```
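For the command-line route, `mlx-lm` also ships a generation entry point. A minimal invocation might look like the following (flag names reflect recent `mlx-lm` releases; run `mlx_lm.generate --help` if yours differ):

```shell
mlx_lm.generate \
  --model rautaditya/Qwen3-4B-Instruct-MLX-4bit \
  --prompt "What are some good features of MLX?" \
  --max-tokens 256
```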
## Details
- Base Model: `rautaditya/Qwen3-4B-Instruct-2507-heretic-1`
- Quantization: 4-bit affine (group_size=64)
- Framework: MLX-LM