Qwen3-32B 3bit MLX

This model is a 3-bit quantized version of Qwen/Qwen3-32B, converted for use with MLX.

Model Details

Quantization: 3-bit
Framework: MLX
Base Model: Qwen/Qwen3-32B
Model Size: ~12GB (3-bit quantized)
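
As a rough sanity check on the size figure above, the following sketch estimates weight storage from a parameter count and a bit width. The parameter count (about 32.8 billion) and the per-group overhead (one 16-bit scale and one 16-bit bias per 64 weights) are assumptions for illustration and are not stated in this card; the actual file size depends on how the conversion was configured and on which layers were quantized.

```python
def quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 32.8 billion parameters in the base model.
NUM_PARAMS = 32.8e9

# 3-bit payload only, ignoring any quantization metadata.
raw = quantized_size_gb(NUM_PARAMS, 3)

# Assumption: one 16-bit scale and one 16-bit bias per group of 64 weights,
# which adds 32 / 64 = 0.5 bits per weight on top of the 3-bit payload.
with_overhead = quantized_size_gb(NUM_PARAMS, 3 + 32 / 64)

print(f"payload only:  {raw:.1f} GB")
print(f"with overhead: {with_overhead:.1f} GB")
```

Under these assumptions the payload-only estimate is close to the ~12GB quoted above, while per-group metadata would push the on-disk and in-memory footprint somewhat higher.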

Usage

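from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("mlx-community/Qwen3-32B-3bit")

# Wrap the user prompt in the model's chat template so the model
# sees the same turn structure it was trained on.
prompt = "Hello, how are you?"
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 100 new tokens and print the decoded text.
response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=100)
print(response)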
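
Qwen3 chat templates enable a reasoning mode by default, so the generated text may begin with a `<think>...</think>` block before the actual reply. The sketch below separates the two parts of a response string. The helper name `split_thinking` is hypothetical and not part of mlx-lm, and the tag format is an assumption based on Qwen's documentation, which also describes an `enable_thinking` argument to the chat template for switching this mode off.

```python
import re
from typing import Tuple

# Matches one complete reasoning block, including newlines inside it.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" if no complete block exists."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hand-written sample string, not real model output.
sample = "<think>\nThe user is greeting me.\n</think>\n\nI'm doing well, thanks!"
reasoning, answer = split_thinking(sample)
print(answer)
```

With a small `max_tokens` value, generation can stop before the closing tag is produced. In that case this helper finds no complete block and returns the text unchanged, so a larger token budget may be needed to reach the answer.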

Requirements

Apple Silicon Mac (M1/M2/M3 or later)
macOS 13.0+
Python 3.8+
MLX and mlx-lm packages
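
The hardware and software requirements above can be checked before installing anything. This is a minimal sketch using only the Python standard library; the function name `unmet_requirements` is hypothetical, and the thresholds are simply the ones listed in this card.

```python
import platform
import sys
from typing import List, Tuple


def unmet_requirements(system: str, machine: str, macos_version: str,
                       python_version: Tuple[int, int]) -> List[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if system != "Darwin" or machine != "arm64":
        # Also triggers for an x86_64 Python running under Rosetta.
        problems.append("an Apple Silicon Mac is required")
    elif macos_version:
        major = int(macos_version.split(".")[0])
        if major < 13:
            problems.append("macOS 13.0 or later is required")
    if python_version < (3, 8):
        problems.append("Python 3.8 or later is required")
    return problems


# Check the machine this script is running on.
problems = unmet_requirements(
    platform.system(),
    platform.machine(),
    platform.mac_ver()[0],  # empty string on non-macOS systems
    sys.version_info[:2],
)
print("OK" if not problems else "\n".join(problems))
```

Passing the platform values as arguments keeps the check easy to test with values from other machines.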

Installation

pip install mlx mlx-lm
