Qwen3-32B 3bit MLX
This model is a 3-bit quantized version of Qwen/Qwen3-32B, converted for use with MLX.
Model Details
- Quantization: 3-bit
- Framework: MLX
- Base Model: Qwen/Qwen3-32B
- Model Size: ~12 GB (3-bit quantized)
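As a rough sanity check on the size figure above, the weight storage can be estimated from the parameter count and the bits per weight. This is only a back-of-the-envelope sketch: the parameter count (about 32.8 billion) and the optional per-group overhead (a group size of 64 with a 16-bit scale and a 16-bit bias per group) are assumptions, not values stated in this card, and `quantized_size_gb` is a hypothetical helper.

```python
def quantized_size_gb(n_params, bits, group_size=None, scale_bias_bits=32):
    """Estimate quantized weight storage in GB (1 GB = 1e9 bytes).

    group_size and scale_bias_bits model the extra per-group scale and bias
    that affine quantization schemes store; both are assumptions here.
    """
    bits_per_weight = float(bits)
    if group_size:
        # Each group of weights shares one scale and one bias.
        bits_per_weight += scale_bias_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 32.8e9  # assumed parameter count for Qwen3-32B

# Weights only, ignoring any per-group metadata.
print(f"weights only: {quantized_size_gb(N_PARAMS, 3):.1f} GB")

# With the assumed per-group scale and bias overhead.
print(f"with overhead: {quantized_size_gb(N_PARAMS, 3, group_size=64):.1f} GB")
```

Under these assumptions the weights-only estimate lands close to the ~12 GB listed above, while the files on disk and the memory used at runtime may be somewhat larger.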
Usage
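from mlx_lm import load, generate

# Load the quantized weights and tokenizer (downloaded from the Hub on first use)
model, tokenizer = load("mlx-community/Qwen3-32B-3bit")

# Wrap the prompt in the model's chat template so it sees the expected role markers
prompt = "Hello, how are you?"
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 100 new tokens and print the reply
response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=100)
print(response)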
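The same pattern extends to conversations with a system prompt or earlier turns. The sketch below is one possible way to organize that, not part of this model card: `build_messages` and `generate_reply` are hypothetical helpers, and `mlx_lm` is imported lazily so the message-building part can be tried on any machine. It assumes the same `load` and `generate` calls shown in the usage example.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(
    user_prompt: str,
    system_prompt: Optional[str] = None,
    history: Optional[List[Message]] = None,
) -> List[Message]:
    """Assemble a chat message list: optional system prompt, prior turns, new user turn."""
    messages: List[Message] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(
    messages: List[Message],
    max_tokens: int = 100,
    repo: str = "mlx-community/Qwen3-32B-3bit",
) -> str:
    """Format the messages with the chat template and generate a reply."""
    # Imported here so the rest of the module works without MLX installed.
    from mlx_lm import load, generate

    model, tokenizer = load(repo)
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

For example, `generate_reply(build_messages("Hello, how are you?", system_prompt="Answer briefly."))` would run one turn with a system prompt. In a longer session it is probably worth loading the model once and reusing it rather than calling `load` for every reply.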
Requirements
- Apple Silicon Mac (M1 or later)
- macOS 13.0+
- Python 3.8+
- MLX and mlx-lm packages
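The hardware and software requirements above can be checked from Python before installing anything. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, and the thresholds are simply the ones listed in this section.

```python
import platform
import sys
from typing import List, Tuple


def check_requirements(
    system: str, machine: str, mac_version: str, python_version: Tuple[int, int]
) -> List[str]:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    problems: List[str] = []
    if system != "Darwin" or machine != "arm64":
        problems.append(f"Apple Silicon Mac required, found {system}/{machine}")
    else:
        # Compare only major.minor, e.g. "14.2.1" -> (14, 2).
        parts = tuple(int(p) for p in mac_version.split(".") if p.isdigit())[:2]
        if parts < (13, 0):
            problems.append(f"macOS 13.0+ required, found {mac_version or 'unknown'}")
    if tuple(python_version) < (3, 8):
        problems.append(f"Python 3.8+ required, found {python_version}")
    return problems


# Check the machine this script is running on.
problems = check_requirements(
    platform.system(),
    platform.machine(),
    platform.mac_ver()[0],
    tuple(sys.version_info[:2]),
)
print("All requirements met" if not problems else "\n".join(problems))
```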
Installation
pip install mlx mlx-lm
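After installing, a quick way to confirm that both packages are visible to the current Python environment is to query their installed versions. This sketch uses only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip command above.
for name in ("mlx", "mlx-lm"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```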