# Qwen3-Swallow-30B-A3B-SFT-v0.2 – MLX 4bit
MLX 4bit quantized version of tokyotech-llm/Qwen3-Swallow-30B-A3B-SFT-v0.2 for Apple Silicon Macs.
## Model Details
- Original model: tokyotech-llm/Qwen3-Swallow-30B-A3B-SFT-v0.2
- Architecture: Mixture of Experts (MoE) – 30B total parameters, 3B active parameters
- Training: Japanese Continued Pre-Training + Supervised Fine-Tuning by Tokyo Institute of Technology Swallow Project
- License: Apache 2.0
## Conversion Details
| Item | Value |
|---|---|
| Conversion tool | mlx-lm |
| Quantization | 4bit |
| Model size | ~17 GB |
| Source | tokyotech-llm/Qwen3-Swallow-30B-A3B-SFT-v0.2 |
## Performance (MacBook Pro M4 Max, 128GB)
| Metric | Value |
|---|---|
| Generation speed | 120.6 tokens/s |
| Peak memory usage | 17.3 GB |
## Usage

### Install

```bash
pip install mlx-lm
```
### Text Generation

```bash
mlx_lm.generate \
  --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit \
  --prompt "Explain generative AI in a way a 10-year-old can understand" \
  --max-tokens 500
```
### Chat

```bash
mlx_lm.chat --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit
```
### OpenAI-Compatible API Server

```bash
mlx_lm.server \
  --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit \
  --port 8080
```
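Once the server is up, any OpenAI-style client can call its `/v1/chat/completions` endpoint. A minimal sketch of a request body, assuming the server is on port 8080 (the prompt and sampling parameters here are illustrative):

```python
import json

# Illustrative request body for mlx_lm.server's OpenAI-compatible
# /v1/chat/completions endpoint.
payload = {
    "model": "tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit",
    "messages": [
        {"role": "user", "content": "Explain the appeal of Japan's four seasons"}
    ],
    "max_tokens": 500,
    "temperature": 0.7,
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Send it with, for example, `curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d @body.json`.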
### Python API

```python
from mlx_lm import load, generate

# Downloads the weights on first use, then loads model and tokenizer
model, tokenizer = load("tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit")

messages = [{"role": "user", "content": "Explain the appeal of Japan's four seasons"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=500))
```
## Recommended Hardware
| Machine | Memory | Status |
|---|---|---|
| M4 Max 128GB | Plenty of headroom | ✅ |
| M4 Pro 64GB | Comfortable | ✅ |
| M4 Pro 48GB | Comfortable | ✅ |
| M1/M2/M3 16GB | Tight | ⚠️ |
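The statuses above follow from the ~17.3 GB peak memory plus whatever macOS and other apps need. A rough headroom check (the 8 GB system allowance is an assumption, not a measured figure):

```python
peak_gb = 17.3    # peak memory from the performance table above
reserve_gb = 8.0  # rough allowance for macOS and other apps (assumption)

for ram_gb in (128, 64, 48, 16):
    fits = ram_gb >= peak_gb + reserve_gb
    print(f"{ram_gb} GB RAM: {'comfortable' if fits else 'tight'}")
```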
## Other Variants
| Precision | Repository | Size | Speed |
|---|---|---|---|
| 4bit (this model) | tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit | 17 GB | 120.6 tok/s |
| 8bit | tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-8bit | 32 GB | 89.7 tok/s |
| fp16 | tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-fp16 | 61 GB | 60.9 tok/s |
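Dividing each repository size by the 30B parameter count gives the effective bits per weight, which is why the 4bit build is roughly a quarter the size of fp16 (the fraction of a bit above the nominal width comes from quantization scales and layers kept at higher precision):

```python
params = 30e9  # total parameters (all MoE experts are stored on disk)

for name, size_gb in [("4bit", 17), ("8bit", 32), ("fp16", 61)]:
    bits_per_weight = size_gb * 1e9 * 8 / params
    print(f"{name}: ~{bits_per_weight:.1f} bits/weight")
```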
## Compatible Tools
- mlx-lm (CLI / Python / API server)
- LM Studio (GUI app)
- Pico AI Server (App Store)