# Qwen3-Swallow-30B-A3B-SFT-v0.2 (MLX 4bit)

An MLX 4-bit quantized version of tokyotech-llm/Qwen3-Swallow-30B-A3B-SFT-v0.2 for Apple Silicon Macs.

## Model Details

### Conversion Details

| Item | Value |
| --- | --- |
| Conversion tool | mlx-lm |
| Quantization | 4-bit |
| Model size | ~17 GB |
| Source | tokyotech-llm/Qwen3-Swallow-30B-A3B-SFT-v0.2 |
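
For reference, a 4-bit quantization like this one can be produced with mlx-lm's conversion tool. The command below is a sketch, not necessarily the exact invocation used here; flag names can differ between mlx-lm releases, so check `mlx_lm.convert --help`:

```bash
# Sketch: download the source model, quantize to 4-bit, write MLX weights locally.
mlx_lm.convert \
  --hf-path tokyotech-llm/Qwen3-Swallow-30B-A3B-SFT-v0.2 \
  --mlx-path ./Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit \
  -q --q-bits 4
```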

### Performance (MacBook Pro M4 Max, 128 GB)

| Metric | Value |
| --- | --- |
| Generation speed | 120.6 tokens/s |
| Peak memory usage | 17.3 GB |

## Usage

### Install

```bash
pip install mlx-lm
```

### Text Generation

```bash
mlx_lm.generate \
  --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit \
  --prompt "生成AIについて、10歳向けの説明をして" \
  --max-tokens 500
```

(The prompt asks the model to explain generative AI in terms a 10-year-old can understand.)
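
Sampling parameters can be passed on the command line as well. The flags below exist in recent mlx-lm releases, but names can change between versions, so verify with `mlx_lm.generate --help`; the values are only illustrative:

```bash
# Illustrative: lower temperature plus nucleus sampling for more focused output.
mlx_lm.generate \
  --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit \
  --prompt "生成AIについて、10歳向けの説明をして" \
  --max-tokens 500 \
  --temp 0.7 \
  --top-p 0.9
```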

### Chat

```bash
mlx_lm.chat --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit
```

### OpenAI-Compatible API Server

```bash
mlx_lm.server \
  --model tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit \
  --port 8080
```
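
Once the server is up, any OpenAI-compatible client can talk to it. A minimal curl sketch against the `/v1/chat/completions` route that `mlx_lm.server` exposes (the prompt asks about the appeal of Japan's four seasons; adjust the port to your setup):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit",
    "messages": [
      {"role": "user", "content": "日本の四季の魅力を説明して"}
    ],
    "max_tokens": 500
  }'
```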

### Python API

```python
from mlx_lm import load, generate

model, tokenizer = load("tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit")

# Prompt: "Describe the appeal of Japan's four seasons"
messages = [{"role": "user", "content": "日本の四季の魅力を説明して"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=500))
```
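
For interactive use, mlx-lm also provides a streaming generator. A minimal sketch, assuming a recent mlx-lm where `stream_generate` yields response chunks with a `.text` field (older versions yielded plain strings):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit")
messages = [{"role": "user", "content": "日本の四季の魅力を説明して"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Print tokens as they arrive instead of waiting for the full completion.
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```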

## Recommended Hardware

| Machine | Memory | Status |
| --- | --- | --- |
| M4 Max | 128 GB | Plenty of headroom ✅ |
| M4 Pro | 64 GB | Comfortable ✅ |
| M4 Pro | 48 GB | Comfortable ✅ |
| M1/M2/M3 | 16 GB | Tight (peak usage of 17.3 GB exceeds 16 GB of unified memory) ⚠️ |

## Other Variants

| Precision | Repository | Size | Speed |
| --- | --- | --- | --- |
| 4bit (this model) | tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-4bit | 17 GB | 120.6 tok/s |
| 8bit | tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-8bit | 32 GB | 89.7 tok/s |
| fp16 | tocchitocchi/Qwen3-Swallow-30B-A3B-SFT-v0.2-MLX-fp16 | 61 GB | 60.9 tok/s |

## Compatible Tools

- mlx-lm (CLI, Python API, and OpenAI-compatible server, as shown above)
- LM Studio (loads MLX models on Apple Silicon)
