
# Qwen 3.5 122B-A10B — JANG_4K (Mixed-Precision, 4-bit)

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon



Osaurus natively supports JANG models. Download at osaurus.ai.


## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen 3.5 VL 122B-A10B |
| Architecture | MoE Transformer + Vision |
| Total Parameters | 122B (10B active per token) |
| Profile | JANG_4K |
| Avg Bits/Weight | 3.96 |
| Bit Widths Used | 3, 4, 5, 8 |
| Model Size | 57.4 GB |
| Vision | Yes |
| Format | JANG v2 (MLX-native safetensors) |

## Benchmarks

200-question MMLU sample (20 questions per subject × 10 subjects). Thinking OFF (`enable_thinking=False`), greedy decoding (temperature 0.0).

| Model | MMLU | Size |
|---|---|---|
| JANG_4K (this model) | 86% | 57.4 GB |
| MLX 4-bit | 85% | 64 GB |
| JANG_2S | 79% | 30.7 GB |
| MLX 2-bit | 56.5% | 36 GB |

JANG_4K beats MLX 4-bit by one MMLU point while being 6.6 GB smaller, making it a near-lossless quantization of the full 122B model.
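
For context, the decoding side of this setup is simple to reproduce. Below is a minimal sketch, assuming the checkpoint loads through standard `mlx_lm` (the format is MLX-native safetensors) and that the tokenizer follows the Qwen 3 chat-template convention for `enable_thinking`; question loading and answer scoring are omitted, and the sample question is illustrative, not from the benchmark set.

```python
# Minimal sketch of the benchmark decoding setup: greedy decoding with
# thinking mode disabled. Assumes the JANG checkpoint loads via standard
# mlx_lm; otherwise, serve it through Osaurus instead.
from mlx_lm import load, generate

model, tokenizer = load("OsaurusAI/Qwen3.5-122B-A10B-JANG_4K")

question = (
    "Which planet has the strongest magnetic field? "
    "A) Earth B) Jupiter C) Mars D) Venus\n"
    "Answer with a single letter."
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    tokenize=False,
    enable_thinking=False,  # Thinking OFF, as in the benchmark
)
# mlx_lm's generate defaults to temperature 0.0, i.e. greedy decoding.
answer = generate(model, tokenizer, prompt=prompt, max_tokens=4)
print(answer)
```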

### Per-Subject Breakdown

| Subject | JANG_4K |
|---|---|
| Abstract Algebra | 16/20 |
| Anatomy | 19/20 |
| Astronomy | 19/20 |
| College CS | 15/20 |
| College Physics | 14/20 |
| HS Biology | 19/20 |
| HS Chemistry | 18/20 |
| HS Mathematics | 14/20 |
| Logical Fallacies | 19/20 |
| World Religions | 19/20 |
| **Total** | **172/200 (86%)** |

## JANG_4K Profile

JANG_4K is a balanced 4-bit mixed-precision profile providing near-original quality. Critical layers (attention, routing, embeddings) are kept at 8-bit, while expert MLP weights are graded at 3-5 bits depending on importance scoring. This profile offers the best quality-to-size ratio for the 122B model.
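
The grading algorithm itself is not published, but the shape of the layer-to-bit-width mapping can be sketched. The following is illustrative only: the layer-name patterns and importance thresholds are assumptions, not the actual JANG scoring.

```python
# Illustrative sketch of a JANG_4K-style bit assignment. Layer-name patterns
# and thresholds are assumptions; the real importance scoring is not public.
CRITICAL_PATTERNS = ("attn", "router", "embed", "lm_head")

def assign_bits(layer_path: str, importance: float) -> int:
    """Pick a bit width for one weight tensor."""
    # Critical layers (attention, routing, embeddings) stay at 8-bit.
    if any(p in layer_path for p in CRITICAL_PATTERNS):
        return 8
    # Expert MLP weights get 3-5 bits by importance score in [0, 1].
    if importance > 0.8:
        return 5
    if importance > 0.3:
        return 4
    return 3

# Example: a heavily used expert keeps more precision than a rarely used one.
print(assign_bits("layers.10.mlp.experts.3.up_proj", 0.9))   # 5
print(assign_bits("layers.10.mlp.experts.42.up_proj", 0.1))  # 3
print(assign_bits("layers.10.self_attn.q_proj", 0.5))        # 8
```

Averaged over every tensor, a mapping of this shape is what produces the 3.96 bits/weight figure in the table above.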

## Usage

```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/Qwen3.5-122B-A10B-JANG_4K
```
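
Once serving, the model can be queried over HTTP. Below is a hypothetical Python client sketch, assuming Osaurus exposes an OpenAI-compatible endpoint; the base URL is a placeholder, so check the Osaurus docs for the actual host and port.

```python
# Hypothetical client sketch. Assumes an OpenAI-compatible endpoint; the
# base URL below is a placeholder, not a confirmed Osaurus default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="OsaurusAI/Qwen3.5-122B-A10B-JANG_4K",
    messages=[{"role": "user", "content": "Describe this model in one sentence."}],
    temperature=0.0,  # greedy, matching the benchmark setup
)
print(resp.choices[0].message.content)
```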

## Requirements

- Apple Silicon Mac with 96+ GB unified memory (e.g., M2/M3/M4 Ultra)
- MLX framework with Qwen 3.5 MoE support

Quantized by Osaurus AI using JANG
