# Qwen3-VL-235B-A22B-Instruct (MLX nvfp4)
MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.
## Quantization
| Parameter | Value |
|---|---|
| Format | MLX safetensors |
| Quantization | nvfp4 |
| Bits per weight | 4.528 |
| Group size | 32 |
| Shards | 24 |
| Total size | 133.41 GB |
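The bits-per-weight figure can be roughly sanity-checked from the shipped size alone (an estimate only; it ignores index/metadata overhead and any layers kept in higher precision, and reads GB as 10^9 bytes):

```python
# Rough bits-per-weight check: bytes on disk vs. parameter count.
# 133.41 GB and 235B parameters are the values from the table above.
total_bytes = 133.41e9
params = 235e9

bits_per_weight = total_bytes * 8 / params
print(f"{bits_per_weight:.3f} bits/weight")  # prints "4.542 bits/weight"
```

This lands close to the reported 4.528, consistent with nvfp4 (4-bit values plus per-group scales at group size 32).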
## Usage
```shell
pip install mlx-vlm
```
```shell
# Text generation
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4 \
  --prompt "What model are you?" \
  --max-tokens 128

# Vision
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4 \
  --image photo.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 256
```
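For scripting, the same CLI calls can be driven from Python. This is a small convenience wrapper around the flags shown above; the helper name is ours, not part of mlx-vlm:

```python
import shlex

MODEL = "LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4"

def mlx_vlm_generate_cmd(prompt, image=None, max_tokens=128):
    """Build the `python -m mlx_vlm generate` argv used in the examples above.

    Helper name and structure are illustrative; only the CLI flags come
    from the usage section of this card.
    """
    cmd = ["python", "-m", "mlx_vlm", "generate",
           "--model", MODEL,
           "--prompt", prompt,
           "--max-tokens", str(max_tokens)]
    if image is not None:
        cmd += ["--image", image]
    return cmd

# Pass the resulting list to subprocess.run(...) to invoke the CLI.
print(shlex.join(mlx_vlm_generate_cmd("Describe this image in detail.",
                                      image="photo.jpg", max_tokens=256)))
```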
## Hardware Requirements
- Apple Silicon with ≥128 GB unified memory (tested on M3 Ultra, 512 GB)
- macOS 15+, MLX 0.30.4+
## Model Details
- Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing)
- Parameters: 235B total, ~22B active per token
- Capabilities: Text, image, and video understanding
- Source: Converted from BF16 full precision checkpoint using patched mlx-vlm with per-tensor materialization to avoid Metal GPU timeout on large models
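The top-k expert routing mentioned above can be illustrated with a toy sketch in pure Python (k=8 is an illustrative choice; the card only states 128 experts with top-k routing):

```python
import math
import random

def top_k_route(gate_logits, k=8):
    """Toy MoE router for one token: pick the k highest-scoring experts
    and renormalize their softmax weights so they sum to 1.

    k=8 is an assumption for illustration; only "128 experts, top-k
    routing" comes from the model card.
    """
    top = sorted(range(len(gate_logits)), key=gate_logits.__getitem__,
                 reverse=True)[:k]
    exp = {i: math.exp(gate_logits[i]) for i in top}
    z = sum(exp.values())
    return {i: e / z for i, e in exp.items()}

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # one token's gate scores
weights = top_k_route(logits)
print(len(weights), round(sum(weights.values()), 6))  # 8 experts, weights sum to 1
```

Only the selected experts' feed-forward blocks run for that token, which is why roughly 22B of the 235B parameters are active per token.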
## Conversion
Converted with mlx-vlm (patched for 235B+ model support):
```shell
python -m mlx_vlm convert \
  --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
  -q --q-bits 4 --q-mode nvfp4 --q-group-size 32 \
  --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4
```
Patches required for models >100B: per-tensor lazy weight materialization before quantization to prevent Metal command buffer timeout. See LibraxisAI/mlx-vlm for the fixes.
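The shape of that fix can be sketched generically (names here are illustrative, not the actual mlx-vlm patch): force each lazy tensor to materialize on its own before quantization, so no single Metal command buffer has to realize the full 235B-parameter graph at once.

```python
def materialize_per_tensor(weights, evaluate):
    """Evaluate each lazy tensor individually (in MLX, `evaluate` would be
    mx.eval) instead of evaluating the whole weight tree in one shot,
    keeping every GPU command buffer small enough to finish before the
    Metal watchdog timeout fires.

    This is a sketch of the idea only; see LibraxisAI/mlx-vlm for the
    real patch.
    """
    for name, tensor in weights.items():
        evaluate(tensor)  # one small command buffer per tensor
    return weights
```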
## See Also
- mxfp8 version: higher precision, 243.61 GB
- mxfp4 version: alternative 4-bit quantization, 126.06 GB
Vibecrafted with AI Agents by VetCoders © 2026 The LibraxisAI Team