# Qwen3-VL-235B-A22B-Instruct (MLX nvfp4)
MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.
## Quantization
| Parameter | Value |
|---|---|
| Format | MLX safetensors |
| Quantization | nvfp4 |
| Bits per weight | 4.528 |
| Group size | 32 |
| Shards | 24 |
| Total size | 133.41 GB |
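The bits-per-weight figure can be roughly sanity-checked from the shipped size alone (an estimate only; it ignores index/metadata overhead and any layers kept in higher precision, and reads GB as 10^9 bytes):

```python
# Rough bits-per-weight check: bytes on disk vs. parameter count.
# 133.41 GB and 235B parameters are the values from the table above.
total_bytes = 133.41e9
params = 235e9

bits_per_weight = total_bytes * 8 / params
print(f"{bits_per_weight:.3f} bits/weight")  # prints "4.542 bits/weight"
```

This lands close to the reported 4.528, consistent with nvfp4 (4-bit values plus per-group scales at group size 32).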
## Usage
```shell
pip install mlx-vlm
```
```shell
# Text generation
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4 \
  --prompt "What model are you?" \
  --max-tokens 128

# Vision
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4 \
  --image photo.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 256
```
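For scripting, the same CLI calls can be driven from Python. This is a small convenience wrapper around the flags shown above; the helper name is ours, not part of mlx-vlm:

```python
import shlex

MODEL = "LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4"

def mlx_vlm_generate_cmd(prompt, image=None, max_tokens=128):
    """Build the `python -m mlx_vlm generate` argv used in the examples above.

    Helper name and structure are illustrative; only the CLI flags come
    from the usage section of this card.
    """
    cmd = ["python", "-m", "mlx_vlm", "generate",
           "--model", MODEL,
           "--prompt", prompt,
           "--max-tokens", str(max_tokens)]
    if image is not None:
        cmd += ["--image", image]
    return cmd

# Pass the resulting list to subprocess.run(...) to invoke the CLI.
print(shlex.join(mlx_vlm_generate_cmd("Describe this image in detail.",
                                      image="photo.jpg", max_tokens=256)))
```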
## Hardware Requirements
- Apple Silicon with ≥128 GB unified memory (tested on M3 Ultra, 512 GB)
- macOS 15+, MLX 0.30.4+
## Model Details
- Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing)
- Parameters: 235B total, ~22B active per token
- Capabilities: Text, image, and video understanding
- Source: Converted from BF16 full precision checkpoint using patched mlx-vlm with per-tensor materialization to avoid Metal GPU timeout on large models
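The top-k expert routing mentioned above can be illustrated with a toy sketch in pure Python (k=8 is an illustrative choice; the card only states 128 experts with top-k routing):

```python
import math
import random

def top_k_route(gate_logits, k=8):
    """Toy MoE router for one token: pick the k highest-scoring experts
    and renormalize their softmax weights so they sum to 1.

    k=8 is an assumption for illustration; only "128 experts, top-k
    routing" comes from the model card.
    """
    top = sorted(range(len(gate_logits)), key=gate_logits.__getitem__,
                 reverse=True)[:k]
    exp = {i: math.exp(gate_logits[i]) for i in top}
    z = sum(exp.values())
    return {i: e / z for i, e in exp.items()}

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(128)]  # one token's gate scores
weights = top_k_route(logits)
print(len(weights), round(sum(weights.values()), 6))  # 8 experts, weights sum to 1
```

Only the selected experts' feed-forward blocks run for that token, which is why roughly 22B of the 235B parameters are active per token.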
## Conversion
Converted with mlx-vlm (patched for 235B+ model support):
```shell
python -m mlx_vlm convert \
  --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
  -q --q-bits 4 --q-mode nvfp4 --q-group-size 32 \
  --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-nvfp4
```
Patches required for models >100B: per-tensor lazy weight materialization before quantization to prevent Metal command buffer timeout. See LibraxisAI/mlx-vlm for the fixes.
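The shape of that fix can be sketched generically (names here are illustrative, not the actual mlx-vlm patch): force each lazy tensor to materialize on its own before quantization, so no single Metal command buffer has to realize the full 235B-parameter graph at once.

```python
def materialize_per_tensor(weights, evaluate):
    """Evaluate each lazy tensor individually (in MLX, `evaluate` would be
    mx.eval) instead of evaluating the whole weight tree in one shot,
    keeping every GPU command buffer small enough to finish before the
    Metal watchdog timeout fires.

    This is a sketch of the idea only; see LibraxisAI/mlx-vlm for the
    real patch.
    """
    for name, tensor in weights.items():
        evaluate(tensor)  # one small command buffer per tensor
    return weights
```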
## See Also
- mxfp8 version: higher precision, 243.61 GB
- mxfp4 version: alternative 4-bit quantization, 126.06 GB
Vibecrafted with AI Agents by VetCoders © 2026 The LibraxisAI Team