Configuration Parsing Warning: In config.json: "quantization_config.bits" must be an integer

Qwen3-VL-32B-Thinking-EXL3-3.5bpw

ExLlamaV3 quantization of Qwen/Qwen3-VL-32B-Thinking - A vision-language model with enhanced reasoning capabilities.

Quantization Details

Parameter	Value
Bits per Weight	3.5 bpw
Head Bits	6 bpw
Calibration Rows	128
Calibration Context	4096 tokens
Format	ExLlamaV3 (EXL3)
Size	~17 GB

Model Capabilities

Vision + Reasoning: Process images with chain-of-thought analysis
Thinking Mode: <think>...</think> tags for complex visual reasoning
Context Window: 32K tokens
Image Support: Single/multiple images, various resolutions
Video Support: Frame-by-frame analysis

Hardware Requirements

GPU	VRAM	Notes
RTX 4090	24 GB	Fits with moderate context + images
RTX 3090	24 GB	Works, may need lower context with large images
A100 40GB	40 GB	Comfortable for all use cases

Use Cases

Screenshot Analysis: Understand UI, extract information
Document OCR: Read and interpret documents with reasoning
Visual Q&A: Answer questions about images with explanations
Code from Screenshots: Analyze and explain code in images

Usage with TabbyAPI

# config.yml
model:
  model_dir: models
  model_name: Qwen3-VL-32B-Thinking-EXL3-3.5bpw

network:
  host: 0.0.0.0
  port: 5000

model_defaults:
  max_seq_len: 16384
  cache_mode: Q4

Recommended Settings

Visual Reasoning (detailed analysis):

Temperature: 0.6
Top-P: 0.95
Enable thinking mode

Quick Visual Tasks (fast responses):

Temperature: 0.7
Top-P: 0.8
Disable thinking mode

Original Model

This is a quantization of Qwen/Qwen3-VL-32B-Thinking. All credit for the base model goes to the Qwen team at Alibaba.

License

Apache 2.0 (inherited from base model)

Downloads last month: 8

Safetensors

Model size

9B params

Tensor type

F16

I16

BF16

Model tree for nullrunner/Qwen3-VL-32B-Thinking-EXL3-3.5bpw

Base model

Qwen/Qwen3-VL-32B-Thinking

Quantized

(26)

this model