The intended baseline is in-flight quantization with vLLM; I haven't benchmarked this checkpoint against it yet.
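For reference, that comparison would load the original (unquantized) checkpoint into vLLM with in-flight bitsandbytes quantization. A configuration sketch follows; the model id is a placeholder assumption, substitute the original WeThink repo linked below.

```python
# Engine configuration sketch for vLLM in-flight bitsandbytes quantization.
# The model id is a placeholder assumption; point it at the original
# (unquantized) WeThink checkpoint, not this pre-quantized bnb-4bit export.
engine_args = dict(
    model="WeThink/WeThink-Qwen2.5VL-7B",  # placeholder repo id
    quantization="bitsandbytes",           # quantize weights on load
    load_format="bitsandbytes",
)
# Usage (requires vllm and bitsandbytes installed):
#   from vllm import LLM
#   llm = LLM(**engine_args)
```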

Original Model

Original Inference Code

System Prompt:

You FIRST think about the reasoning process as an internal monologue and then provide the final answer.
The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE enclosed within <answer> </answer> tags.
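A minimal sketch of how a client might wrap this system prompt and parse the tagged completion. The helper names are my own, not part of the original inference code.

```python
import re

# The system prompt quoted above, reproduced verbatim.
SYSTEM_PROMPT = (
    "You FIRST think about the reasoning process as an internal monologue "
    "and then provide the final answer.\n"
    "The reasoning process MUST BE enclosed within <think> </think> tags. "
    "The final answer MUST BE enclosed within <answer> </answer> tags."
)

def build_messages(user_text: str) -> list:
    # Standard chat-format messages; feed to your tokenizer's chat template.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

def parse_response(text: str):
    # Extract the reasoning and the final answer from the tagged completion.
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

reasoning, answer = parse_response("<think>2 + 2 is 4</think><answer>4</answer>")
print(answer)  # → 4
```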
Safetensors
Model size: 8B params
Tensor types: F32, F16, U8

Model tree for comptechco/WeThink-Qwen2.5VL-7B-bnb-4bit

This model is one of 3 quantized variants of the original model.