Qwen3-VL-8B-Thinking - Gemini 3 Distill scale 6

Proper grad norm and alpha. Fixed template

Fine-tuned on 1k dataset distilled from Gemini 3.

Base Model

  • unsloth/Qwen3-VL-8B-Thinking

Training

  • Dataset: 1k Gemini 3 distillation samples
Downloads last month
31
Safetensors
Model size
9B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for fremko/Qwen3-VL-8B-Thinking-norm

Finetuned
(3)
this model
Quantizations
1 model