
Qwen3-VL-8B-Instruct (Base)

Zero-shot baseline of Qwen3-VL-8B-Instruct for garment classification. This is the base model before any Denali-AI fine-tuning. Ranked #3/21 on the Denali-AI eval_hard_3500 benchmark with 78.1% weighted score (zero-shot).

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Parameters | 8B |
| Base Model | Qwen/Qwen3-VL-8B-Instruct |
| Training | None (zero-shot baseline) |
| Task | Garment attribute extraction (9-field JSON) |
| Output Format | Structured JSON |

Key Highlights

  • Zero-shot baseline: no task-specific fine-tuning applied
  • 100% JSON parse rate: the model produces valid structured JSON out of the box
  • #3/21 on the Denali-AI garment classification leaderboard
  • Strongest zero-shot base model tested: outperforms every fine-tuned 2B model except the SFT+GRPO variant
  • Throughput: 5.5 samples/s

Benchmark Results

Rank #3/21 on eval_hard_3500

| Metric | Score |
|---|---|
| Weighted Score | 78.1% |
| SBERT+NLI Combined | 75.6% |
| JSON Parse Rate | 100% |
| Throughput | 5.5 samples/s |
| Inference Time | 640s (3500 samples) |

Per-Field Scores

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 78.9% | 67.0% | 72.0% | 59.6% | 69.6% | 2.5x |
| color | 80.7% | 61.7% | 65.9% | 41.6% | 71.2% | 1.0x |
| pattern | 67.6% | 67.1% | 63.6% | 48.0% | 59.9% | 1.0x |
| closure | 43.2% | 34.6% | 41.0% | 29.1% | 35.4% | 1.0x |
| sleeve | 77.2% | 88.1% | 76.6% | 77.1% | 82.9% | 1.0x |
| neckline | 80.8% | 75.0% | 79.9% | 73.3% | 73.5% | 1.0x |
| defect | 96.1% | 96.1% | 95.9% | 95.5% | 96.0% | 2.0x |
| brand | 93.4% | 93.4% | 93.5% | 92.6% | 93.2% | 1.5x |
| size | 98.8% | 98.7% | 98.7% | 98.6% | 98.7% | 1.5x |
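
The two aggregate numbers in the Benchmark Results section can be reproduced from this table, assuming the SBERT+NLI Combined score is the plain mean of the SBERT+NLI column and the Weighted Score is its weight-normalized mean; this assumption reproduces the reported 75.6% and 78.1%:

```python
# Per-field SBERT+NLI scores and weights, copied from the table above.
scores = {
    "type": 69.6, "color": 71.2, "pattern": 59.9, "closure": 35.4,
    "sleeve": 82.9, "neckline": 73.5, "defect": 96.0,
    "brand": 93.2, "size": 98.7,
}
weights = {
    "type": 2.5, "color": 1.0, "pattern": 1.0, "closure": 1.0,
    "sleeve": 1.0, "neckline": 1.0, "defect": 2.0,
    "brand": 1.5, "size": 1.5,
}

# Unweighted mean of the SBERT+NLI column.
combined = sum(scores.values()) / len(scores)

# Weight-normalized mean: each field's score times its weight,
# divided by the total weight (12.5).
weighted = (sum(scores[f] * weights[f] for f in scores)
            / sum(weights.values()))

print(round(combined, 1))  # 75.6 (SBERT+NLI Combined)
print(round(weighted, 1))  # 78.1 (Weighted Score)
```

The gap between the two (+2.5pp) comes from the high-weight fields (type, defect, brand, size) scoring above the unweighted mean.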

Visualizations

(Charts: radar chart, leaderboard, per-metric comparison, throughput; images not included.)

Full Leaderboard

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | **qwen3-vl-8b-instruct-base** (this model) | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | qwen3-vl-8b-instruct-nvfp4 | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | qwen35-2b-base | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | qwen3-vl-2b-instruct-base | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |

Comparative Analysis

  • vs qwen3-vl-2b-sft-grpo-v9: -1.8pp weighted score
    • type: +1.4pp
    • color: -5.8pp
    • pattern: -2.9pp
    • closure: -26.0pp
    • sleeve: -2.7pp
    • neckline: +3.8pp
    • defect: -1.2pp
    • brand: +3.9pp
    • size: +2.9pp
  • vs qwen3-vl-2b-instruct-base: +10.1pp weighted score
    • type: +1.3pp
    • color: +2.6pp
    • pattern: +7.1pp
    • closure: +3.0pp
    • sleeve: +4.8pp
    • neckline: +13.5pp
    • defect: +41.1pp
    • brand: +6.8pp
    • size: -0.2pp
  • vs qwen35-2b-base: +1.9pp weighted score
    • type: +1.3pp
    • color: +3.9pp
    • pattern: 0.0pp
    • closure: +1.4pp
    • sleeve: +2.4pp
    • neckline: +16.1pp
    • defect: 0.0pp
    • brand: -1.0pp
    • size: -0.6pp

Improvement Recommendations

  • Fine-tuning (SFT): Per the leaderboard, the 2B Qwen3-VL variant gained +11.9pp weighted (68.0% to 79.9%) from SFT+GRPO. Applying the same recipe to this 8B model could push it well above 80%.
  • Closure field: At 35.4% SBERT+NLI, closure is by far the weakest field; targeted data augmentation for closure types (zipper, button, snap, etc.) would yield the largest single-field gain.
  • GRPO/GTPO reinforcement: After SFT, reward-based RL (GRPO or GTPO) could further refine per-field accuracy, especially on the high-weight fields (type, defect).
  • Quantization: NVFP4 quantization could lift throughput from 5.5 to ~8 samples/s while largely preserving accuracy; the 8B NVFP4 entry on the leaderboard reaches 8.2/s at only -0.3pp weighted, unlike the Qwen3.5 NVFP4 variants, which degraded badly.

Alternative Models

  • Qwen3-VL-4B-Instruct: mid-point between 2B and 8B, worth testing for the speed/quality tradeoff
  • InternVL3-8B: competing 8B VLM architecture, could serve as a direct comparison
  • Qwen3.5-VL-8B: newer architecture revision, may have improved structured-output compliance
  • SmolVLM2-8B: alternative lightweight VLM worth benchmarking

Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level precision/recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x, all other fields 1.0x) |
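
The scoring code itself is not published in this card. As one plausible reading of the Token F1 row, here is a sketch of SQuAD-style token-level F1 over whitespace tokens; the tokenization and normalization choices are assumptions, not the official implementation:

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall
    over (lowercased, whitespace-split) token multisets."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(token_f1("short sleeve", "long sleeve"))  # 0.5
print(token_f1("crew neck", "crew neck"))       # 1.0
```

A metric of this shape rewards partial matches ("navy blue" vs "blue"), which is why Token F1 runs below the embedding-based SBERT scores on free-text fields like color.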

Citation

```bibtex
@misc{denali-ai-qwen3-vl-8b-instruct-base,
  title={Qwen3-VL-8B-Instruct (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen3-vl-8b-instruct-base}
}
```

License

This model is released under the Apache 2.0 License.
