
# Qwen3-VL-2B-Instruct (Base)

Zero-shot baseline of Qwen3-VL-2B-Instruct for garment classification. This is the base model, before any Denali-AI fine-tuning. It ranks #7/21 on the Denali-AI eval_hard_3500 benchmark with a 68.0% weighted score (zero-shot).

## Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Parameters | 2B |
| Base Model | Qwen/Qwen3-VL-2B-Instruct |
| Training | None (zero-shot baseline) |
| Task | Garment Attribute Extraction (9-field JSON) |
| Output Format | Structured JSON |

## Key Highlights

  • Zero-shot baseline — no task-specific fine-tuning applied
  • 100% JSON parse rate — model produces valid structured JSON out of the box
  • Serves as the comparison point for Denali-AI fine-tuned variants
  • Throughput: 15.1 samples/s
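To illustrate the expected output shape, the sketch below validates a model response against the nine attribute fields scored on the benchmark (field names taken from the per-field score table; the sample JSON string itself is hypothetical):

```python
import json

# The nine garment attribute fields scored on eval_hard_3500
# (names taken from the per-field score table below).
EXPECTED_FIELDS = {
    "type", "color", "pattern", "closure", "sleeve",
    "neckline", "defect", "brand", "size",
}

def parse_garment_json(raw: str) -> dict:
    """Parse a model response and check it contains exactly the nine fields."""
    record = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    missing = EXPECTED_FIELDS - record.keys()
    extra = record.keys() - EXPECTED_FIELDS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, extra={sorted(extra)}")
    return record

# Hypothetical zero-shot response for a single product image.
sample = """{
  "type": "t-shirt", "color": "navy blue", "pattern": "solid",
  "closure": "none", "sleeve": "short", "neckline": "crew",
  "defect": "none", "brand": "unknown", "size": "M"
}"""
record = parse_garment_json(sample)
```

A check of this kind is what the "100% JSON parse rate" metric counts: a response passes only if it is valid JSON with the expected structure.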

## Benchmark Results

Rank #7/21 on eval_hard_3500

| Metric | Score |
|---|---|
| Weighted Score | 68.0% |
| SBERT+NLI Combined | 66.7% |
| JSON Parse Rate | 100% |
| Throughput | 15.1 samples/s |
| Inference Time | 231s (3500 samples) |

### Per-Field Scores

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 78.0% | 65.4% | 70.5% | 55.6% | 68.3% | 2.5x |
| color | 79.3% | 53.9% | 60.6% | 33.2% | 68.6% | 1.0x |
| pattern | 64.2% | 59.7% | 60.1% | 43.1% | 52.9% | 1.0x |
| closure | 40.8% | 30.7% | 39.6% | 24.7% | 32.3% | 1.0x |
| sleeve | 62.6% | 87.2% | 62.1% | 64.3% | 78.1% | 1.0x |
| neckline | 64.0% | 69.7% | 61.8% | 55.5% | 60.0% | 1.0x |
| defect | 55.0% | 55.2% | 54.5% | 54.1% | 54.9% | 2.0x |
| brand | 86.6% | 86.6% | 87.0% | 85.8% | 86.4% | 1.5x |
| size | 99.0% | 98.9% | 98.9% | 98.9% | 98.9% | 1.5x |

## Visualizations

(Radar chart, leaderboard, per-metric, and throughput charts are omitted from this text version.)

## Full Leaderboard

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | qwen3-vl-8b-instruct-nvfp4 | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | qwen35-2b-base | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | **qwen3-vl-2b-instruct-base** (this model) | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |

## Comparative Analysis

  • vs qwen3-vl-2b-sft-grpo-v9 (fine-tuned, #2): -11.9pp weighted (68.0% vs 79.9%). Fine-tuning improved this base model by 11.9pp.

## Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level precision/recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x) |
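Assuming the two aggregate scores are the unweighted and field-weighted means of the per-field SBERT+NLI column, the arithmetic can be reproduced directly from the tables above:

```python
# Per-field SBERT+NLI scores (%) from the per-field score table.
scores = {
    "type": 68.3, "color": 68.6, "pattern": 52.9, "closure": 32.3,
    "sleeve": 78.1, "neckline": 60.0, "defect": 54.9,
    "brand": 86.4, "size": 98.9,
}
# Field weights: type=2.5x, defect=2.0x, brand/size=1.5x, rest 1.0x.
weights = {"type": 2.5, "defect": 2.0, "brand": 1.5, "size": 1.5}

def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean with a default weight of 1.0 for unlisted fields."""
    num = sum(s * weights.get(f, 1.0) for f, s in scores.items())
    den = sum(weights.get(f, 1.0) for f in scores)
    return num / den

combined = sum(scores.values()) / len(scores)  # unweighted mean
weighted = weighted_score(scores, weights)

print(f"SBERT+NLI combined: {combined:.1f}%")  # 66.7%
print(f"Weighted score:     {weighted:.1f}%")  # 68.0%
```

These reproduce the leaderboard figures for this model (66.7% combined, 68.0% weighted); the up-weighted type, defect, brand, and size fields pull the weighted score above the plain mean.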

## Citation

```bibtex
@misc{denali-ai-qwen3-vl-2b-instruct-base,
  title={Qwen3-VL-2B-Instruct (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen3-vl-2b-instruct-base}
}
```

## License

This model is released under the Apache 2.0 License.
