
# Qwen3-VL-2B-Instruct (Base)

Zero-shot baseline of Qwen3-VL-2B-Instruct for garment classification. This is the base model, before any Denali-AI fine-tuning. It ranks #7/21 on the Denali-AI eval_hard_3500 benchmark with a 68.0% weighted score (zero-shot).

## Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Parameters | 2B |
| Base Model | Qwen/Qwen3-VL-2B-Instruct |
| Training | None (zero-shot baseline) |
| Task | Garment Attribute Extraction (9-field JSON) |
| Output Format | Structured JSON |

## Key Highlights

  • Zero-shot baseline — no task-specific fine-tuning applied
  • 100% JSON parse rate — model produces valid structured JSON out of the box
  • Serves as the comparison point for Denali-AI fine-tuned variants
  • Throughput: 15.1 samples/s
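To illustrate the expected output shape, the sketch below validates a model response against the nine attribute fields scored on the benchmark (field names taken from the per-field score table; the sample JSON string itself is hypothetical):

```python
import json

# The nine garment attribute fields scored on eval_hard_3500
# (names taken from the per-field score table below).
EXPECTED_FIELDS = {
    "type", "color", "pattern", "closure", "sleeve",
    "neckline", "defect", "brand", "size",
}

def parse_garment_json(raw: str) -> dict:
    """Parse a model response and check it contains exactly the nine fields."""
    record = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    missing = EXPECTED_FIELDS - record.keys()
    extra = record.keys() - EXPECTED_FIELDS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, extra={sorted(extra)}")
    return record

# Hypothetical zero-shot response for a single product image.
sample = """{
  "type": "t-shirt", "color": "navy blue", "pattern": "solid",
  "closure": "none", "sleeve": "short", "neckline": "crew",
  "defect": "none", "brand": "unknown", "size": "M"
}"""
record = parse_garment_json(sample)
```

A check of this kind is what the "100% JSON parse rate" metric counts: a response passes only if it is valid JSON with the expected structure.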

## Benchmark Results

Rank #7/21 on eval_hard_3500

| Metric | Score |
|---|---|
| Weighted Score | 68.0% |
| SBERT+NLI Combined | 66.7% |
| JSON Parse Rate | 100% |
| Throughput | 15.1 samples/s |
| Inference Time | 231s (3500 samples) |

### Per-Field Scores

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 78.0% | 65.4% | 70.5% | 55.6% | 68.3% | 2.5x |
| color | 79.3% | 53.9% | 60.6% | 33.2% | 68.6% | 1.0x |
| pattern | 64.2% | 59.7% | 60.1% | 43.1% | 52.9% | 1.0x |
| closure | 40.8% | 30.7% | 39.6% | 24.7% | 32.3% | 1.0x |
| sleeve | 62.6% | 87.2% | 62.1% | 64.3% | 78.1% | 1.0x |
| neckline | 64.0% | 69.7% | 61.8% | 55.5% | 60.0% | 1.0x |
| defect | 55.0% | 55.2% | 54.5% | 54.1% | 54.9% | 2.0x |
| brand | 86.6% | 86.6% | 87.0% | 85.8% | 86.4% | 1.5x |
| size | 99.0% | 98.9% | 98.9% | 98.9% | 98.9% | 1.5x |

## Visualizations

(Radar chart, leaderboard, per-metric, and throughput charts are omitted from this text version.)

## Full Leaderboard

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | qwen3-vl-8b-instruct-nvfp4 | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | qwen35-2b-base | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | **qwen3-vl-2b-instruct-base** (this model) | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |

## Comparative Analysis

  • vs qwen3-vl-2b-sft-grpo-v9 (fine-tuned, #2): -11.9pp weighted (68.0% vs 79.9%). Fine-tuning improved this base model by 11.9pp.

## Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level precision/recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x) |
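Assuming the two aggregate scores are the unweighted and field-weighted means of the per-field SBERT+NLI column, the arithmetic can be reproduced directly from the tables above:

```python
# Per-field SBERT+NLI scores (%) from the per-field score table.
scores = {
    "type": 68.3, "color": 68.6, "pattern": 52.9, "closure": 32.3,
    "sleeve": 78.1, "neckline": 60.0, "defect": 54.9,
    "brand": 86.4, "size": 98.9,
}
# Field weights: type=2.5x, defect=2.0x, brand/size=1.5x, rest 1.0x.
weights = {"type": 2.5, "defect": 2.0, "brand": 1.5, "size": 1.5}

def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean with a default weight of 1.0 for unlisted fields."""
    num = sum(s * weights.get(f, 1.0) for f, s in scores.items())
    den = sum(weights.get(f, 1.0) for f in scores)
    return num / den

combined = sum(scores.values()) / len(scores)  # unweighted mean
weighted = weighted_score(scores, weights)

print(f"SBERT+NLI combined: {combined:.1f}%")  # 66.7%
print(f"Weighted score:     {weighted:.1f}%")  # 68.0%
```

These reproduce the leaderboard figures for this model (66.7% combined, 68.0% weighted); the up-weighted type, defect, brand, and size fields pull the weighted score above the plain mean.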

## Citation

```bibtex
@misc{denali-ai-qwen3-vl-2b-instruct-base,
  title={Qwen3-VL-2B-Instruct (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen3-vl-2b-instruct-base}
}
```

## License

This model is released under the Apache 2.0 License.
