Qwen3.5-2B (Base)

Zero-shot baseline of Qwen3.5-2B for garment classification. This is the base model before any Denali-AI fine-tuning. It ranks #5 of 21 on the Denali-AI eval_hard_3500 benchmark with a 76.2% weighted score (zero-shot), the value reported for this model in the full leaderboard.

Model Details

Property	Value
Architecture	Qwen3.5-VL
Parameters	2B
Base Model	Qwen/Qwen3.5-2B
Training	None (zero-shot baseline)
Task	Garment Attribute Extraction (9-field JSON)
Output Format	Structured JSON

Key Highlights

Zero-shot baseline — no task-specific fine-tuning applied
100% JSON parse rate — model produces valid structured JSON out of the box
Serves as the comparison point for Denali-AI fine-tuned variants
Throughput: 6.6 samples/s
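
The JSON parse rate quoted above can be illustrated with a minimal sketch. It assumes the rate is simply the fraction of raw model outputs that decode to a JSON object. The field names are taken from the per-field score table and are an assumption about the actual schema keys. The helper names (`parse_garment_json`, `json_parse_rate`, `missing_fields`) are hypothetical and not part of any Denali-AI tooling.

```python
import json

# Assumed field names, taken from the per-field score table.
# The real schema keys used by the benchmark may differ.
EXPECTED_FIELDS = (
    "type", "color", "pattern", "closure", "sleeve",
    "neckline", "defect", "brand", "size",
)


def parse_garment_json(raw):
    """Return the parsed dict if `raw` decodes to a JSON object, else None."""
    try:
        obj = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    return obj if isinstance(obj, dict) else None


def json_parse_rate(outputs):
    """Fraction of outputs that parse as a JSON object (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    parsed = sum(parse_garment_json(o) is not None for o in outputs)
    return parsed / len(outputs)


def missing_fields(obj):
    """List the expected fields that are absent from a parsed output."""
    return [f for f in EXPECTED_FIELDS if f not in obj]
```

A stricter variant could also count an output as failed when `missing_fields` is non-empty. Whether the benchmark does this is not stated here.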

Benchmark Results

Rank #5/21 on eval_hard_3500

Metric	Score
Weighted Score	76.2%
SBERT+NLI Combined	73.0%
JSON Parse Rate	100%
Throughput	6.6 samples/s
Inference Time	534s (3500 samples)

Per-Field Scores

Field	SBERT	NLI	Levenshtein	Token F1	SBERT+NLI	Weight
type	78.7%	66.8%	70.9%	58.7%	68.4%	2.5x
color	75.6%	63.2%	61.9%	35.8%	67.3%	1.0x
pattern	62.9%	69.6%	56.0%	40.1%	60.0%	1.0x
closure	42.4%	35.2%	40.3%	27.4%	34.0%	1.0x
sleeve	68.8%	89.1%	69.6%	71.8%	80.4%	1.0x
neckline	64.8%	63.9%	63.2%	56.4%	57.4%	1.0x
defect	96.0%	96.2%	95.7%	95.3%	96.0%	2.0x
brand	94.4%	94.3%	94.6%	93.5%	94.2%	1.5x
size	99.3%	99.3%	99.3%	99.3%	99.3%	1.5x
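
As a sanity check on the table above, the sketch below aggregates the SBERT+NLI column in two ways: a weight-normalized mean using the Weight column, and a plain mean. That the benchmark aggregates exactly this way is an assumption, not something the card states. Under that assumption the table yields 76.2% (weighted) and 73.0% (unweighted), which agree with the Weighted and SBERT+NLI values in the leaderboard row for qwen35-2b-base.

```python
# Per-field SBERT+NLI scores (%) and weights, copied from the table above.
FIELD_SCORES = {
    "type": 68.4, "color": 67.3, "pattern": 60.0,
    "closure": 34.0, "sleeve": 80.4, "neckline": 57.4,
    "defect": 96.0, "brand": 94.2, "size": 99.3,
}
FIELD_WEIGHTS = {
    "type": 2.5, "color": 1.0, "pattern": 1.0,
    "closure": 1.0, "sleeve": 1.0, "neckline": 1.0,
    "defect": 2.0, "brand": 1.5, "size": 1.5,
}


def weighted_score(scores, weights):
    """Weight-normalized mean of per-field scores (assumed aggregation)."""
    total_weight = sum(weights[field] for field in scores)
    return sum(scores[field] * weights[field] for field in scores) / total_weight


def unweighted_mean(scores):
    """Plain average of per-field scores."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(round(weighted_score(FIELD_SCORES, FIELD_WEIGHTS), 1))  # 76.2
    print(round(unweighted_mean(FIELD_SCORES), 1))                # 73.0
```

Because `type` and `defect` carry the largest weights, a strong `defect` score offsets the weak `closure` score, which is why the weighted figure sits above the unweighted one.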

Full Leaderboard

Rank	Model	Weighted	SBERT+NLI	JSON Parse	Throughput	Inference
1	qwen3-vl-8b-sft+grpo	80.9%	78.7%	100%	7.5/s	464s
2	qwen3-vl-2b-sft-grpo-v9	79.9%	78.5%	100%	15.9/s	220s
3	qwen3-vl-8b-instruct-base	78.1%	75.6%	100%	5.5/s	640s
4	qwen3-vl-8b-instruct-nvfp4	77.8%	75.0%	100%	8.2/s	424s
5	qwen35-2b-base (this model)	76.2%	73.0%	100%	6.6/s	534s
6	qwen3-vl-2b-sft-grpo-v9-nvfp4	74.6%	74.1%	100%	17.2/s	203s
7	qwen3-vl-2b-instruct-base	68.0%	66.7%	100%	15.1/s	231s
8	internvl3-2b-grpo-gtpo-full	67.5%	64.3%	100%	11.8/s	297s
9	internvl3-2b-grpo-gtpo-fp8	67.1%	63.8%	100%	14.3/s	244s
10	internvl3-2b-base	66.8%	63.7%	100%	11.8/s	297s
11	moondream2-base	63.8%	61.8%	100%	1.4/s	2416s
12	qwen35-2b-sft-grpo-gtpo-v8	60.7%	60.1%	100%	11.3/s	309s
13	qwen35-2b-sft-v7	58.6%	58.9%	100%	11.6/s	302s
14	qwen35-35b-a3b-gptq-int4	51.5%	48.7%	14%	1.6/s	2124s
15	qwen35-9b-nvfp4-v10	48.9%	46.0%	8%	1.7/s	2075s
16	qwen35-9b-sft-nvfp4-v11	48.3%	45.5%	8%	1.7/s	2023s
17	qwen35-2b-base-nvfp4-v10	45.9%	42.9%	0%	4.0/s	878s
18	qwen3.5-122b-a10b-nvfp4	45.9%	42.9%	0%	1.2/s	2893s
19	qwen35-2b-sft-nvfp4-v11	45.9%	42.9%	0%	4.0/s	876s
20	qwen35-2b-sft-grpo-gtpo-nvfp4	45.9%	42.9%	0%	3.9/s	907s
21	qwen3-vl-8b-sft-grpo	0.0%	0.0%	100%	0.0/s	462s

Comparative Analysis

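vs qwen35-2b-sft-grpo-gtpo-v8 (fine-tuned, #12): the base model scores 15.5pp higher on weighted score (76.2% vs 60.7%). On this benchmark the fine-tuned variant underperforms the zero-shot baseline rather than improving on it.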

Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

Metric	Description
SBERT Cosine	Semantic similarity via sentence-transformers (all-MiniLM-L6-v2)
NLI Score	Natural language inference entailment scoring
Levenshtein Ratio	Fuzzy string matching
Token F1	Harmonic mean of token-level precision and recall
Weighted Score	Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x, all other fields=1.0x)
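
The two string-level metrics in the table can be sketched in plain Python. This is a minimal illustration, not the benchmark's implementation: it assumes whitespace tokenization with lowercasing for Token F1, and it normalizes edit distance by the longer string for the Levenshtein ratio. Other normalizations exist, and the one used by eval_hard_3500 is not stated here. All function names are hypothetical.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 over lowercased, whitespace-split tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]: 1 minus distance over the longer length (assumed)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

For example, `token_f1("navy blue", "blue")` gives precision 0.5 and recall 1.0, so an F1 of about 0.667. This kind of partial credit may help explain why free-text fields such as `color` score much lower on Token F1 than on SBERT.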

Citation

@misc{denali-ai-qwen35-2b-base,
  title={Qwen3.5-2B (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen35-2b-base}
}