qwen35-2b-base / README.md

msudharsanan

Upload README.md with huggingface_hub

85b01db verified 18 days ago

metadata

license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
  - vision-language
  - qwen35vl
  - zero-shot
  - baseline
  - garment-classification

Qwen3.5-2B (Base)

Zero-shot baseline of Qwen3.5-2B for garment classification. This is the base model before any Denali-AI fine-tuning. It ranks #5 of 21 on the Denali-AI eval_hard_3500 benchmark with a 76.2% weighted score (zero-shot).

Model Details

Property	Value
Architecture	Qwen3.5-VL
Parameters	2B
Base Model	Qwen/Qwen3.5-2B
Training	None (zero-shot baseline)
Task	Garment Attribute Extraction (9-field JSON)
Output Format	Structured JSON

Key Highlights

Zero-shot baseline — no task-specific fine-tuning applied
100% JSON parse rate — model produces valid structured JSON out of the box
Serves as the comparison point for Denali-AI fine-tuned variants
Throughput: 6.6 samples/s
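
As a rough illustration of what a JSON parse rate measures, the sketch below counts how many raw model outputs decode to a JSON object. The success criterion (any output that decodes to a JSON object), the helper names, and the sample strings are assumptions made for illustration; the card does not describe the exact check used by the benchmark. The field names are the nine listed in the Per-Field Scores table.

```python
import json

# Field names as listed in the Per-Field Scores table of this card.
EXPECTED_FIELDS = ("type", "color", "pattern", "closure", "sleeve",
                   "neckline", "defect", "brand", "size")


def parse_record(raw: str):
    """Return the decoded dict, or None if raw is not a JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def json_parse_rate(outputs: list[str]) -> float:
    """Fraction of outputs that decode to a JSON object (assumed criterion)."""
    if not outputs:
        return 0.0
    parsed = sum(parse_record(raw) is not None for raw in outputs)
    return parsed / len(outputs)


def missing_fields(raw: str) -> list[str]:
    """List the expected fields absent from an output (all of them if unparseable)."""
    record = parse_record(raw)
    if record is None:
        return list(EXPECTED_FIELDS)
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hypothetical outputs, not taken from the benchmark.
sample_outputs = [
    '{"type": "t-shirt", "color": "navy"}',
    '{"type": "dress"}',
    'Sure! Here is the JSON you asked for',
    '{"type": "jacket", "closure": "zipper"}',
]
print(json_parse_rate(sample_outputs))  # 0.75
```

A stricter check could also require `missing_fields` to return an empty list before counting an output as parsed.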

Benchmark Results

Rank #5/21 on eval_hard_3500

Metric	Score
Weighted Score	76.2%
SBERT+NLI Combined	73.0%
JSON Parse Rate	100%
Throughput	6.6 samples/s
Inference Time	534s (3500 samples)

Per-Field Scores

Field	SBERT	NLI	Levenshtein	Token F1	SBERT+NLI	Weight
type	78.7%	66.8%	70.9%	58.7%	68.4%	2.5x
color	75.6%	63.2%	61.9%	35.8%	67.3%	1.0x
pattern	62.9%	69.6%	56.0%	40.1%	60.0%	1.0x
closure	42.4%	35.2%	40.3%	27.4%	34.0%	1.0x
sleeve	68.8%	89.1%	69.6%	71.8%	80.4%	1.0x
neckline	64.8%	63.9%	63.2%	56.4%	57.4%	1.0x
defect	96.0%	96.2%	95.7%	95.3%	96.0%	2.0x
brand	94.4%	94.3%	94.6%	93.5%	94.2%	1.5x
size	99.3%	99.3%	99.3%	99.3%	99.3%	1.5x
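
The Weight column can be combined with the SBERT+NLI column to recompute the aggregate scores. The sketch below assumes the weighted score is the weight-normalized mean of the per-field SBERT+NLI values and that the combined SBERT+NLI score is their unweighted mean; the card does not state either formula explicitly. Under those assumptions the results match the 76.2% and 73.0% listed for this model in the leaderboard.

```python
# Per-field (SBERT+NLI score in %, weight), copied from the table above.
FIELDS = {
    "type": (68.4, 2.5),
    "color": (67.3, 1.0),
    "pattern": (60.0, 1.0),
    "closure": (34.0, 1.0),
    "sleeve": (80.4, 1.0),
    "neckline": (57.4, 1.0),
    "defect": (96.0, 2.0),
    "brand": (94.2, 1.5),
    "size": (99.3, 1.5),
}


def weighted_score(fields: dict[str, tuple[float, float]]) -> float:
    """Weight-normalized mean of the per-field scores (assumed formula)."""
    total_weight = sum(weight for _, weight in fields.values())
    return sum(score * weight for score, weight in fields.values()) / total_weight


def mean_score(fields: dict[str, tuple[float, float]]) -> float:
    """Unweighted mean of the per-field scores (assumed formula)."""
    return sum(score for score, _ in fields.values()) / len(fields)


print(f"{weighted_score(FIELDS):.1f}")  # 76.2
print(f"{mean_score(FIELDS):.1f}")      # 73.0
```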

Full Leaderboard

Rank	Model	Weighted	SBERT+NLI	JSON Parse	Throughput	Inference
1	qwen3-vl-8b-sft+grpo	80.9%	78.7%	100%	7.5/s	464s
2	qwen3-vl-2b-sft-grpo-v9	79.9%	78.5%	100%	15.9/s	220s
3	qwen3-vl-8b-instruct-base	78.1%	75.6%	100%	5.5/s	640s
4	qwen3-vl-8b-instruct-nvfp4	77.8%	75.0%	100%	8.2/s	424s
5	qwen35-2b-base (this model)	76.2%	73.0%	100%	6.6/s	534s
6	qwen3-vl-2b-sft-grpo-v9-nvfp4	74.6%	74.1%	100%	17.2/s	203s
7	qwen3-vl-2b-instruct-base	68.0%	66.7%	100%	15.1/s	231s
8	internvl3-2b-grpo-gtpo-full	67.5%	64.3%	100%	11.8/s	297s
9	internvl3-2b-grpo-gtpo-fp8	67.1%	63.8%	100%	14.3/s	244s
10	internvl3-2b-base	66.8%	63.7%	100%	11.8/s	297s
11	moondream2-base	63.8%	61.8%	100%	1.4/s	2416s
12	qwen35-2b-sft-grpo-gtpo-v8	60.7%	60.1%	100%	11.3/s	309s
13	qwen35-2b-sft-v7	58.6%	58.9%	100%	11.6/s	302s
14	qwen35-35b-a3b-gptq-int4	51.5%	48.7%	14%	1.6/s	2124s
15	qwen35-9b-nvfp4-v10	48.9%	46.0%	8%	1.7/s	2075s
16	qwen35-9b-sft-nvfp4-v11	48.3%	45.5%	8%	1.7/s	2023s
17	qwen35-2b-base-nvfp4-v10	45.9%	42.9%	0%	4.0/s	878s
18	qwen3.5-122b-a10b-nvfp4	45.9%	42.9%	0%	1.2/s	2893s
19	qwen35-2b-sft-nvfp4-v11	45.9%	42.9%	0%	4.0/s	876s
20	qwen35-2b-sft-grpo-gtpo-nvfp4	45.9%	42.9%	0%	3.9/s	907s
21	qwen3-vl-8b-sft-grpo	0.0%	0.0%	100%	0.0/s	462s

Comparative Analysis

vs qwen35-2b-sft-grpo-gtpo-v8 (fine-tuned, #12): +15.5pp weighted (76.2% vs 60.7%). On this benchmark, fine-tuning lowered the weighted score by 15.5pp relative to this zero-shot baseline.

Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

Metric	Description
SBERT Cosine	Semantic similarity via sentence-transformers (all-MiniLM-L6-v2)
NLI Score	Natural language inference entailment scoring
Levenshtein Ratio	Fuzzy string matching
Token F1	Token-level precision/recall
Weighted Score	Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x, all other fields=1.0x)
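
The two string-level metrics in the table can be sketched with the standard library alone. The code below is one common formulation, not necessarily the one used by the benchmark: Token F1 is computed over lowercased whitespace tokens, and the Levenshtein ratio is taken as one minus the edit distance divided by the longer string's length. Both normalization choices and all function names are assumptions.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty strings count as a perfect match, otherwise a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))  # 3
print(round(token_f1("navy blue", "blue"), 3))    # 0.667
```

Other libraries define the Levenshtein ratio differently (for example, by normalizing by the summed lengths), so absolute values may not be comparable across implementations.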

Citation

@misc{denali-ai-qwen35-2b-base,
  title={Qwen3.5-2B (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen35-2b-base}
}

License

This model is released under the Apache 2.0 License.