license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
  - vision-language
  - qwen35vl
  - zero-shot
  - baseline
  - garment-classification

Qwen3.5-2B (Base)

Zero-shot baseline of Qwen3.5-2B for garment classification. This is the base model before any Denali-AI fine-tuning. It ranks #5 of 21 on the Denali-AI eval_hard_3500 benchmark with a zero-shot weighted score of 76.2%.

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3.5-VL |
| Parameters | 2B |
| Base Model | Qwen/Qwen3.5-2B |
| Training | None (zero-shot baseline) |
| Task | Garment Attribute Extraction (9-field JSON) |
| Output Format | Structured JSON |

Key Highlights

  • Zero-shot baseline — no task-specific fine-tuning applied
  • 100% JSON parse rate — model produces valid structured JSON out of the box
  • Serves as the comparison point for Denali-AI fine-tuned variants
  • Throughput: 6.6 samples/s
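The card does not say how the JSON parse rate is computed. As a minimal sketch, assuming it is the fraction of raw model outputs that decode to a JSON object, the check could look like the following. The helper names (`parse_prediction`, `json_parse_rate`, `missing_fields`) are hypothetical, and the field list is taken from the Per-Field Scores table.

```python
import json

# Field names taken from the Per-Field Scores table in this card.
EXPECTED_FIELDS = ("type", "color", "pattern", "closure", "sleeve",
                   "neckline", "defect", "brand", "size")


def parse_prediction(raw):
    """Return the decoded dict if `raw` is a JSON object, else None."""
    try:
        obj = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    return obj if isinstance(obj, dict) else None


def json_parse_rate(outputs):
    """Fraction of raw outputs that decode to a JSON object (assumed definition)."""
    if not outputs:
        return 0.0
    parsed = sum(parse_prediction(o) is not None for o in outputs)
    return parsed / len(outputs)


def missing_fields(pred):
    """List the expected fields that are absent from a parsed prediction."""
    return [f for f in EXPECTED_FIELDS if f not in pred]


if __name__ == "__main__":
    sample_outputs = ['{"type": "jacket", "color": "navy"}', "not json"]
    print(json_parse_rate(sample_outputs))
    print(missing_fields(parse_prediction(sample_outputs[0])))
```

Whether the benchmark also requires all nine fields to be present for an output to count as parsed is not stated, so `missing_fields` is kept as a separate check.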

Benchmark Results

Rank #5/21 on eval_hard_3500

| Metric | Score |
|---|---|
| Weighted Score | 76.2% |
| SBERT+NLI Combined | 73.0% |
| JSON Parse Rate | 100% |
| Throughput | 6.6 samples/s |
| Inference Time | 534s (3500 samples) |
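The throughput and inference-time figures are consistent with each other. A quick check, assuming throughput is simply samples divided by wall-clock seconds:

```python
# Values copied from the Benchmark Results section of this card.
samples = 3500
inference_seconds = 534

# Assumed definition: throughput = samples / wall-clock seconds.
throughput = samples / inference_seconds
print(f"{throughput:.1f} samples/s")  # 6.6 samples/s
```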

Per-Field Scores

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 78.7% | 66.8% | 70.9% | 58.7% | 68.4% | 2.5x |
| color | 75.6% | 63.2% | 61.9% | 35.8% | 67.3% | 1.0x |
| pattern | 62.9% | 69.6% | 56.0% | 40.1% | 60.0% | 1.0x |
| closure | 42.4% | 35.2% | 40.3% | 27.4% | 34.0% | 1.0x |
| sleeve | 68.8% | 89.1% | 69.6% | 71.8% | 80.4% | 1.0x |
| neckline | 64.8% | 63.9% | 63.2% | 56.4% | 57.4% | 1.0x |
| defect | 96.0% | 96.2% | 95.7% | 95.3% | 96.0% | 2.0x |
| brand | 94.4% | 94.3% | 94.6% | 93.5% | 94.2% | 1.5x |
| size | 99.3% | 99.3% | 99.3% | 99.3% | 99.3% | 1.5x |
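The card does not spell out the aggregation formula. As a sketch, assuming the weighted score is the weight-normalized mean of the per-field SBERT+NLI column and the combined score is its unweighted mean, the table above reproduces the 76.2% weighted and 73.0% SBERT+NLI figures listed for this model in the Full Leaderboard:

```python
# (SBERT+NLI score in %, weight) per field, copied from the table above.
FIELDS = {
    "type": (68.4, 2.5),
    "color": (67.3, 1.0),
    "pattern": (60.0, 1.0),
    "closure": (34.0, 1.0),
    "sleeve": (80.4, 1.0),
    "neckline": (57.4, 1.0),
    "defect": (96.0, 2.0),
    "brand": (94.2, 1.5),
    "size": (99.3, 1.5),
}


def weighted_mean(fields):
    """Weight-normalized mean of the per-field scores (assumed formula)."""
    total_weight = sum(weight for _, weight in fields.values())
    return sum(score * weight for score, weight in fields.values()) / total_weight


def simple_mean(fields):
    """Unweighted mean of the per-field scores (assumed formula)."""
    return sum(score for score, _ in fields.values()) / len(fields)


if __name__ == "__main__":
    print(f"weighted: {weighted_mean(FIELDS):.1f}%")    # weighted: 76.2%
    print(f"unweighted: {simple_mean(FIELDS):.1f}%")    # unweighted: 73.0%
```

The weights sum to 12.5, so the three up-weighted groups (type, defect, brand/size) account for 7.5 of 12.5, or 60% of the aggregate.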

Visualizations

[Figures: Radar Chart, Leaderboard, Metrics, Throughput]

Full Leaderboard

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | qwen3-vl-8b-instruct-nvfp4 | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | **qwen35-2b-base** (this model) | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | qwen3-vl-2b-instruct-base | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |

Comparative Analysis

  • vs qwen35-2b-sft-grpo-gtpo-v8 (fine-tuned, #12): +15.5pp weighted (76.2% vs 60.7%). On this benchmark the zero-shot base outscores its fine-tuned variant by 15.5pp.

Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level precision/recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x) |
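The two string-level metrics can be illustrated without external dependencies. The sketch below is not the benchmark's implementation. It assumes the Levenshtein ratio is one minus the edit distance divided by the longer string's length (some libraries normalize differently), and that Token F1 uses lowercase whitespace tokens with multiset overlap. The function names are hypothetical.

```python
from collections import Counter


def levenshtein_distance(a, b):
    """Classic edit distance (insert, delete, substitute), two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]: 1 - distance / longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (assumed tokenization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))  # 3
    print(round(token_f1("navy blue", "blue"), 3))    # 0.667
```

A prediction of "navy blue" against a reference of "blue" has a precision of 0.5 and a recall of 1.0, which gives a Token F1 of about 0.667. Partial overlaps like this may help explain why Token F1 on a free-text field such as color sits well below its SBERT score.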

Citation

@misc{denali-ai-qwen35-2b-base,
  title={Qwen3.5-2B (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen35-2b-base}
}

License

This model is released under the Apache 2.0 License.