Qwen3.5-2B (Base)

Zero-shot baseline of Qwen3.5-2B for garment classification: the base model before any Denali-AI fine-tuning. It ranks #5 of 21 on the Denali-AI eval_hard_3500 benchmark with a 76.2% weighted score (zero-shot), the figure given for this model in the full leaderboard.

Model Details

Property       Value
Architecture   Qwen3.5-VL
Parameters     2B
Base Model     Qwen/Qwen3.5-2B
Training       None (zero-shot baseline)
Task           Garment Attribute Extraction (9-field JSON)
Output Format  Structured JSON

Key Highlights

  • Zero-shot baseline — no task-specific fine-tuning applied
  • 100% JSON parse rate — model produces valid structured JSON out of the box
  • Serves as the comparison point for Denali-AI fine-tuned variants
  • Throughput: 6.6 samples/s
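The card does not say how the JSON parse rate is measured. A minimal sketch, assuming it is the fraction of raw model outputs that `json.loads` accepts as a JSON object, could look like this. The function name `json_parse_rate` and the sample strings are hypothetical and only illustrate the idea.

```python
import json


def json_parse_rate(outputs):
    """Return the fraction of raw outputs that parse as a JSON object."""
    if not outputs:
        return 0.0
    parsed = 0
    for text in outputs:
        try:
            value = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            # Malformed JSON or a non-string output counts as a failure.
            continue
        # Assumption: only a top-level object (dict) counts as valid output.
        if isinstance(value, dict):
            parsed += 1
    return parsed / len(outputs)


# Hypothetical outputs, for illustration only.
samples = [
    '{"type": "t-shirt", "color": "blue"}',
    "not json at all",
    '{"type": "jacket"}',
    "[1, 2, 3]",
]
print(f"{json_parse_rate(samples):.0%}")
```

Under this definition, a reported 100% means every one of the evaluated outputs parsed cleanly.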

Benchmark Results

Rank #5/21 on eval_hard_3500

Metric              Score
Weighted Score      76.2%
SBERT+NLI Combined  73.0%
JSON Parse Rate     100%
Throughput          6.6 samples/s
Inference Time      534s (3500 samples)
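The throughput and inference-time figures are consistent with each other. A short check, assuming throughput is simply samples divided by wall-clock seconds (the helper name `throughput` is illustrative):

```python
def throughput(num_samples, seconds):
    """Samples processed per second of wall-clock inference time."""
    return num_samples / seconds


rate = throughput(3500, 534)
print(f"{rate:.1f} samples/s")
```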

Per-Field Scores

Field     SBERT  NLI    Levenshtein  Token F1  SBERT+NLI  Weight
type      78.7%  66.8%  70.9%        58.7%     68.4%      2.5x
color     75.6%  63.2%  61.9%        35.8%     67.3%      1.0x
pattern   62.9%  69.6%  56.0%        40.1%     60.0%      1.0x
closure   42.4%  35.2%  40.3%        27.4%     34.0%      1.0x
sleeve    68.8%  89.1%  69.6%        71.8%     80.4%      1.0x
neckline  64.8%  63.9%  63.2%        56.4%     57.4%      1.0x
defect    96.0%  96.2%  95.7%        95.3%     96.0%      2.0x
brand     94.4%  94.3%  94.6%        93.5%     94.2%      1.5x
size      99.3%  99.3%  99.3%        99.3%     99.3%      1.5x
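The card does not spell out how the per-field numbers are aggregated. As a sketch, assuming the weighted score is the weight-normalized mean of the SBERT+NLI column and the combined score is its unweighted mean, the table values can be aggregated as follows. The names `FIELDS`, `weighted_score`, and `mean_score` are illustrative.

```python
# (SBERT+NLI score in percent, weight) per field, copied from the table above.
FIELDS = {
    "type": (68.4, 2.5),
    "color": (67.3, 1.0),
    "pattern": (60.0, 1.0),
    "closure": (34.0, 1.0),
    "sleeve": (80.4, 1.0),
    "neckline": (57.4, 1.0),
    "defect": (96.0, 2.0),
    "brand": (94.2, 1.5),
    "size": (99.3, 1.5),
}


def weighted_score(fields):
    """Weight-normalized mean of the per-field scores (assumed formula)."""
    total_weight = sum(weight for _, weight in fields.values())
    weighted_sum = sum(score * weight for score, weight in fields.values())
    return weighted_sum / total_weight


def mean_score(fields):
    """Unweighted mean of the per-field scores."""
    return sum(score for score, _ in fields.values()) / len(fields)


print(f"weighted: {weighted_score(FIELDS):.1f}%")
print(f"mean:     {mean_score(FIELDS):.1f}%")
```

Under these assumptions the weighted mean comes out at about 76.2% and the unweighted mean at 73.0%, which agrees with the Weighted and SBERT+NLI values listed for qwen35-2b-base in the full leaderboard.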

Visualizations

[Figures: radar chart, leaderboard, metrics, throughput]

Full Leaderboard

Rank  Model                          Weighted  SBERT+NLI  JSON Parse  Throughput  Inference
1     qwen3-vl-8b-sft+grpo           80.9%     78.7%      100%        7.5/s       464s
2     qwen3-vl-2b-sft-grpo-v9        79.9%     78.5%      100%        15.9/s      220s
3     qwen3-vl-8b-instruct-base      78.1%     75.6%      100%        5.5/s       640s
4     qwen3-vl-8b-instruct-nvfp4     77.8%     75.0%      100%        8.2/s       424s
5     qwen35-2b-base (this model)    76.2%     73.0%      100%        6.6/s       534s
6     qwen3-vl-2b-sft-grpo-v9-nvfp4  74.6%     74.1%      100%        17.2/s      203s
7     qwen3-vl-2b-instruct-base      68.0%     66.7%      100%        15.1/s      231s
8     internvl3-2b-grpo-gtpo-full    67.5%     64.3%      100%        11.8/s      297s
9     internvl3-2b-grpo-gtpo-fp8     67.1%     63.8%      100%        14.3/s      244s
10    internvl3-2b-base              66.8%     63.7%      100%        11.8/s      297s
11    moondream2-base                63.8%     61.8%      100%        1.4/s       2416s
12    qwen35-2b-sft-grpo-gtpo-v8     60.7%     60.1%      100%        11.3/s      309s
13    qwen35-2b-sft-v7               58.6%     58.9%      100%        11.6/s      302s
14    qwen35-35b-a3b-gptq-int4       51.5%     48.7%      14%         1.6/s       2124s
15    qwen35-9b-nvfp4-v10            48.9%     46.0%      8%          1.7/s       2075s
16    qwen35-9b-sft-nvfp4-v11        48.3%     45.5%      8%          1.7/s       2023s
17    qwen35-2b-base-nvfp4-v10       45.9%     42.9%      0%          4.0/s       878s
18    qwen3.5-122b-a10b-nvfp4        45.9%     42.9%      0%          1.2/s       2893s
19    qwen35-2b-sft-nvfp4-v11        45.9%     42.9%      0%          4.0/s       876s
20    qwen35-2b-sft-grpo-gtpo-nvfp4  45.9%     42.9%      0%          3.9/s       907s
21    qwen3-vl-8b-sft-grpo           0.0%      0.0%       100%        0.0/s       462s

Comparative Analysis

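  • vs qwen35-2b-sft-grpo-gtpo-v8 (fine-tuned, #12): +15.5pp weighted (76.2% vs 60.7% in the full leaderboard). On this benchmark the zero-shot base model outscores this fine-tuned variant, so fine-tuning did not improve the weighted score here.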

Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

Metric             Description
SBERT Cosine       Semantic similarity via sentence-transformers (all-MiniLM-L6-v2)
NLI Score          Natural language inference entailment scoring
Levenshtein Ratio  Fuzzy string matching
Token F1           Harmonic mean of token-level precision and recall
Weighted Score     Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x, all other fields 1.0x)
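The two string-level metrics can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the benchmark's actual implementation: tokens are taken to be lowercased whitespace-separated words, and the Levenshtein ratio is taken to be one minus the edit distance divided by the longer string's length (other libraries define the ratio differently). The function names are illustrative.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between two strings (assumed whitespace tokenization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]: 1 - edit_distance / max(len(a), len(b))."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


# Hypothetical prediction/reference pair, for illustration only.
print(token_f1("navy blue", "blue"))
print(levenshtein_ratio("kitten", "sitting"))
```

Token F1 penalizes extra or missing words even when the meaning is close, which is one plausible reason a field such as color scores much lower on Token F1 than on SBERT in the per-field table.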

Citation

@misc{denali-ai-qwen35-2b-base,
  title={Qwen3.5-2B (Base)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen35-2b-base}
}

License

This model is released under the Apache 2.0 License.
