Zero-shot baseline of Qwen3.5-2B for garment classification (attribute extraction). This is the base model, evaluated before any Denali-AI fine-tuning. It ranks #5 of 21 on the Denali-AI eval_hard_3500 benchmark with a weighted score of 84.4%.
| Property | Value |
|---|---|
| Architecture | Qwen3.5-VL |
| Parameters | 2B |
| Base Model | Qwen/Qwen3.5-2B |
| Training | None (zero-shot baseline) |
| Task | Garment Attribute Extraction (9-field JSON) |
| Output Format | Structured JSON |
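The card does not publish the output schema, so the following is only a sketch of how raw responses could be checked. It assumes the nine JSON keys are the field names used in the per-field results (`type`, `color`, `pattern`, `closure`, `sleeve`, `neckline`, `defect`, `brand`, `size`) and that an output counts toward the JSON Parse Rate when it decodes to a JSON object. `parse_garment_json`, `missing_fields`, and `json_parse_rate` are hypothetical helpers, not part of any Denali-AI tooling.

```python
from __future__ import annotations

import json

# Assumed key names, taken from the per-field results; the real schema may differ.
FIELDS = ("type", "color", "pattern", "closure", "sleeve",
          "neckline", "defect", "brand", "size")


def parse_garment_json(text: str) -> dict | None:
    """Return the decoded object if `text` is a JSON object, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Arrays, strings, and numbers are valid JSON but not a 9-field record.
    return obj if isinstance(obj, dict) else None


def missing_fields(obj: dict) -> list[str]:
    """List the expected fields that are absent from a decoded prediction."""
    return [f for f in FIELDS if f not in obj]


def json_parse_rate(outputs: list[str]) -> float:
    """Fraction of raw outputs that decode to a JSON object (assumed definition)."""
    if not outputs:
        return 0.0
    ok = sum(parse_garment_json(o) is not None for o in outputs)
    return ok / len(outputs)
```

For example, `json_parse_rate(['{"type": "shirt"}', 'not json'])` returns `0.5`; `missing_fields` can then flag decoded objects that omit some of the nine fields.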
Rank #5/21 on eval_hard_3500
| Metric | Score |
|---|---|
| Weighted Score | 84.4% |
| SBERT+NLI Combined | 73.0% |
| JSON Parse Rate | 100% |
| Throughput | 6.6 samples/s |
| Inference Time | 534 s (3,500 samples) |

Per-field scores on eval_hard_3500:

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 78.7% | 66.8% | 70.9% | 58.7% | 68.4% | 2.5x |
| color | 75.6% | 63.2% | 61.9% | 35.8% | 67.3% | 1.0x |
| pattern | 62.9% | 69.6% | 56.0% | 40.1% | 60.0% | 1.0x |
| closure | 42.4% | 35.2% | 40.3% | 27.4% | 34.0% | 1.0x |
| sleeve | 68.8% | 89.1% | 69.6% | 71.8% | 80.4% | 1.0x |
| neckline | 64.8% | 63.9% | 63.2% | 56.4% | 57.4% | 1.0x |
| defect | 96.0% | 96.2% | 95.7% | 95.3% | 96.0% | 2.0x |
| brand | 94.4% | 94.3% | 94.6% | 93.5% | 94.2% | 1.5x |
| size | 99.3% | 99.3% | 99.3% | 99.3% | 99.3% | 1.5x |
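The aggregate scores can be recomputed from the per-field table. This sketch assumes Weighted Score is the weighted mean of the per-field SBERT+NLI column, normalised by the total weight (12.5), and that SBERT+NLI Combined is the plain mean over the nine fields. Under those assumptions it reproduces the 76.2% weighted and 73.0% SBERT+NLI figures listed for qwen35-2b-base in the leaderboard; 3,500 samples in 534 s likewise gives the reported 6.6 samples/s.

```python
# Per-field SBERT+NLI scores (%) and weights, copied from the per-field table.
SBERT_NLI = {
    "type": 68.4, "color": 67.3, "pattern": 60.0, "closure": 34.0,
    "sleeve": 80.4, "neckline": 57.4, "defect": 96.0, "brand": 94.2,
    "size": 99.3,
}
WEIGHTS = {
    "type": 2.5, "color": 1.0, "pattern": 1.0, "closure": 1.0,
    "sleeve": 1.0, "neckline": 1.0, "defect": 2.0, "brand": 1.5,
    "size": 1.5,
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean of per-field scores, normalised by the total weight (assumed)."""
    total = sum(weights[f] for f in scores)
    return sum(scores[f] * weights[f] for f in scores) / total


def unweighted_score(scores: dict) -> float:
    """Plain mean over all fields."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    print(f"weighted:   {weighted_score(SBERT_NLI, WEIGHTS):.1f}%")   # 76.2%
    print(f"unweighted: {unweighted_score(SBERT_NLI):.1f}%")          # 73.0%
    print(f"throughput: {3500 / 534:.1f} samples/s")                  # 6.6 samples/s
```

Because type (2.5x) and defect (2.0x) dominate the weights, a weak field such as closure moves the weighted score far less than it moves the unweighted mean.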
Full eval_hard_3500 leaderboard (21 models):

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference Time |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | qwen3-vl-8b-instruct-nvfp4 | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | **qwen35-2b-base** (this model) | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | qwen3-vl-2b-instruct-base | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |
Models are evaluated on the eval_hard_3500 benchmark (3,500 samples) with the following metrics:
| Metric | Description |
|---|---|
| SBERT Cosine | Cosine similarity between sentence-transformers embeddings (all-MiniLM-L6-v2) of the prediction and the reference |
| NLI Score | Entailment score from a natural language inference model |
| Levenshtein Ratio | Normalized edit-distance similarity (fuzzy string matching) |
| Token F1 | Harmonic mean of token-level precision and recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x, all other fields 1.0x) |
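The two string-level metrics are simple enough to sketch in plain Python. The card does not say which implementation was used, so the normalisation here (edit distance divided by the length of the longer string) and the lower-cased whitespace tokenisation for Token F1 are assumptions; other libraries normalise Levenshtein ratios differently, so absolute numbers may not match the tables.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(pred: str, gold: str) -> float:
    """Similarity in [0, 1]: 1 - distance / longer length (assumed normalisation)."""
    if not pred and not gold:
        return 1.0
    return 1.0 - levenshtein(pred, gold) / max(len(pred), len(gold))


def token_f1(pred: str, gold: str) -> float:
    """Bag-of-tokens F1 over lower-cased whitespace tokens (assumed tokenisation)."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return float(p == g)  # both empty counts as a match
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For `"long sleeve"` against `"long sleeves"`, the edit distance is 1 (ratio 1 - 1/12, about 0.92) while Token F1 is 0.5, which illustrates why the two metrics can diverge on near-miss values.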
```bibtex
@misc{denali-ai-qwen35-2b-base,
  title={Qwen3.5-2B (Base)},
  author={{Denali AI}},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen35-2b-base}
}
```
This model is released under the Apache 2.0 License.