# Qwen3-VL-8B-Instruct (NVFP4)
NVFP4-quantized Qwen3-VL-8B-Instruct for garment classification. Ranked #4/21 on the Denali-AI eval_hard_3500 benchmark with 77.8% weighted score — only 0.3pp below full precision while being 1.5x faster and 59% smaller.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Parameters | 8B (NVFP4 quantized) |
| Base Model | Qwen/Qwen3-VL-8B-Instruct |
| Quantization | NVFP4 (NVIDIA ModelOpt, group_size=16) |
| Model Size | ~7 GB (vs ~17 GB full precision) |
| Training | None (zero-shot baseline, quantized) |
| Task | Garment Attribute Extraction (9-field JSON) |
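The task row above refers to a nine-field JSON output. As a minimal sketch, the snippet below parses and validates such a response; the field names are taken from the per-field score table on this card, while the example values are hypothetical.

```python
import json

# The nine fields scored by the benchmark (names from the per-field
# table on this card); the example values below are hypothetical.
FIELDS = ["type", "color", "pattern", "closure", "sleeve",
          "neckline", "defect", "brand", "size"]

raw = json.dumps({
    "type": "t-shirt", "color": "navy blue", "pattern": "solid",
    "closure": "none", "sleeve": "short", "neckline": "crew",
    "defect": "none", "brand": "unknown", "size": "M",
})

def parse_garment_json(text: str) -> dict:
    """Parse a model response, keeping only the nine expected fields."""
    obj = json.loads(text)
    return {k: str(obj.get(k, "")) for k in FIELDS}

parsed = parse_garment_json(raw)
```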
## Key Highlights
- Only 0.3pp accuracy drop from NVFP4 quantization (77.8% vs 78.1%)
- 1.5x throughput improvement (8.2 vs 5.5 samples/s)
- 100% JSON parse rate preserved after quantization
- 59% model size reduction (7 GB vs 17 GB)
- Vision encoder excluded from quantization for quality preservation
- Confirms the Qwen3-VL architecture quantizes robustly (unlike Qwen3.5-VL, whose NVFP4 variants degrade catastrophically)
## Benchmark Results

### Rank #4/21 on eval_hard_3500
| Metric | Score |
|---|---|
| Weighted Score | 77.8% |
| SBERT+NLI Combined | 75.0% |
| JSON Parse Rate | 100% |
| Throughput | 8.2 samples/s |
| Inference Time | 424s (3500 samples) |
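The throughput and inference-time rows are mutually consistent; a quick cross-check using the figures from the table:

```python
# Cross-check: wall-clock time and throughput agree for 3500 samples.
samples, wall_s = 3500, 424
throughput = samples / wall_s  # ~8.25 samples/s, reported as 8.2
```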
### Per-Field Scores
| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 79.0% | 67.0% | 72.2% | 59.6% | 69.9% | 2.5x |
| color | 80.3% | 61.4% | 65.2% | 40.1% | 71.5% | 1.0x |
| pattern | 62.8% | 64.6% | 58.2% | 38.3% | 56.4% | 1.0x |
| closure | 42.7% | 34.7% | 40.2% | 29.9% | 35.5% | 1.0x |
| sleeve | 71.3% | 85.5% | 71.6% | 72.0% | 78.2% | 1.0x |
| neckline | 80.2% | 78.3% | 79.3% | 73.2% | 75.0% | 1.0x |
| defect | 96.7% | 96.7% | 96.5% | 96.1% | 96.5% | 2.0x |
| brand | 93.5% | 93.5% | 93.8% | 92.6% | 93.2% | 1.5x |
| size | 99.2% | 99.1% | 99.2% | 99.1% | 99.2% | 1.5x |
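The headline weighted score can be reproduced from the SBERT+NLI column above. The sketch below assumes the aggregation is a weight-averaged mean over the nine fields with the listed weights (type=2.5, defect=2.0, brand/size=1.5, all others 1.0); the card does not state this explicitly, but the arithmetic lands on the reported 77.8%.

```python
# Per-field SBERT+NLI scores copied from the table above.
scores = {"type": 69.9, "color": 71.5, "pattern": 56.4, "closure": 35.5,
          "sleeve": 78.2, "neckline": 75.0, "defect": 96.5,
          "brand": 93.2, "size": 99.2}
# Assumed field weights (fields not listed default to 1.0).
weights = {"type": 2.5, "defect": 2.0, "brand": 1.5, "size": 1.5}

weighted = sum(s * weights.get(f, 1.0) for f, s in scores.items())
weighted /= sum(weights.get(f, 1.0) for f in scores)
# weighted ~ 77.8, matching the reported Weighted Score
```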
## Full Leaderboard
| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | **qwen3-vl-8b-instruct-nvfp4** (this model) | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | qwen35-2b-base | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | qwen3-vl-2b-instruct-base | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |
## Comparative Analysis

### vs Full-Precision (qwen3-vl-8b-instruct-base)
| Metric | NVFP4 | Full Precision | Delta |
|---|---|---|---|
| Weighted Score | 77.8% | 78.1% | -0.3pp |
| SBERT+NLI | 75.0% | 75.6% | -0.6pp |
| JSON Parse | 100% | 100% | 0pp |
| Throughput | 8.2/s | 5.5/s | 1.5x faster |
| Model Size | ~7 GB | ~17 GB | 59% smaller |
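The speed and size deltas follow directly from the raw numbers in the table:

```python
# Derive the comparison deltas from the table's raw figures.
speedup = 8.2 / 5.5               # ~1.49x, reported as "1.5x faster"
size_reduction = (17 - 7) / 17    # ~0.588, reported as "59% smaller"
```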
Per-field SBERT+NLI delta:
| Field | NVFP4 | FP | Delta |
|---|---|---|---|
| type | 69.9% | 69.6% | +0.3pp |
| color | 71.5% | 71.2% | +0.3pp |
| pattern | 56.4% | 59.9% | -3.5pp |
| closure | 35.5% | 35.4% | +0.1pp |
| sleeve | 78.2% | 82.9% | -4.7pp |
| neckline | 75.0% | 73.5% | +1.5pp |
| defect | 96.5% | 96.0% | +0.5pp |
| brand | 93.2% | 93.2% | +0.0pp |
| size | 99.2% | 98.7% | +0.5pp |
### Quantization Impact Summary

NVFP4 quantization on Qwen3-VL-8B is nearly lossless: the largest per-field degradations are sleeve (-4.7pp) and pattern (-3.5pp), while brand and size are essentially unchanged. This contrasts sharply with Qwen3.5-VL, where NVFP4 destroys JSON parse capability entirely (0% parse rate across all Qwen3.5 NVFP4 variants).
## Quantization Details
- Algorithm: NVFP4 (4-bit floating point)
- Tool: NVIDIA ModelOpt 0.42.0
- Group Size: 16
- Calibration: 512 samples from train_10k_balanced_v3
- Excluded Modules: `lm_head`, `model.visual*` (vision encoder kept in bfloat16)
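To make the format concrete, here is an illustrative toy model of NVFP4-style group-wise fake quantization with group_size=16: each group of 16 weights shares one scale, and each weight snaps to the nearest FP4 (E2M1) value. This is a didactic sketch, not ModelOpt's implementation; real NVFP4 additionally uses an FP8 per-group scale combined with a global FP32 scale.

```python
import numpy as np

# The 8 non-negative values representable in FP4 E2M1, plus their
# negatives, form the 4-bit grid that each scaled weight snaps to.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[::-1], E2M1])

def fake_quantize_nvfp4(w: np.ndarray, group_size: int = 16) -> np.ndarray:
    """Quantize-dequantize a 1-D weight vector, one group at a time."""
    out = np.empty_like(w, dtype=np.float64)
    for i in range(0, len(w), group_size):
        g = w[i:i + group_size].astype(np.float64)
        # Map the group's max magnitude onto E2M1's max value (6.0).
        scale = max(np.abs(g).max() / 6.0, 1e-12)
        # Snap each scaled weight to the nearest grid point.
        idx = np.abs(g[:, None] / scale - GRID[None, :]).argmin(axis=1)
        out[i:i + group_size] = GRID[idx] * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=64).astype(np.float32)
wq = fake_quantize_nvfp4(w)
err = np.abs(w - wq).max()  # small but nonzero reconstruction error
```

The per-group scale is why group_size matters: smaller groups track local magnitude more tightly at the cost of storing more scales.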
## Evaluation Methodology
Models are evaluated on the eval_hard_3500 benchmark using:
| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level precision/recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x) |
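As a rough sketch of the two string-level metrics, the snippet below computes a simple token-overlap F1 (using unique tokens; the benchmark's exact tokenization is not stated) and approximates the Levenshtein ratio with `difflib.SequenceMatcher`, since the library actually used is not named on this card. The example strings are hypothetical.

```python
from difflib import SequenceMatcher

def token_f1(pred: str, ref: str) -> float:
    """F1 over unique lowercase tokens (a simplification of token F1)."""
    p, r = pred.lower().split(), ref.lower().split()
    overlap = len(set(p) & set(r))
    if not overlap:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

pred, ref = "long sleeve", "long sleeves"
f1 = token_f1(pred, ref)                        # only "long" matches exactly
lev = SequenceMatcher(None, pred, ref).ratio()  # near-match at char level
```

The contrast between the two numbers is the point: character-level fuzzy matching forgives the plural, while exact-token F1 does not, which is one reason the table reports both.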
## Citation
@misc{denali-ai-qwen3-vl-8b-instruct-nvfp4,
title={Qwen3-VL-8B-Instruct (NVFP4)},
author={Denali AI},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/Denali-AI/qwen3-vl-8b-instruct-nvfp4}
}
## License
This model is released under the Apache 2.0 License.