
Qwen3-VL-8B-Instruct (NVFP4)

NVFP4-quantized Qwen3-VL-8B-Instruct for garment classification. Ranked #4/21 on the Denali-AI eval_hard_3500 benchmark with a 77.8% weighted score: only 0.3pp below full precision, while running 1.5x faster in a model 59% smaller.

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Parameters | 8B (NVFP4 quantized) |
| Base Model | Qwen/Qwen3-VL-8B-Instruct |
| Quantization | NVFP4 (NVIDIA ModelOpt, group_size=16) |
| Model Size | ~7 GB (vs ~17 GB full precision) |
| Training | None (zero-shot baseline, quantized) |
| Task | Garment Attribute Extraction (9-field JSON) |
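The task output is a 9-field JSON object per image. The field names below come from the per-field results table; the exact schema and value vocabulary are assumptions for illustration. A minimal sketch of the kind of parse check behind the "JSON parse rate" metric:

```python
import json

# The nine garment attribute fields scored in the benchmark.
# Field names are taken from the per-field results table; the exact
# schema and value vocabulary are assumptions for illustration.
FIELDS = ["type", "color", "pattern", "closure", "sleeve",
          "neckline", "defect", "brand", "size"]

def parse_prediction(raw: str):
    """Parse a model response into the 9-field dict, or None on failure.

    A response would count toward the JSON parse rate only if it is
    valid JSON containing every expected field.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or any(f not in obj for f in FIELDS):
        return None
    return {f: str(obj[f]) for f in FIELDS}

# Example of a well-formed prediction (hypothetical values):
raw = ('{"type": "t-shirt", "color": "navy blue", "pattern": "solid", '
       '"closure": "none", "sleeve": "short", "neckline": "crew", '
       '"defect": "none", "brand": "unknown", "size": "M"}')
pred = parse_prediction(raw)
```

A 100% parse rate means every one of the 3500 responses passes a check like this; the Qwen3.5 NVFP4 variants in the leaderboard below fail it on most or all samples.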

Key Highlights

  • Only 0.3pp accuracy drop from NVFP4 quantization (77.8% vs 78.1%)
  • 1.5x throughput improvement (8.2 vs 5.5 samples/s)
  • 100% JSON parse rate preserved after quantization
  • 59% model size reduction (7 GB vs 17 GB)
  • Vision encoder excluded from quantization for quality preservation
  • Confirms Qwen3-VL architecture handles quantization robustly (unlike Qwen3.5-VL which degrades catastrophically)

Benchmark Results

Rank #4/21 on eval_hard_3500

| Metric | Score |
|---|---|
| Weighted Score | 77.8% |
| SBERT+NLI Combined | 75.0% |
| JSON Parse Rate | 100% |
| Throughput | 8.2 samples/s |
| Inference Time | 424s (3500 samples) |

Per-Field Scores

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |
|---|---|---|---|---|---|---|
| type | 79.0% | 67.0% | 72.2% | 59.6% | 69.9% | 2.5x |
| color | 80.3% | 61.4% | 65.2% | 40.1% | 71.5% | 1.0x |
| pattern | 62.8% | 64.6% | 58.2% | 38.3% | 56.4% | 1.0x |
| closure | 42.7% | 34.7% | 40.2% | 29.9% | 35.5% | 1.0x |
| sleeve | 71.3% | 85.5% | 71.6% | 72.0% | 78.2% | 1.0x |
| neckline | 80.2% | 78.3% | 79.3% | 73.2% | 75.0% | 1.0x |
| defect | 96.7% | 96.7% | 96.5% | 96.1% | 96.5% | 2.0x |
| brand | 93.5% | 93.5% | 93.8% | 92.6% | 93.2% | 1.5x |
| size | 99.2% | 99.1% | 99.2% | 99.1% | 99.2% | 1.5x |

Visualizations

Figures (not included in this export): per-field radar chart, leaderboard comparison, metrics breakdown, throughput comparison.

Full Leaderboard

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | **qwen3-vl-8b-instruct-nvfp4** (this model) | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | qwen35-2b-base | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | qwen3-vl-2b-instruct-base | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |

Comparative Analysis

vs Full-Precision (qwen3-vl-8b-instruct-base)

| Metric | NVFP4 | Full Precision | Delta |
|---|---|---|---|
| Weighted Score | 77.8% | 78.1% | -0.3pp |
| SBERT+NLI | 75.0% | 75.6% | -0.6pp |
| JSON Parse | 100% | 100% | 0pp |
| Throughput | 8.2/s | 5.5/s | 1.5x faster |
| Model Size | ~7 GB | ~17 GB | 59% smaller |

Per-field SBERT+NLI delta:

| Field | NVFP4 | FP | Delta |
|---|---|---|---|
| type | 69.9% | 69.6% | +0.3pp |
| color | 71.5% | 71.2% | +0.3pp |
| pattern | 56.4% | 59.9% | -3.5pp |
| closure | 35.5% | 35.4% | +0.1pp |
| sleeve | 78.2% | 82.9% | -4.7pp |
| neckline | 75.0% | 73.5% | +1.5pp |
| defect | 96.5% | 96.0% | +0.5pp |
| brand | 93.2% | 93.2% | 0.0pp |
| size | 99.2% | 98.7% | +0.5pp |

Quantization Impact Summary

NVFP4 quantization on Qwen3-VL-8B is nearly lossless: the largest per-field degradations are sleeve (-4.7pp) and pattern (-3.5pp), while brand and size are essentially unchanged. This contrasts sharply with Qwen3.5-VL models, where NVFP4 destroys JSON parse capability entirely (0% parse rate across all Qwen3.5 NVFP4 variants).

Quantization Details

  • Algorithm: NVFP4 (4-bit floating point)
  • Tool: NVIDIA ModelOpt 0.42.0
  • Group Size: 16
  • Calibration: 512 samples from train_10k_balanced_v3
  • Excluded Modules: lm_head, model.visual* (vision encoder kept in bfloat16)
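The format above can be illustrated with a minimal pure-Python sketch of blockwise 4-bit fake quantization: each group of 16 weights shares one scale, and values snap to the FP4 (E2M1) grid {0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6}. This sketches the scheme only, not ModelOpt's implementation; the real pipeline uses FP8 per-group scales plus a calibrated per-tensor scale and fused CUDA kernels, all omitted here.

```python
# Representable FP4 (E2M1) magnitudes and the NVFP4 group size.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 16

def quantize_group(weights):
    """Fake-quantize one group of up to GROUP_SIZE weights to FP4.

    The group's largest magnitude is mapped to the grid maximum (6),
    and every weight snaps to the nearest representable FP4 value.
    """
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / 6.0
    out = []
    for w in weights:
        mag = min(FP4_GRID, key=lambda g: abs(abs(w) / scale - g))
        out.append(mag * scale if w >= 0 else -mag * scale)
    return out

def fake_quantize(weights):
    """Apply blockwise fake quantization over a flat weight list."""
    return [q for i in range(0, len(weights), GROUP_SIZE)
            for q in quantize_group(weights[i:i + GROUP_SIZE])]
```

Because outliers only distort their own 16-element group, the blockwise scale is what keeps the per-field deltas above so small; excluding `lm_head` and the vision encoder removes the two most quantization-sensitive components entirely.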

Evaluation Methodology

Models are evaluated on the eval_hard_3500 benchmark using:

| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level precision/recall |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x) |
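The field-weighted aggregate can be sketched as a weighted mean over per-field scores. The weights are taken from the table above; treating the aggregate as a simple weighted mean is an assumption about the benchmark's exact formula.

```python
# Field weights from the evaluation methodology table; all remaining
# fields default to 1.0x. Assumes the aggregate is a weighted mean.
WEIGHTS = {"type": 2.5, "defect": 2.0, "brand": 1.5, "size": 1.5,
           "color": 1.0, "pattern": 1.0, "closure": 1.0,
           "sleeve": 1.0, "neckline": 1.0}

def weighted_score(field_scores):
    """Weighted mean of per-field scores (each on a 0-100 scale)."""
    total = sum(WEIGHTS[f] * field_scores[f] for f in WEIGHTS)
    return total / sum(WEIGHTS.values())
```

Under this weighting, type errors cost 2.5x as much as color or pattern errors, which is why the strong defect/brand/size fields (96%+) pull the aggregate above the plain per-field mean.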

Citation

```bibtex
@misc{denali-ai-qwen3-vl-8b-instruct-nvfp4,
  title={Qwen3-VL-8B-Instruct (NVFP4)},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen3-vl-8b-instruct-nvfp4}
}
```

License

This model is released under the Apache 2.0 License.
