arxiv:2509.17374

Revisiting Vision Language Foundations for No-Reference Image Quality Assessment

Published on Mar 14

Authors:

Abstract

A systematic evaluation of vision transformer backbones for no-reference image quality assessment reveals that sigmoid activation outperforms ReLU and GELU, with a learnable activation selection mechanism achieving state-of-the-art results.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large-scale vision language pre-training has recently shown promise for no-reference image-quality assessment (NR-IQA), yet the relative merits of modern Vision Transformer foundations remain poorly understood. In this work, we present the first systematic evaluation of six prominent pretrained backbones, CLIP, SigLIP2, DINOv2, DINOv3, Perception, and ResNet, for the task of No-Reference Image Quality Assessment (NR-IQA), each finetuned using an identical lightweight MLP head. Our study uncovers two previously overlooked factors: (1) SigLIP2 consistently achieves strong performance; and (2) the choice of activation function plays a surprisingly crucial role, particularly for enhancing the generalization ability of image quality assessment models. Notably, we find that simple sigmoid activations outperform commonly used ReLU and GELU on several benchmarks. Motivated by this finding, we introduce a learnable activation selection mechanism that adaptively determines the nonlinearity for each channel, eliminating the need for manual activation design, and achieving new state-of-the-art SRCC on CLIVE, KADID10K, and AGIQA3K. Extensive ablations confirm the benefits across architectures and regimes, establishing strong, resource-efficient NR-IQA baselines.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2509.17374

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.17374 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.17374 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.