ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers
Abstract
ViTNT-FIQA measures face image quality by analyzing patch embedding stability across Vision Transformer blocks with a single forward pass.
Face Image Quality Assessment (FIQA) is essential for reliable face recognition systems. Current learned approaches exploit only final-layer representations, while existing training-free methods require multiple forward passes or backpropagation. We propose ViTNT-FIQA, a training-free approach that measures the stability of patch embedding evolution across intermediate Vision Transformer (ViT) blocks. We demonstrate that high-quality face images exhibit stable feature refinement trajectories across blocks, while degraded images show erratic transformations. Our method computes Euclidean distances between L2-normalized patch embeddings from consecutive transformer blocks and aggregates them into image-level quality scores. We empirically validate this correlation on a quality-labeled synthetic dataset with controlled degradation levels. Unlike existing training-free approaches, ViTNT-FIQA requires only a single forward pass without backpropagation or architectural modifications. Through extensive evaluation on eight benchmarks (LFW, AgeDB-30, CFP-FP, CALFW, Adience, CPLFW, XQLFW, IJB-C), we show that ViTNT-FIQA achieves competitive performance with state-of-the-art methods while maintaining computational efficiency and immediate applicability to any pre-trained ViT-based face recognition model.
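The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes per-block patch embeddings have already been extracted from a ViT, and it assumes the aggregation is a simple mean over patches and over consecutive block pairs, with a sign flip so that stable trajectories yield higher scores (the paper's exact aggregation may differ).

```python
import numpy as np

def vitnt_fiqa_score(block_embeddings):
    """Hypothetical sketch of a ViTNT-FIQA-style quality score.

    block_embeddings: list of arrays, one per ViT block, each of shape
    (num_patches, dim) -- patch tokens only (CLS token excluded, an
    assumption). Returns a scalar: higher = more stable = higher quality.
    """
    drifts = []
    for prev, curr in zip(block_embeddings[:-1], block_embeddings[1:]):
        # L2-normalize patch embeddings from consecutive blocks
        p = prev / np.linalg.norm(prev, axis=-1, keepdims=True)
        c = curr / np.linalg.norm(curr, axis=-1, keepdims=True)
        # Euclidean distance per patch between consecutive blocks
        d = np.linalg.norm(c - p, axis=-1)  # shape: (num_patches,)
        drifts.append(d.mean())
    # Aggregate per-block instabilities into one image-level score;
    # negate so that stable trajectories map to larger values.
    return -float(np.mean(drifts))
```

Under this sketch, an image whose patch embeddings barely change between blocks scores near 0, while erratic block-to-block transformations push the score down, matching the paper's stability hypothesis.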
Community
ViTNT-FIQA is a training-free Face Image Quality Assessment (FIQA) method that measures the stability of patch embedding evolution across intermediate Vision Transformer (ViT) blocks. Unlike existing approaches that require multiple forward passes, backpropagation, or additional training, our method achieves competitive performance with just a single forward pass through pre-trained ViT-based face recognition models. https://github.com/gurayozgur/ViTNT-FIQA
This is an automated message from the Librarian Bot. I found the following similar papers, recommended by the Semantic Scholar API:
- DReX: Pure Vision Fusion of Self-Supervised and Convolutional Representations for Image Complexity Prediction (2025)
- Edit2Restore: Few-Shot Image Restoration via Parameter-Efficient Adaptation of Pre-trained Editing Models (2026)
- ARM: A Learnable, Plug-and-Play Module for CLIP-based Open-vocabulary Semantic Segmentation (2025)
- Beyond the Ground Truth: Enhanced Supervision for Image Restoration (2025)
- Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling (2025)
- Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding (2025)
- Learning from a Generative Oracle: Domain Adaptation for Restoration (2025)