Image-Text-to-Text · English

yahya007 committed 59807e9 (verified) · Parent(s): 89e2a74

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: image-text-to-text
 💻 **Code**: [GitHub Repository](https://github.com/yahya-ben/mplug2-vp-for-nriqa)

 ## Abstract
-In this paper, we propose a novel parameter-efficient adaptation method for No-Reference Image Quality Assessment (NR-IQA) using visual prompts optimized in pixel space. Unlike full fine-tuning of Multimodal Large Language Models (MLLMs), our approach trains at most 600K parameters (<0.01% of the base model) while keeping the underlying model fully frozen. During inference, these visual prompts are combined with images via addition and processed by mPLUG-Owl2 with the textual query "Rate the technical quality of the image." Evaluations across distortion types (synthetic, realistic, AI-generated) on KADID-10k, KonIQ-10k, and AGIQA-3k demonstrate competitive performance against fully fine-tuned methods and specialized NR-IQA models, achieving 0.93 SRCC on KADID-10k. To our knowledge, this is the first work to leverage pixel-space visual prompts for NR-IQA, enabling efficient MLLM adaptation for low-level vision tasks. The source code is publicly available at https://github.com/yahya-ben/mplug2-vp-for-nriqa.
+In this paper, we propose a novel approach to No-Reference Image Quality Assessment (NR-IQA) by efficiently adapting a Multimodal Large Language Model (MLLM) through pixel-space visual prompts. Unlike full fine-tuning approaches that adapt MLLMs to specific tasks, our method trains at most 600K parameters (<0.01% of the base model) while keeping the underlying model fully frozen. During inference, these visual prompts are combined with images via addition and processed by mPLUG-Owl2 with the textual query "Rate the technical quality of the image." Evaluations across distortion types (synthetic, realistic, AI-generated) on KADID-10k, KonIQ-10k, and AGIQA-3k demonstrate competitive performance against fully fine-tuned methods and specialized NR-IQA models, achieving 0.93 SRCC on KADID-10k. The source code is publicly available at https://github.com/yahya-ben/mplug2-vp-for-nriqa.

 ## Overview
 Pre-trained visual prompt checkpoints for **No-Reference Image Quality Assessment (NR-IQA)** using mPLUG-Owl2-7B. Achieves competitive performance with only **~600K parameters** vs. 7B+ for full fine-tuning.
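
The abstract describes combining a learned pixel-space prompt with the input image via simple addition while the MLLM itself stays frozen. A minimal NumPy sketch of that mechanism, assuming a full-frame additive prompt over images in the [0, 1] range (the shapes, function names, and clipping step are illustrative assumptions, not the repository's actual API):

```python
import numpy as np

# Illustrative assumption: a full-frame additive prompt at 224x224x3 has
# 224 * 224 * 3 = 150,528 trainable values, within the "at most 600K
# parameters" budget stated in the abstract.
HEIGHT, WIDTH, CHANNELS = 224, 224, 3

def init_visual_prompt(height=HEIGHT, width=WIDTH, channels=CHANNELS):
    """Create the learnable pixel-space prompt (zero-initialized here)."""
    return np.zeros((channels, height, width), dtype=np.float32)

def apply_visual_prompt(images, prompt):
    """Combine the prompt with a batch of images via addition, clipping
    back to the valid [0, 1] range before the frozen MLLM sees them."""
    return np.clip(images + prompt, 0.0, 1.0)

# The prompted batch is what would be passed to mPLUG-Owl2 together with
# the fixed textual query "Rate the technical quality of the image."
prompt = init_visual_prompt()
images = np.random.rand(4, CHANNELS, HEIGHT, WIDTH).astype(np.float32)
prompted = apply_visual_prompt(images, prompt)
print(prompted.shape)  # (4, 3, 224, 224)
print(prompt.size)     # 150528
```

Because only `prompt` would receive gradients during training, the checkpoint this card distributes can stay tiny while the 7B-parameter backbone remains untouched.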