Image-Text-to-Text · English

yahya007 committed 59807e9 (verified) · Parent(s): 89e2a74

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: image-text-to-text
 💻 **Code**: [GitHub Repository](https://github.com/yahya-ben/mplug2-vp-for-nriqa)

 ## Abstract
-In this paper, we propose a novel parameter-efficient adaptation method for No-Reference Image Quality Assessment (NR-IQA) using visual prompts optimized in pixel space. Unlike full fine-tuning of Multimodal Large Language Models (MLLMs), our approach trains at most 600K parameters (<0.01% of the base model) while keeping the underlying model fully frozen. During inference, these visual prompts are combined with images via addition and processed by mPLUG-Owl2 with the textual query "Rate the technical quality of the image." Evaluations across distortion types (synthetic, realistic, AI-generated) on KADID-10k, KonIQ-10k, and AGIQA-3k demonstrate competitive performance against fully fine-tuned methods and specialized NR-IQA models, achieving 0.93 SRCC on KADID-10k. To our knowledge, this is the first work to leverage pixel-space visual prompts for NR-IQA, enabling efficient MLLM adaptation for low-level vision tasks. The source code is publicly available at https://github.com/yahya-ben/mplug2-vp-for-nriqa.
+In this paper, we propose a novel approach to No-Reference Image Quality Assessment (NR-IQA) by efficiently adapting a Multimodal Large Language Model (MLLM) through pixel-space visual prompts. Unlike full fine-tuning approaches that adapt MLLMs to specific tasks, our method trains at most 600K parameters (<0.01% of the base model) while keeping the underlying model fully frozen. During inference, these visual prompts are combined with images via addition and processed by mPLUG-Owl2 with the textual query "Rate the technical quality of the image." Evaluations across distortion types (synthetic, realistic, AI-generated) on KADID-10k, KonIQ-10k, and AGIQA-3k demonstrate competitive performance against fully fine-tuned methods and specialized NR-IQA models, achieving 0.93 SRCC on KADID-10k. The source code is publicly available at https://github.com/yahya-ben/mplug2-vp-for-nriqa.

 ## Overview
 Pre-trained visual prompt checkpoints for **No-Reference Image Quality Assessment (NR-IQA)** using mPLUG-Owl2-7B. Achieves competitive performance with only **~600K parameters** vs. 7B+ for full fine-tuning.
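
The abstract describes combining a learned pixel-space prompt with the input image via simple addition while the MLLM itself stays frozen. A minimal NumPy sketch of that mechanism, assuming a full-frame additive prompt over images in the [0, 1] range (the shapes, function names, and clipping step are illustrative assumptions, not the repository's actual API):

```python
import numpy as np

# Illustrative assumption: a full-frame additive prompt at 224x224x3 has
# 224 * 224 * 3 = 150,528 trainable values, within the "at most 600K
# parameters" budget stated in the abstract.
HEIGHT, WIDTH, CHANNELS = 224, 224, 3

def init_visual_prompt(height=HEIGHT, width=WIDTH, channels=CHANNELS):
    """Create the learnable pixel-space prompt (zero-initialized here)."""
    return np.zeros((channels, height, width), dtype=np.float32)

def apply_visual_prompt(images, prompt):
    """Combine the prompt with a batch of images via addition, clipping
    back to the valid [0, 1] range before the frozen MLLM sees them."""
    return np.clip(images + prompt, 0.0, 1.0)

# The prompted batch is what would be passed to mPLUG-Owl2 together with
# the fixed textual query "Rate the technical quality of the image."
prompt = init_visual_prompt()
images = np.random.rand(4, CHANNELS, HEIGHT, WIDTH).astype(np.float32)
prompted = apply_visual_prompt(images, prompt)
print(prompted.shape)  # (4, 3, 224, 224)
print(prompt.size)     # 150528
```

Because only `prompt` would receive gradients during training, the checkpoint this card distributes can stay tiny while the 7B-parameter backbone remains untouched.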