eunkey committed
Commit 8ab2bb6 · verified · 1 Parent(s): a27085e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ We also release **EYE4ALL**, a human-annotated dataset for evaluating multimodal
  > However, research progress is hindered by the lack of comprehensive benchmarks and by existing evaluation predictors that lack at least one of these key properties: (1) *Alignment with human judgments*, (2) *Long-sequence processing*, (3) *Inference efficiency*, and (4) *Applicability to multi-objective scoring*.
  > To address these challenges, we propose a plug-and-play architecture to build a robust predictor, **MULTI-TAP** (**Multi**-Objective **T**ask-**A**ware **P**redictor), capable of both multi- and single-objective scoring.
  > **MULTI-TAP** can produce a single overall score using a reward head built on top of a large vision-language model (LVLM).
- > We show that **MULTI-TAP** is robust when applied to different LVLM architectures, achieving significantly higher performance than existing metrics (*e.g.*, +42.3 Kendall's $$\tau_{c}$$ compared to IXCREW-S on FlickrExp) and performing on par with the GPT-4o-based predictor G-VEval at a smaller size (7-8B).
+ > We show that **MULTI-TAP** is robust when applied to different LVLM architectures, achieving significantly higher performance than existing metrics (*e.g.*, +42.3 Kendall's tau-c compared to IXCREW-S on FlickrExp) and performing on par with the GPT-4o-based predictor G-VEval at a smaller size (7-8B).
  > By training a lightweight ridge regression layer on the frozen hidden states of a pre-trained LVLM, **MULTI-TAP** can produce fine-grained scores for multiple human-interpretable objectives.
  > **MULTI-TAP** surpasses VisionREWARD, a high-performing multi-objective reward model, in both performance and efficiency on multi-objective benchmarks and on our newly released text-image-to-text dataset, **EYE4ALL**.
  > Our new dataset, consisting of chosen/rejected human preferences (**EYE4ALLPref**) and human-annotated fine-grained scores across seven dimensions (**EYE4ALLMulti**), can serve as a foundation for developing more accessible AI systems by capturing the underlying preferences of users, including blind and low-vision (BLV) individuals.
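The ridge-regression idea quoted above is simple enough to sketch. Below is a minimal, hypothetical Python sketch, assuming a generic Hugging Face LVLM and scikit-learn's multi-output `Ridge`; the processor interface, mean-pooling choice, and seven-dimensional targets are illustrative assumptions, not the released MULTI-TAP training code.

```python
# Minimal sketch (assumptions, not the official implementation):
# mean-pool the frozen LVLM's last hidden layer, then fit a
# lightweight ridge regression head against fine-grained scores.
import numpy as np
import torch
from sklearn.linear_model import Ridge

@torch.no_grad()
def pooled_hidden_state(model, processor, image, text):
    """Mean-pooled final hidden layer for one (image, text) pair; the backbone stays frozen."""
    inputs = processor(images=image, text=text, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[-1].mean(dim=1).squeeze(0).cpu().numpy()

def fit_multi_objective_head(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0) -> Ridge:
    """X: (n_samples, hidden_dim) frozen features; Y: (n_samples, 7) scores,
    e.g. the seven EYE4ALLMulti dimensions."""
    head = Ridge(alpha=alpha)  # closed-form fit; no backbone gradients needed
    head.fit(X, Y)             # scikit-learn Ridge supports multi-output targets
    return head
```

Because the backbone is frozen and ridge regression has a closed-form solution, swapping in a new set of objectives only requires refitting this small head, which is consistent with the efficiency claims above.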