baidu/ERNIE-Image-Aes

	---
	license: apache-2.0
	---
	# ERNIE-Image-Aes: Robust Image Aesthetics Scoring with Balanced Category Generalization

	<img src="https://cdn-uploads.huggingface.co/production/uploads/5f8d780e5d083370c711f575/EVsGxYPd7kVWWIKlBj-d9.png" width="100%">

	[📄 Paper]

	## 🌟 Highlights

	ERNIE-Image-Aes is an 8B vision-language model for image aesthetic scoring, initialized from [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and fine-tuned on a diverse, professionally annotated dataset. It generalizes across diverse image categories substantially better than existing aesthetic predictors (LAION-AES, ArtiMuse, UniPercept).

	Key advantages:
	- Balanced predictions across photography, anime, design, everyday snapshots, and film photography
	- No systematic bias toward specific image types (e.g., AI-generated content or black-and-white photos)
	- Swiss-tournament-based pairwise annotation for high-quality training labels
	- Achieves 0.7445 SRCC and 0.7598 PLCC on the ERIA-1K benchmark

	## 🔍 Motivation

	Off-the-shelf aesthetic predictors exhibit systematic biases:

	| Model | Bias |
	|-------|------|
	| LAION-Aesthetic | Disproportionately high scores for AI-generated/anime content |
	| ArtiMuse | Overscores black-and-white photography and casual everyday snapshots |
	| UniPercept | Strong preference for monochrome images; overscores casual snapshots |

	ERNIE-Image-Aes addresses these failure modes through a purpose-built annotation pipeline with explicit category balance.

	## 📊 Results on ERIA-1K Benchmark

	| Model | SRCC | PLCC |
	|-------|------|------|
	| LAION AES | 0.2944 | 0.3138 |
	| ArtiMuse | 0.4277 | 0.4704 |
	| UniPercept | 0.4533 | 0.4748 |
	| ERNIE-Image-Aes | 0.7445 | 0.7598 |
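
For readers unfamiliar with the two metrics, SRCC (Spearman rank correlation) measures how well the predicted ordering agrees with the human ordering, while PLCC (Pearson linear correlation) measures linear agreement between the raw values. The snippet below is a minimal pure-Python sketch of both. It is not the evaluation script used for the table, and the `human_tiers` and `model_scores` lists are made-up illustrative values, not benchmark data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """SRCC: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (not ERIA-1K data).
human_tiers = [3, 7, 5, 9, 1, 6]
model_scores = [3.4, 6.1, 5.5, 8.7, 2.0, 6.0]

srcc = spearman(model_scores, human_tiers)
plcc = pearson(model_scores, human_tiers)
print(f"SRCC={srcc:.4f}  PLCC={plcc:.4f}")
```

In practice, `scipy.stats.spearmanr` and `scipy.stats.pearsonr` compute the same quantities; the hand-written version is shown only to make the definitions explicit.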


	Annotation Protocol:
	- Pairwise Swiss-system tournament for stable and reproducible rankings
	- Tier labels from 1 to 10
	- Annotators recruited from professional backgrounds (Central Academy of Fine Arts, Sichuan Fine Arts Institute, Communication University of China, etc.)
	- All annotators passed aesthetic calibration screening prior to participation
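
To illustrate the general idea behind a Swiss-system tournament, the sketch below pairs items with similar win counts in each round, records the winner of each pairwise comparison, and then bins the final ranking into 10 tiers. This is a simplified illustration under stated assumptions, not the annotation pipeline used for this model: the function names `swiss_rank` and `assign_tiers`, the pairing and tie-breaking rules, the equal-size tier bins, and the convention that tier 10 is best are all hypothetical. The `prefer` callback stands in for a human annotator's pairwise judgment.

```python
import random


def swiss_rank(items, prefer, n_rounds, seed=0):
    """Rank unique, hashable items with a simplified Swiss-system tournament.

    prefer(a, b) must return True when a is judged better than b.
    Returns the items sorted best-first by number of wins.
    """
    rng = random.Random(seed)
    wins = {item: 0 for item in items}
    played = set()  # unordered pairs that have already met
    for _ in range(n_rounds):
        # Shuffle, then stable-sort by wins, so equal-win items are ordered randomly.
        pool = list(items)
        rng.shuffle(pool)
        pool.sort(key=lambda it: wins[it], reverse=True)
        while len(pool) >= 2:
            a = pool.pop(0)
            # Pick the closest-ranked opponent that a has not met yet;
            # fall back to the nearest one if all remaining were already met.
            idx = next(
                (i for i, b in enumerate(pool) if frozenset((a, b)) not in played),
                0,
            )
            b = pool.pop(idx)
            played.add(frozenset((a, b)))
            wins[a if prefer(a, b) else b] += 1
        # With an odd number of items, the leftover item sits this round out.
    return sorted(items, key=lambda it: wins[it], reverse=True)


def assign_tiers(ranking, n_tiers=10):
    """Map a best-first ranking to tiers n_tiers (best) down to 1 (assumed convention)."""
    n = len(ranking)
    return {item: n_tiers - (pos * n_tiers) // n for pos, item in enumerate(ranking)}


# Toy example: 16 items whose hidden quality is their integer value.
items = list(range(16))
ranking = swiss_rank(items, prefer=lambda a, b: a > b, n_rounds=4)
tiers = assign_tiers(ranking)
print(ranking[:3], tiers[ranking[0]], tiers[ranking[-1]])
```

A Swiss system needs far fewer comparisons than a full round-robin (roughly `n_rounds * n / 2` instead of `n * (n - 1) / 2`), which is one common reason to choose it when each comparison requires human judgment.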

	## ⚙️ Setup

	Please follow the setup instructions in the [ArtiMuse repository](https://huggingface.co/Thunderbolt215215/ArtiMuse).


	## 🙏 Acknowledgements

	Our work builds upon [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and [InternVL-3](https://github.com/OpenGVLab/InternVL). We sincerely thank the authors for their excellent contributions to the community.

	## ✒️ Citation

	If you find this work useful, please consider citing:

	```bibtex
	```