| --- |
| license: apache-2.0 |
| --- |
| # ERNIE-Image-Aes: Robust Image Aesthetics Scoring with Balanced Category Generalization |
|
|
| <img src="https://cdn-uploads.huggingface.co/production/uploads/5f8d780e5d083370c711f575/EVsGxYPd7kVWWIKlBj-d9.png" width="100%"> |
|
|
| [Paper] |
|
|
| ## Highlights |
|
|
| ERNIE-Image-Aes is an 8B vision-language model for image aesthetic scoring, initialized from [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and fine-tuned on a diverse, professionally annotated dataset. It generalizes across image categories substantially better than existing aesthetic predictors (LAION-AES, ArtiMuse, UniPercept). |
|
|
| **Key advantages:** |
| - Balanced predictions across photography, anime, design, everyday snapshots, and film photography |
| - No systematic bias toward specific image types (e.g., AI-generated content or black-and-white photos) |
| - Swiss-tournament-based pairwise annotation for high-quality training labels |
| - Achieves **0.7445 SRCC** and **0.7598 PLCC** on the ERIA-1K benchmark |
|
|
| ## Motivation |
|
|
| Off-the-shelf aesthetic predictors exhibit systematic biases: |
|
|
| | Model | Bias | |
| |-------|------| |
| | LAION-Aesthetic | Disproportionately high scores for AI-generated/anime content | |
| | ArtiMuse | Overscores black-and-white photography and casual everyday snapshots | |
| | UniPercept | Strong preference for monochrome images; overscores casual snapshots | |
|
|
| ERNIE-Image-Aes addresses these failure modes through a purpose-built annotation pipeline with explicit category balance. |
|
|
| ## Results on ERIA-1K Benchmark |
|
|
| | Model | SRCC | PLCC | |
| |-------|------|------| |
| | LAION AES | 0.2944 | 0.3138 | |
| | ArtiMuse | 0.4277 | 0.4704 | |
| | UniPercept | 0.4533 | 0.4748 | |
| | **ERNIE-Image-Aes** | **0.7445** | **0.7598** | |
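
For readers who want to compute the same kind of metrics on their own predictions, here is a minimal sketch of SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) in plain Python. It is an illustration, not the evaluation script behind the table: the function names are hypothetical, and whether any nonlinear fitting is applied before PLCC is not stated in this card, so none is applied here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers, purely for illustration.
human_scores = [3.0, 7.5, 5.0, 9.0, 1.5]
model_scores = [2.8, 6.9, 5.5, 8.1, 2.0]
print(f"SRCC={spearman(human_scores, model_scores):.4f}")
print(f"PLCC={pearson(human_scores, model_scores):.4f}")
```

SRCC only depends on the ordering of the scores, while PLCC also rewards a linear relationship between predicted and human scores, which is why the two are usually reported together.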
|
|
|
|
| **Annotation Protocol:** |
| - Pairwise Swiss-system tournament for stable and reproducible rankings |
| - Tier labels from 1 to 10 |
| - Annotators with professional backgrounds, recruited from institutions such as the Central Academy of Fine Arts, Sichuan Fine Arts Institute, and Communication University of China |
| - All annotators passed aesthetic calibration screening prior to participation |
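
To make the protocol above more concrete, the following is a rough sketch of how a Swiss-system pairwise tournament can turn comparisons into 1 to 10 tier labels. It is a simplified illustration under stated assumptions, not the authors' pipeline: `swiss_rounds`, `to_tiers`, and the `prefer` callback (a stand-in for a human judgment) are hypothetical names, rematches are not prevented, and the equal-sized tier binning is an assumption, since the card does not say how tiers are derived from the standings.

```python
import random


def swiss_rounds(items, prefer, num_rounds, seed=0):
    """Run a simplified Swiss-system tournament and return win counts.

    prefer(a, b) must return whichever of a or b is preferred; here it
    stands in for an annotator's pairwise judgment (assumption).
    """
    rng = random.Random(seed)
    wins = {item: 0 for item in items}
    for _ in range(num_rounds):
        # Shuffle first so ties in the standings are broken randomly,
        # then sort (stably) by current win count, best first.
        order = list(items)
        rng.shuffle(order)
        order.sort(key=lambda it: wins[it], reverse=True)
        # Pair neighbours in the standings; with an odd count the
        # last item sits out this round.
        for a, b in zip(order[0::2], order[1::2]):
            wins[prefer(a, b)] += 1
    return wins


def to_tiers(wins, num_tiers=10):
    """Bin items into equal-sized tiers 1..num_tiers by win count (assumed scheme)."""
    ranked = sorted(wins, key=lambda it: wins[it])  # worst first
    n = len(ranked)
    return {item: 1 + (idx * num_tiers) // n for idx, item in enumerate(ranked)}


# Toy example: 16 "images" whose true quality is just their integer id,
# judged by a perfectly consistent annotator.
wins = swiss_rounds(list(range(16)), prefer=max, num_rounds=4)
tiers = to_tiers(wins)
print(tiers[15], tiers[0])
```

Because each round pairs items with similar records, a Swiss system needs far fewer comparisons than a full round-robin to separate strong items from weak ones, which is presumably part of its appeal for large annotation jobs.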
|
|
| ## Setup |
|
|
| Please follow the setup instructions in the [ArtiMuse repository](https://huggingface.co/Thunderbolt215215/ArtiMuse). |
|
|
|
|
| ## Acknowledgements |
|
|
| Our work builds upon [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and [InternVL-3](https://github.com/OpenGVLab/InternVL). We sincerely thank the authors for their excellent contributions to the community. |
|
|
| ## Citation |
|
|
| If you find this work useful, please consider citing: |
|
|
| ```bibtex |
| ``` |
|
|
|
|