---
license: apache-2.0
---

# ERNIE-Image-Aes: Robust Image Aesthetics Scoring with Balanced Category Generalization

[📄 Paper]

## 🌟 Highlights

ERNIE-Image-Aes is an 8B vision-language model for image aesthetic scoring, initialized from [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and fine-tuned on a diverse, professionally annotated dataset. It substantially outperforms existing aesthetic predictors (LAION-AES, ArtiMuse, UniPercept) in generalization across diverse image categories.

**Key advantages:**

- Balanced predictions across photography, anime, design, everyday snapshots, and film photography
- No systematic bias toward specific image types (e.g., AI-generated content or black-and-white photos)
- Swiss-tournament-based pairwise annotation for high-quality training labels
- Achieves **0.7445 SRCC** and **0.7598 PLCC** on the ERIA-1K benchmark

## 🔍 Motivation

Off-the-shelf aesthetic predictors exhibit systematic biases:

| Model | Bias |
|-------|------|
| LAION-Aesthetic | Disproportionately high scores for AI-generated/anime content |
| ArtiMuse | Overscores black-and-white photography and casual everyday snapshots |
| UniPercept | Strong preference for monochrome images; overscores casual snapshots |

ERNIE-Image-Aes addresses these failure modes through a purpose-built annotation pipeline with explicit category balance.

## 📊 Results on ERIA-1K Benchmark

| Model | SRCC | PLCC |
|-------|------|------|
| LAION AES | 0.2944 | 0.3138 |
| ArtiMuse | 0.4277 | 0.4704 |
| UniPercept | 0.4533 | 0.4748 |
| **ERNIE-Image-Aes** | **0.7445** | **0.7598** |

**Annotation Protocol:**

- Pairwise Swiss-system tournament for stable and reproducible rankings
- Tier labels from 1 to 10
- Annotators recruited from professional backgrounds (Central Academy of Fine Arts, Sichuan Fine Arts Institute, Communication University of China, etc.)
- All annotators passed aesthetic calibration screening prior to participation

## ⚙️ Setup

Please follow the setup instructions in the [ArtiMuse repository](https://huggingface.co/Thunderbolt215215/ArtiMuse).

## 🙏 Acknowledgements

Our work builds upon [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and [InternVL-3](https://github.com/OpenGVLab/InternVL). We sincerely thank the authors for their excellent contributions to the community.

## ✒️ Citation

If you find this work useful, please consider citing:

```bibtex
```
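
## 🧮 Example: Computing SRCC and PLCC

The benchmark results above are reported as SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) between model scores and human labels. The following is a minimal, dependency-free sketch of how these two metrics are commonly computed. It is not the authors' evaluation script: the function names and the sample scores are hypothetical and chosen only for illustration, and it assumes human labels are tier values from 1 to 10 as described in the annotation protocol.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PLCC)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human tier labels (1 to 10) and model-predicted scores.
human_tiers = [2, 5, 7, 9, 4, 7]
model_scores = [3.1, 4.8, 6.0, 8.7, 5.2, 7.4]

print(f"SRCC: {spearman(model_scores, human_tiers):.4f}")
print(f"PLCC: {pearson(model_scores, human_tiers):.4f}")
```

SRCC depends only on the ordering of the scores, so it is unaffected by any monotonic rescaling of the model output, while PLCC also measures how linearly the predicted scores track the labels. Tier labels produce many ties, which is why the sketch uses average ranks rather than the tie-free shortcut formula.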