---
license: apache-2.0
---
# ERNIE-Image-Aes: Robust Image Aesthetics Scoring with Balanced Category Generalization
<img src="https://cdn-uploads.huggingface.co/production/uploads/5f8d780e5d083370c711f575/EVsGxYPd7kVWWIKlBj-d9.png" width="100%">
[📄 Paper]
## 🌟 Highlights
ERNIE-Image-Aes is an 8B vision-language model for image aesthetic scoring, initialized from [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and fine-tuned on a diverse, professionally annotated dataset. It generalizes across diverse image categories substantially better than existing aesthetic predictors (LAION-AES, ArtiMuse, UniPercept).
**Key advantages:**
- Balanced predictions across photography, anime, design, everyday snapshots, and film photography
- No systematic bias toward specific image types (e.g., AI-generated content or black-and-white photos)
- Swiss-tournament based pairwise annotation for high-quality training labels
- Achieves **0.7445 SRCC** and **0.7598 PLCC** on the ERIA-1K benchmark
## 🔍 Motivation
Off-the-shelf aesthetic predictors exhibit systematic biases:
| Model | Bias |
|-------|------|
| LAION-Aesthetic | Disproportionately high scores for AI-generated/anime content |
| ArtiMuse | Overscores black-and-white photography and casual everyday snapshots |
| UniPercept | Strong preference for monochrome images; overscores casual snapshots |
ERNIE-Image-Aes addresses these failure modes through a purpose-built annotation pipeline with explicit category balance.
## 📊 Results on ERIA-1K Benchmark
| Model | SRCC | PLCC |
|-------|------|------|
| LAION-AES | 0.2944 | 0.3138 |
| ArtiMuse | 0.4277 | 0.4704 |
| UniPercept | 0.4533 | 0.4748 |
| **ERNIE-Image-Aes** | **0.7445** | **0.7598** |
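
SRCC (Spearman rank correlation coefficient) measures how well the predicted scores preserve the human ranking, while PLCC (Pearson linear correlation coefficient) measures linear agreement with the human labels. The snippet below is a minimal, dependency-free sketch of both metrics, included only to make the table columns concrete. It is not the evaluation script behind the numbers above, and the sample values are made up.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example values, NOT taken from ERIA-1K.
human_tiers = [2, 5, 5, 7, 9, 10]
model_scores = [1.8, 4.1, 5.6, 6.0, 9.5, 8.7]

print(f"SRCC = {spearman(human_tiers, model_scores):.4f}")
print(f"PLCC = {pearson(human_tiers, model_scores):.4f}")
```

Because SRCC depends only on ordering, any monotonic rescaling of the model scores leaves it unchanged, whereas PLCC drops when the relationship to the human labels is non-linear.
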
**Annotation Protocol:**
- Pairwise Swiss-system tournament for stable and reproducible rankings
- Tier labels from 1 to 10
- Annotators recruited from professional backgrounds (Central Academy of Fine Arts, Sichuan Fine Arts Institute, Communication University of China, etc.)
- All annotators passed aesthetic calibration screening prior to participation
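
For readers unfamiliar with Swiss-system pairing, here is a simplified sketch of the general idea: in each round, items with similar win counts are compared, and rematches are avoided. This is an illustration under stated assumptions, not the annotation pipeline used for ERNIE-Image-Aes. The function name `swiss_tournament`, the `prefer` callback (standing in for a human annotator), and the greedy pairing rule are all hypothetical. How final rankings were converted to the 1 to 10 tier labels is not described here, so that step is not sketched.

```python
import random


def swiss_tournament(items, prefer, rounds, seed=0):
    """Simplified Swiss-system tournament (illustrative assumption, see text).

    items:  list of hashable item ids (e.g. image ids)
    prefer: callable (a, b) -> whichever of a, b is preferred
    rounds: number of rounds to play
    Returns (ranking from most to fewest wins, dict of win counts).
    """
    rng = random.Random(seed)
    wins = {item: 0 for item in items}
    played = set()  # unordered pairs that have already been compared

    for _ in range(rounds):
        # Shuffle, then stable-sort by wins so ties are broken randomly.
        pool = list(items)
        rng.shuffle(pool)
        pool.sort(key=lambda it: wins[it], reverse=True)

        while len(pool) >= 2:
            a = pool.pop(0)
            # Greedy rule: nearest-ranked opponent that `a` has not met yet.
            idx = next(
                (i for i, b in enumerate(pool) if frozenset((a, b)) not in played),
                None,
            )
            if idx is None:
                continue  # no fresh opponent left; `a` sits this round out
            b = pool.pop(idx)
            played.add(frozenset((a, b)))
            wins[prefer(a, b)] += 1
        # With an odd number of items, the leftover item simply gets no match.

    ranking = sorted(items, key=lambda it: wins[it], reverse=True)
    return ranking, wins


# Toy run: eight hypothetical image ids, where a larger id is always preferred.
images = list(range(8))
ranking, wins = swiss_tournament(images, prefer=max, rounds=3)
print("ranking:", ranking)
print("wins:", wins)
```

The appeal of this scheme is that it needs far fewer comparisons than a full round-robin (here 12 instead of 28) while concentrating annotator effort on pairs of similar quality, where judgments are most informative.
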
## ⚙️ Setup
Please follow the setup instructions in the [ArtiMuse repository](https://huggingface.co/Thunderbolt215215/ArtiMuse).
## 🙏 Acknowledgements
Our work builds upon [ArtiMuse](https://github.com/thunderbolt215/ArtiMuse) and [InternVL-3](https://github.com/OpenGVLab/InternVL). We sincerely thank the authors for their excellent contributions to the community.
## ✒️ Citation
If you find this work useful, please consider citing:
```bibtex
```