Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks

Zhichao Yang^1†, Jianjie Wang^1†, Zhixianhe Zhang¹, Pangu Xie¹, Xiangfei Sheng¹, Pengfei Chen¹, Leida Li^1,2*

¹School of Artificial Intelligence, ²State Key Laboratory of EMIM, Xidian University

^†Equal contribution ^*Corresponding author

News

[2026-04-10] ✨✨ The Inference Code and Pre-trained Weights, are now publicly available. A demo video demonstrating FGAesQ's application in LivePhoto Cover Recommendation is also provided.
[2026-04-09] 🎉🎉 Congratulations! Our paper has been accepted for an Oral Presentation at CVPR 2026.
[2026-02-21] 🎉🎉 Our paper, "Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks", has been accepted to CVPR 2026!

## Applicatons (More scenarios will be uncovered)

## Quick Start This guide will help you get started with FGAesQ inference in minutes. ### 1. Installation Clone the repository and install the required dependencies: ```bash git clone https://github.com/yzc-ippl/FG-IAA.git cd FG-IAA pip install -r requirements.txt ``` > **Note:** The CLIP dependency is installed directly from the official OpenAI repository and will be fetched automatically via `pip install -r requirements.txt`. ### 2. Download Pre-trained Weights Download the pre-trained model weights from: [**(Hugging Face)**](https://huggingface.co/yzc002/FGAesQ) | [**(Baidu Netdisk)**](#) Place the downloaded weight file at a path of your choice and set `MODEL_PATH` accordingly in the inference scripts. The expected project structure is as follows: ``` FG-IAA/ FGAesQ_Inference/ ├──utils/ ├── FGAesQ.py # Model definition ├── DiffToken.py # Differential token preprocessing ├── data_utils.py └── clip_vit_base_16_224.pt ├── inference_series.py # Series-mode inference ├── inference_single.py # Single-image inference ├── requirements.txt README.md ``` ### 3. Run Inference FGAesQ supports two inference modes: **Series Mode** for photo series ranking, and **Single Mode** for individual image scoring. --- #### 🖼️ Mode 1 — Single Image / Folder Scoring Use `inference_single.py` to score a single image or all images within a folder. **Configuration** (edit the `main()` function in `inference_single.py`): ```python MODEL_PATH = "path/to/your/model.pt" # Path to the pre-trained weights INPUT_PATH = "path/to/image_or_folder" # Single image file or folder of images OUTPUT_TXT = "path/to/output.txt" # Output txt path (folder mode only; set None to auto-generate) DEVICE = "cuda" BATCH_SIZE = 128 ``` **Run:** ```bash python inference_single.py ``` **Output format** (`single_result.txt`): ``` Total: 3 ============================================================ 1. photo_A.jpg 0.872314 2. photo_B.jpg 0.751203 3. photo_C.jpg 0.634891 ``` - **Single image**: the predicted aesthetic score is printed directly to the terminal. - **Folder**: a ranked list of all images with scores is saved to `OUTPUT_TXT`. --- #### 📂 Mode 2 — Photo Series Ranking Use `inference_series.py` to rank images within multiple photo series simultaneously. The input folder should contain one sub-folder per series, with image files named in the format `{series_id}-{index}.jpg` (e.g., `000009-01.jpg`, `000009-02.jpg`). ``` input_folder/ 000009/ ├── 000009-01.jpg ├── 000009-02.jpg └── 000009-03.jpg 000010/ ├── 000010-01.jpg └── 000010-02.jpg ... ``` **Configuration** (edit the `main()` function in `inference_series.py`): ```python MODEL_PATH = "path/to/your/model.pt" # Path to the pre-trained weights INPUT_FOLDER = "path/to/series_folder" # Root folder containing all series sub-folders OUTPUT_FOLDER = "path/to/series_result" # Output directory for per-series result txt files DEVICE = "cuda:0" BATCH_SIZE = 64 MAX_SIZE = 2048 # Max image resolution (long edge). Use None for no limit. # Recommended: 2048 if many images exceed this resolution. ``` **Run:** ```bash python inference_series.py ``` **Output format** (one `{series_id}_result.txt` per series in `OUTPUT_FOLDER`): ``` Series: 9 Count: 3 ============================================================ Ranking: 000009-02.jpg 000009-01.jpg 000009-03.jpg Scores: 0.8812 0.7654 0.6231 Order: 000009-02.jpg > 000009-01.jpg > 000009-03.jpg ``` Each output file contains the predicted ranking and aesthetic scores for all images in that series, sorted from best to worst. --- ## Citation If you find this work useful, please cite our paper! ```bibtex @article{yang2026fine, title={Fine-grained Image Aesthetic Assessment: Learning Discriminative Scores from Relative Ranks}, author={Yang, Zhichao and Wang, Jianjie and Zhang, Zhixianhe and Xie, Pangu and Sheng, Xiangfei and Chen, Pengfei and Li, Leida}, journal={arXiv preprint arXiv:2603.03907}, year={2026} } ```