SA-IQA-model / README.md
Jin Song
Add files using upload-large-folder tool
d9a5819 verified
---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: AIDC-AI/Ovis2.5-9B
datasets:
- SA-BENCH
tags:
- multimodal
- vision-language
- image-quality-assessment
- aesthetics
- spatial-aesthetics
- interior-design
---
# SA-IQA Model
SA-IQA is a multimodal image quality assessment model released with **β€œBeyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics.”**
The released final checkpoint is **`sa-iqa-prompt4`**, a fine-tuned model based on **Ovis2.5-9B** for assessing interior-image spatial aesthetics.
## Hugging Face Release Layout
This Hugging Face repository is released as a full model bundle. Download the whole repository to `./SA-IQA-model` when using it with the SA-IQA codebase.
The `sa-iqa-prompt4/` directory is the released final fine-tuned checkpoint for inference. The `Ovis2.5-9B/` directory is the bundled base model copy used by `tools/train_sft.sh` for training and reproducibility.
Because this repository contains two model directories, automatic loading from the repository root is not expected to work. Load the fine-tuned checkpoint from `SA-IQA-model/sa-iqa-prompt4`, or pass that path through the SA-IQA inference script with `--model_path`.
## Model Details
### Model Description
- **Model type:** multimodal vision-language model for image quality assessment
- **Base model:** Ovis2.5-9B
- **Fine-tuned checkpoint:** sa-iqa-prompt4
- **Input:** image plus a dimension-specific text prompt
- **Output:** textual quality label and token log-probabilities used to compute a continuous score
- **Dimensions:** distortion, harmony, layout, lighting
### Intended Use
SA-IQA is intended for research, evaluation, and application use, including:
- spatial aesthetic assessment of interior images
- image quality benchmarking on SA-BENCH
- reward-model research for image generation and best-of-N selection
- comparison of prompt variants for spatial aesthetic assessment
### Out-of-Scope Use
The model is not intended for:
- universal aesthetic judgment outside the interior-scene domain
- safety-critical or legally binding decision making
## Usage
Use the SA-IQA inference script from the code repository:
```bash
python tools/infer.py --prompt_version 4 --mode all --dimension lighting
```
When running from the release bundle root, the default model path is:
```text
SA-IQA-model/sa-iqa-prompt4
```
If you downloaded this Hugging Face repository to another local path, pass the nested `sa-iqa-prompt4` checkpoint path through `--model_path`.
## Release Bundle Structure
```text
SA-IQA-model/
β”œβ”€β”€ LICENSE
β”œβ”€β”€ README.md
β”œβ”€β”€ Ovis2.5-9B/ # Base model used by training scripts
β”‚ β”œβ”€β”€ LICENSE
β”‚ β”œβ”€β”€ NOTICE
β”‚ β”œβ”€β”€ config.json
β”‚ β”œβ”€β”€ modeling_ovis2_5.py
β”‚ β”œβ”€β”€ model-00001-of-00004.safetensors
β”‚ β”œβ”€β”€ model-00002-of-00004.safetensors
β”‚ β”œβ”€β”€ model-00003-of-00004.safetensors
β”‚ β”œβ”€β”€ model-00004-of-00004.safetensors
β”‚ └── ...
└── sa-iqa-prompt4/ # Fine-tuned checkpoint used for inference
β”œβ”€β”€ config.json
β”œβ”€β”€ modeling_ovis2_5.py
β”œβ”€β”€ model-00001-of-00004.safetensors
β”œβ”€β”€ model-00002-of-00004.safetensors
β”œβ”€β”€ model-00003-of-00004.safetensors
β”œβ”€β”€ model-00004-of-00004.safetensors
└── ...
```
## Training Data
The model is fine-tuned and evaluated on SA-BENCH, a 17,768-example benchmark for spatial aesthetics in interior scenes.
## Limitations
- The model is designed for interior images and may not generalize to other image domains.
- Predictions are based on the SA-BENCH annotation protocol and prompt design.
- The output should be treated as an assessment signal, not as a definitive human aesthetic judgment.
## License
The released SA-IQA model weights are licensed under the Apache License 2.0. See `LICENSE` for the full license text.
This model is fine-tuned from Ovis2.5-9B, which is also released under the Apache License 2.0. When redistributing or modifying this model, retain attribution and relevant notices from the base model:
- `Ovis2.5-9B/LICENSE`
- `Ovis2.5-9B/NOTICE`
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{gao2025beyond,
title={Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics},
author={Gao, Yuan and Song, Jin and Fei, Yiyun and Li, Gongzhe and Yang, Ruigao},
booktitle={CVPR 2025 Workshop},
year={2025}
}
```