SA-IQA Model

SA-IQA is a multimodal image quality assessment model released with “Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics.”

The released final checkpoint is sa-iqa-prompt4, a fine-tuned model based on Ovis2.5-9B for assessing interior-image spatial aesthetics.

Hugging Face Release Layout

This Hugging Face repository is released as a full model bundle. Download the whole repository to ./SA-IQA-model when using it with the SA-IQA codebase.

The sa-iqa-prompt4/ directory is the released final fine-tuned checkpoint for inference. The Ovis2.5-9B/ directory is the bundled base model copy used by tools/train_sft.sh for training and reproducibility.

Because this repository contains two model directories, automatic loading from the repository root is not expected to work. Load the fine-tuned checkpoint from SA-IQA-model/sa-iqa-prompt4, or pass that path through the SA-IQA inference script with --model_path.

Model Details

Model Description

Model type: multimodal vision-language model for image quality assessment
Base model: Ovis2.5-9B
Fine-tuned checkpoint: sa-iqa-prompt4
Input: image plus a dimension-specific text prompt
Output: textual quality label and token log-probabilities used to compute a continuous score
Dimensions: distortion, harmony, layout, lighting

Intended Use

SA-IQA is intended for research, evaluation, and application use, including:

spatial aesthetic assessment of interior images
image quality benchmarking on SA-BENCH
reward-model research for image generation and best-of-N selection
comparison of prompt variants for spatial aesthetic assessment

Out-of-Scope Use

The model is not intended for:

universal aesthetic judgment outside the interior-scene domain
safety-critical or legally binding decision making

Usage

Use the SA-IQA inference script from the code repository:

python tools/infer.py --prompt_version 4 --mode all --dimension lighting

When running from the release bundle root, the default model path is:

SA-IQA-model/sa-iqa-prompt4

If you downloaded this Hugging Face repository to another local path, pass the nested sa-iqa-prompt4 checkpoint path through --model_path.

Release Bundle Structure

SA-IQA-model/
├── LICENSE
├── README.md
├── Ovis2.5-9B/                  # Base model used by training scripts
│   ├── LICENSE
│   ├── NOTICE
│   ├── config.json
│   ├── modeling_ovis2_5.py
│   ├── model-00001-of-00004.safetensors
│   ├── model-00002-of-00004.safetensors
│   ├── model-00003-of-00004.safetensors
│   ├── model-00004-of-00004.safetensors
│   └── ...
└── sa-iqa-prompt4/              # Fine-tuned checkpoint used for inference
    ├── config.json
    ├── modeling_ovis2_5.py
    ├── model-00001-of-00004.safetensors
    ├── model-00002-of-00004.safetensors
    ├── model-00003-of-00004.safetensors
    ├── model-00004-of-00004.safetensors
    └── ...

Training Data

The model is fine-tuned and evaluated on SA-BENCH, a 17,768-example benchmark for spatial aesthetics in interior scenes.

Limitations

The model is designed for interior images and may not generalize to other image domains.
Predictions are based on the SA-BENCH annotation protocol and prompt design.
The output should be treated as an assessment signal, not as a definitive human aesthetic judgment.

License

The released SA-IQA model weights are licensed under the Apache License 2.0. See LICENSE for the full license text.

This model is fine-tuned from Ovis2.5-9B, which is also released under the Apache License 2.0. When redistributing or modifying this model, retain attribution and relevant notices from the base model:

Ovis2.5-9B/LICENSE
Ovis2.5-9B/NOTICE

Citation

If you use this model, please cite:

@inproceedings{gao2025beyond,
  title={Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics},
  author={Gao, Yuan and Song, Jin and Fei, Yiyun and Li, Gongzhe and Yang, Ruigao},
  booktitle={CVPR 2025 Workshop},
  year={2025}
}

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AliHome3D/SA-IQA-model

Base model

ATH-MaaS/Ovis2.5-9B

Finetuned

(2)

this model