--- license: apache-2.0 language: - en library_name: transformers pipeline_tag: image-text-to-text base_model: AIDC-AI/Ovis2.5-9B datasets: - SA-BENCH tags: - multimodal - vision-language - image-quality-assessment - aesthetics - spatial-aesthetics - interior-design --- # SA-IQA Model SA-IQA is a multimodal image quality assessment model released with **“Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics.”** The released final checkpoint is **`sa-iqa-prompt4`**, a fine-tuned model based on **Ovis2.5-9B** for assessing interior-image spatial aesthetics. ## Hugging Face Release Layout This Hugging Face repository is released as a full model bundle. Download the whole repository to `./SA-IQA-model` when using it with the SA-IQA codebase. The `sa-iqa-prompt4/` directory is the released final fine-tuned checkpoint for inference. The `Ovis2.5-9B/` directory is the bundled base model copy used by `tools/train_sft.sh` for training and reproducibility. Because this repository contains two model directories, automatic loading from the repository root is not expected to work. Load the fine-tuned checkpoint from `SA-IQA-model/sa-iqa-prompt4`, or pass that path through the SA-IQA inference script with `--model_path`. ## Model Details ### Model Description - **Model type:** multimodal vision-language model for image quality assessment - **Base model:** Ovis2.5-9B - **Fine-tuned checkpoint:** sa-iqa-prompt4 - **Input:** image plus a dimension-specific text prompt - **Output:** textual quality label and token log-probabilities used to compute a continuous score - **Dimensions:** distortion, harmony, layout, lighting ### Intended Use SA-IQA is intended for research, evaluation, and application use, including: - spatial aesthetic assessment of interior images - image quality benchmarking on SA-BENCH - reward-model research for image generation and best-of-N selection - comparison of prompt variants for spatial aesthetic assessment ### Out-of-Scope Use The model is not intended for: - universal aesthetic judgment outside the interior-scene domain - safety-critical or legally binding decision making ## Usage Use the SA-IQA inference script from the code repository: ```bash python tools/infer.py --prompt_version 4 --mode all --dimension lighting ``` When running from the release bundle root, the default model path is: ```text SA-IQA-model/sa-iqa-prompt4 ``` If you downloaded this Hugging Face repository to another local path, pass the nested `sa-iqa-prompt4` checkpoint path through `--model_path`. ## Release Bundle Structure ```text SA-IQA-model/ ├── LICENSE ├── README.md ├── Ovis2.5-9B/ # Base model used by training scripts │ ├── LICENSE │ ├── NOTICE │ ├── config.json │ ├── modeling_ovis2_5.py │ ├── model-00001-of-00004.safetensors │ ├── model-00002-of-00004.safetensors │ ├── model-00003-of-00004.safetensors │ ├── model-00004-of-00004.safetensors │ └── ... └── sa-iqa-prompt4/ # Fine-tuned checkpoint used for inference ├── config.json ├── modeling_ovis2_5.py ├── model-00001-of-00004.safetensors ├── model-00002-of-00004.safetensors ├── model-00003-of-00004.safetensors ├── model-00004-of-00004.safetensors └── ... ``` ## Training Data The model is fine-tuned and evaluated on SA-BENCH, a 17,768-example benchmark for spatial aesthetics in interior scenes. ## Limitations - The model is designed for interior images and may not generalize to other image domains. - Predictions are based on the SA-BENCH annotation protocol and prompt design. - The output should be treated as an assessment signal, not as a definitive human aesthetic judgment. ## License The released SA-IQA model weights are licensed under the Apache License 2.0. See `LICENSE` for the full license text. This model is fine-tuned from Ovis2.5-9B, which is also released under the Apache License 2.0. When redistributing or modifying this model, retain attribution and relevant notices from the base model: - `Ovis2.5-9B/LICENSE` - `Ovis2.5-9B/NOTICE` ## Citation If you use this model, please cite: ```bibtex @inproceedings{gao2025beyond, title={Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics}, author={Gao, Yuan and Song, Jin and Fei, Yiyun and Li, Gongzhe and Yang, Ruigao}, booktitle={CVPR 2025 Workshop}, year={2025} } ```