	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: image-text-to-text
	base_model: AIDC-AI/Ovis2.5-9B
	datasets:
	- SA-BENCH
	tags:
	- multimodal
	- vision-language
	- image-quality-assessment
	- aesthetics
	- spatial-aesthetics
	- interior-design
	---

	# SA-IQA Model

	SA-IQA is a multimodal image quality assessment model released with “Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics.”

	The released final checkpoint is `sa-iqa-prompt4`, a fine-tuned model based on Ovis2.5-9B for assessing interior-image spatial aesthetics.

	## Hugging Face Release Layout

	This Hugging Face repository is released as a full model bundle. Download the whole repository to `./SA-IQA-model` when using it with the SA-IQA codebase.

	The `sa-iqa-prompt4/` directory is the released final fine-tuned checkpoint for inference. The `Ovis2.5-9B/` directory is the bundled base model copy used by `tools/train_sft.sh` for training and reproducibility.

	Because this repository contains two model directories, automatic loading from the repository root is not expected to work. Load the fine-tuned checkpoint from `SA-IQA-model/sa-iqa-prompt4`, or pass that path through the SA-IQA inference script with `--model_path`.
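As a minimal sketch of the path handling described above, the helper below resolves the nested checkpoint directory and fails early if it is pointed at the wrong place. `resolve_checkpoint` is a hypothetical name, not part of the SA-IQA codebase; it only inspects the directory layout and does not load any weights.

```python
from pathlib import Path

# Directory names taken from the release layout described in this card.
BUNDLE_DIR = "SA-IQA-model"
CHECKPOINT_DIR = "sa-iqa-prompt4"


def resolve_checkpoint(bundle_root: str = BUNDLE_DIR) -> Path:
    """Return the fine-tuned checkpoint directory inside a downloaded bundle.

    Hypothetical helper: checks only that the nested directory holds a
    config.json, which the repository root itself does not.
    """
    checkpoint = Path(bundle_root) / CHECKPOINT_DIR
    if not (checkpoint / "config.json").is_file():
        raise FileNotFoundError(
            f"No config.json under {checkpoint}; point to the nested "
            f"'{CHECKPOINT_DIR}' directory, not the repository root."
        )
    return checkpoint
```

The returned path is the value that would be passed through `--model_path`.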

	## Model Details

	### Model Description

	- Model type: multimodal vision-language model for image quality assessment
	- Base model: Ovis2.5-9B
	- Fine-tuned checkpoint: sa-iqa-prompt4
	- Input: image plus a dimension-specific text prompt
	- Output: textual quality label and token log-probabilities used to compute a continuous score
	- Dimensions: distortion, harmony, layout, lighting
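This card does not specify how the token log-probabilities are turned into a continuous score. One common approach, sketched here purely as an assumption, is to softmax the log-probabilities of the candidate label tokens and take the probability-weighted mean of numeric label values. The label names and weights in `LABEL_WEIGHTS` are hypothetical; the mapping actually used by SA-IQA is defined in its codebase and may differ.

```python
import math
from typing import Mapping

# Hypothetical label set and numeric weights, for illustration only.
LABEL_WEIGHTS = {"bad": 1.0, "poor": 2.0, "fair": 3.0, "good": 4.0, "excellent": 5.0}


def expected_score(label_logprobs: Mapping[str, float]) -> float:
    """Softmax the label log-probabilities and return the weighted mean."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(label_logprobs.values())
    unnormalized = {label: math.exp(lp - peak) for label, lp in label_logprobs.items()}
    total = sum(unnormalized.values())
    return sum(LABEL_WEIGHTS[label] * p / total for label, p in unnormalized.items())
```

With these illustrative weights, equal log-probabilities across all five labels give the midpoint score of 3.0, and a distribution concentrated on one label gives that label's weight.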

	### Intended Use

	SA-IQA is intended for research, evaluation, and practical applications, including:

	- spatial aesthetic assessment of interior images
	- image quality benchmarking on SA-BENCH
	- reward-model research for image generation and best-of-N selection
	- comparison of prompt variants for spatial aesthetic assessment
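For the best-of-N use case, a minimal sketch is shown below. It assumes each candidate image already has one continuous score per dimension and that the dimensions are aggregated with an equal-weight mean; both the aggregation rule and the name `best_of_n` are illustrative assumptions rather than the method used in the paper.

```python
from statistics import mean
from typing import Mapping


def best_of_n(candidates: Mapping[str, Mapping[str, float]]) -> str:
    """Return the candidate ID with the highest mean score across dimensions.

    `candidates` maps a candidate ID to its per-dimension scores,
    e.g. {"img_0": {"lighting": 3.2, "layout": 4.1}}.
    """
    return max(candidates, key=lambda name: mean(list(candidates[name].values())))
```

A different aggregation, such as a weighted mean or the minimum across dimensions, can be swapped in by changing the `key` function.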

	### Out-of-Scope Use

	The model is not intended for:

	- universal aesthetic judgment outside the interior-scene domain
	- safety-critical or legally binding decision making

	## Usage

	Use the SA-IQA inference script from the code repository:

	```bash
	python tools/infer.py --prompt_version 4 --mode all --dimension lighting
	```

	When running from the release bundle root, the default model path is:

	```text
	SA-IQA-model/sa-iqa-prompt4
	```

	If you downloaded this Hugging Face repository to another local path, pass the nested `sa-iqa-prompt4` checkpoint path through `--model_path`.
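The sketch below builds the inference command with an explicit `--model_path` for each dimension listed in this card. It uses only the flags shown above; that `--dimension` accepts `distortion`, `harmony`, and `layout` in addition to `lighting` is an assumption, and `/path/to/SA-IQA-model` is a placeholder for your local download location.

```python
import shlex
from typing import List

# Dimension names as listed under Model Description; only "lighting" is
# confirmed as a --dimension value by the example command in this card.
DIMENSIONS = ["distortion", "harmony", "layout", "lighting"]


def build_infer_command(model_path: str, dimension: str) -> List[str]:
    """Return the argument list for one tools/infer.py run."""
    return [
        "python", "tools/infer.py",
        "--prompt_version", "4",
        "--mode", "all",
        "--dimension", dimension,
        "--model_path", model_path,
    ]


if __name__ == "__main__":
    # Placeholder path: replace with the location of your downloaded bundle.
    checkpoint = "/path/to/SA-IQA-model/sa-iqa-prompt4"
    for dim in DIMENSIONS:
        print(shlex.join(build_infer_command(checkpoint, dim)))
```

Each printed line can be run from the root of the code repository, or the argument list can be passed directly to `subprocess.run`.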

	## Release Bundle Structure

	```text
	SA-IQA-model/
	├── LICENSE
	├── README.md
	├── Ovis2.5-9B/ # Base model used by training scripts
	│ ├── LICENSE
	│ ├── NOTICE
	│ ├── config.json
	│ ├── modeling_ovis2_5.py
	│ ├── model-00001-of-00004.safetensors
	│ ├── model-00002-of-00004.safetensors
	│ ├── model-00003-of-00004.safetensors
	│ ├── model-00004-of-00004.safetensors
	│ └── ...
	└── sa-iqa-prompt4/ # Fine-tuned checkpoint used for inference
	    ├── config.json
	    ├── modeling_ovis2_5.py
	    ├── model-00001-of-00004.safetensors
	    ├── model-00002-of-00004.safetensors
	    ├── model-00003-of-00004.safetensors
	    ├── model-00004-of-00004.safetensors
	    └── ...
	```

	## Training Data

	The model is fine-tuned and evaluated on SA-BENCH, a 17,768-example benchmark for spatial aesthetics in interior scenes.

	## Limitations

	- The model is designed for interior images and may not generalize to other image domains.
	- Predictions are based on the SA-BENCH annotation protocol and prompt design.
	- The output should be treated as an assessment signal, not as a definitive human aesthetic judgment.

	## License

	The released SA-IQA model weights are licensed under the Apache License 2.0. See `LICENSE` for the full license text.

	This model is fine-tuned from Ovis2.5-9B, which is also released under the Apache License 2.0. When redistributing or modifying this model, retain attribution and relevant notices from the base model:

	- `Ovis2.5-9B/LICENSE`
	- `Ovis2.5-9B/NOTICE`

	## Citation

	If you use this model, please cite:

	```bibtex
	@inproceedings{gao2025beyond,
	  title={Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics},
	  author={Gao, Yuan and Song, Jin and Fei, Yiyun and Li, Gongzhe and Yang, Ruigao},
	  booktitle={CVPR 2025 Workshop},
	  year={2025}
	}
	```