---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: AIDC-AI/Ovis2.5-9B
datasets:
- SA-BENCH
tags:
- multimodal
- vision-language
- image-quality-assessment
- aesthetics
- spatial-aesthetics
- interior-design
---
# SA-IQA Model

SA-IQA is a multimodal image quality assessment model released with **“Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics.”**

The released final checkpoint is **`sa-iqa-prompt4`**, a fine-tuned model based on **Ovis2.5-9B** for assessing interior-image spatial aesthetics.

## Hugging Face Release Layout

This Hugging Face repository is released as a full model bundle. Download the whole repository to `./SA-IQA-model` when using it with the SA-IQA codebase.

The `sa-iqa-prompt4/` directory is the released final fine-tuned checkpoint for inference. The `Ovis2.5-9B/` directory is the bundled base model copy used by `tools/train_sft.sh` for training and reproducibility.

Because this repository contains two model directories, automatic loading from the repository root is not expected to work. Load the fine-tuned checkpoint from `SA-IQA-model/sa-iqa-prompt4`, or pass that path through the SA-IQA inference script with `--model_path`.
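
The path handling described above can be sketched as follows. This is a minimal illustration that assumes the repository has already been downloaded to a local directory; the helper name `resolve_checkpoint` is hypothetical and is not part of the SA-IQA codebase.

```python
from pathlib import Path

# Name of the fine-tuned checkpoint directory inside the bundle (from this README).
CHECKPOINT_DIR = "sa-iqa-prompt4"


def resolve_checkpoint(bundle_root="./SA-IQA-model"):
    """Return the nested fine-tuned checkpoint path inside the release bundle.

    Hypothetical helper: it points at the checkpoint subdirectory rather than
    the repository root, because the root holds two model directories.
    """
    checkpoint = Path(bundle_root) / CHECKPOINT_DIR
    if not checkpoint.is_dir():
        raise FileNotFoundError(
            f"Expected checkpoint directory at {checkpoint}; "
            "download the whole repository first."
        )
    return checkpoint
```

The returned path is the value you would pass to the inference script through `--model_path`.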
## Model Details

### Model Description

- **Model type:** multimodal vision-language model for image quality assessment
- **Base model:** Ovis2.5-9B
- **Fine-tuned checkpoint:** sa-iqa-prompt4
- **Input:** image plus a dimension-specific text prompt
- **Output:** textual quality label and token log-probabilities used to compute a continuous score
- **Dimensions:** distortion, harmony, layout, lighting
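
This card does not spell out how the token log-probabilities become a continuous score. One common approach is to renormalise the log-probabilities over the candidate quality labels and take the probability-weighted mean of numeric label weights. The sketch below illustrates that idea only; the label names and weights in `LABEL_WEIGHTS` are assumptions, and the actual SA-IQA labels and scoring rule are defined by the SA-IQA codebase.

```python
import math

# ASSUMPTION: this label vocabulary and these weights are illustrative only;
# the real SA-IQA labels and scoring rule live in the SA-IQA codebase.
LABEL_WEIGHTS = {"bad": 1.0, "poor": 2.0, "fair": 3.0, "good": 4.0, "excellent": 5.0}


def continuous_score(label_logprobs):
    """Probability-weighted mean of label weights, given per-label log-probs."""
    labels = [name for name in LABEL_WEIGHTS if name in label_logprobs]
    if not labels:
        raise ValueError("label_logprobs contains no known labels")
    # Renormalise over the candidate labels with a numerically stable softmax.
    peak = max(label_logprobs[name] for name in labels)
    exps = {name: math.exp(label_logprobs[name] - peak) for name in labels}
    total = sum(exps.values())
    return sum(LABEL_WEIGHTS[name] * exps[name] / total for name in labels)
```

With this rule, probability mass split evenly between two adjacent labels yields a score halfway between their weights, which is what makes the output continuous rather than a hard class.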
### Intended Use

SA-IQA is intended for research, evaluation, and application use, including:

- spatial aesthetic assessment of interior images
- image quality benchmarking on SA-BENCH
- reward-model research for image generation and best-of-N selection
- comparison of prompt variants for spatial aesthetic assessment
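
As an illustration of the best-of-N use case, the sketch below picks the candidate image with the highest mean score across the four dimensions. It is a rough sketch under stated assumptions: `score_fn` is a hypothetical stand-in for a call into SA-IQA inference, and averaging the dimensions equally is an assumption rather than the method from the paper.

```python
from statistics import mean

DIMENSIONS = ("distortion", "harmony", "layout", "lighting")


def best_of_n(candidates, score_fn):
    """Return the candidate with the highest mean score across dimensions.

    `score_fn(image_path, dimension)` is a hypothetical callable that returns
    a continuous score; it stands in for SA-IQA inference on one dimension.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")

    def overall(image_path):
        # ASSUMPTION: equal weighting of the four dimensions.
        return mean([score_fn(image_path, d) for d in DIMENSIONS])

    return max(candidates, key=overall)
```

Passing the scorer in as a callable keeps the selection logic independent of how the scores are produced, so it can be exercised with stub scores before wiring in the model.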
### Out-of-Scope Use

The model is not intended for:

- universal aesthetic judgment outside the interior-scene domain
- safety-critical or legally binding decision making
## Usage

Use the SA-IQA inference script from the code repository:

```bash
python tools/infer.py --prompt_version 4 --mode all --dimension lighting
```

When running from the release bundle root, the default model path is:

```text
SA-IQA-model/sa-iqa-prompt4
```

If you downloaded this Hugging Face repository to another local path, pass the nested `sa-iqa-prompt4` checkpoint path through `--model_path`.
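
To run the script once per dimension, a small wrapper like the one below may help. It is a sketch that only reuses the flags shown above (`--prompt_version`, `--mode`, `--dimension`, `--model_path`). It assumes that `--dimension` accepts each of the four dimension names listed in this card, which the card only demonstrates for `lighting`. The function names are hypothetical.

```python
import subprocess
import sys

# ASSUMPTION: each of these names is a valid value for --dimension.
DIMENSIONS = ("distortion", "harmony", "layout", "lighting")


def build_command(dimension, model_path=None):
    """Build the tools/infer.py command line for one dimension."""
    command = [
        sys.executable, "tools/infer.py",
        "--prompt_version", "4",
        "--mode", "all",
        "--dimension", dimension,
    ]
    if model_path is not None:
        # Only needed when the checkpoint is not at the default location.
        command += ["--model_path", str(model_path)]
    return command


def run_all_dimensions(model_path=None):
    """Run inference for every dimension, stopping at the first failure."""
    for dimension in DIMENSIONS:
        subprocess.run(build_command(dimension, model_path), check=True)
```

Call `run_all_dimensions("/data/SA-IQA-model/sa-iqa-prompt4")` from the code repository root, replacing the example path with wherever the checkpoint actually lives.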
## Release Bundle Structure

```text
SA-IQA-model/
├── LICENSE
├── README.md
├── Ovis2.5-9B/                    # Base model used by training scripts
│   ├── LICENSE
│   ├── NOTICE
│   ├── config.json
│   ├── modeling_ovis2_5.py
│   ├── model-00001-of-00004.safetensors
│   ├── model-00002-of-00004.safetensors
│   ├── model-00003-of-00004.safetensors
│   ├── model-00004-of-00004.safetensors
│   └── ...
└── sa-iqa-prompt4/                # Fine-tuned checkpoint used for inference
    ├── config.json
    ├── modeling_ovis2_5.py
    ├── model-00001-of-00004.safetensors
    ├── model-00002-of-00004.safetensors
    ├── model-00003-of-00004.safetensors
    ├── model-00004-of-00004.safetensors
    └── ...
```
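
A quick way to confirm that a download is complete is to check a model directory against the files named in the tree above. The sketch below checks only those listed files; the `...` entries stand for additional files that are not enumerated here, so a clean result does not guarantee the directory is complete. The helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Files explicitly listed for each model directory in the tree above.
EXPECTED_FILES = ["config.json", "modeling_ovis2_5.py"] + [
    f"model-{index:05d}-of-00004.safetensors" for index in range(1, 5)
]


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("SA-IQA-model/sa-iqa-prompt4")` returns an empty list when all six listed files are present.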
## Training Data

The model is fine-tuned and evaluated on SA-BENCH, a 17,768-example benchmark for spatial aesthetics in interior scenes.

## Limitations

- The model is designed for interior images and may not generalize to other image domains.
- Predictions are based on the SA-BENCH annotation protocol and prompt design.
- The output should be treated as an assessment signal, not as a definitive human aesthetic judgment.

## License

The released SA-IQA model weights are licensed under the Apache License 2.0. See `LICENSE` for the full license text.

This model is fine-tuned from Ovis2.5-9B, which is also released under the Apache License 2.0. When redistributing or modifying this model, retain attribution and relevant notices from the base model:

- `Ovis2.5-9B/LICENSE`
- `Ovis2.5-9B/NOTICE`

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{gao2025beyond,
  title={Beyond Pixels: Benchmarking and Reward-Based Assessing Framework for Visual Spatial Aesthetics},
  author={Gao, Yuan and Song, Jin and Fei, Yiyun and Li, Gongzhe and Yang, Ruigao},
  booktitle={CVPR 2025 Workshop},
  year={2025}
}
```