Libraries

How to use q-future/Q-ReAlign-Mini-0.8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="q-future/Q-ReAlign-Mini-0.8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("q-future/Q-ReAlign-Mini-0.8B")
model = AutoModelForMultimodalLM.from_pretrained("q-future/Q-ReAlign-Mini-0.8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use q-future/Q-ReAlign-Mini-0.8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "q-future/Q-ReAlign-Mini-0.8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "q-future/Q-ReAlign-Mini-0.8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

SGLang

How to use q-future/Q-ReAlign-Mini-0.8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "q-future/Q-ReAlign-Mini-0.8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "q-future/Q-ReAlign-Mini-0.8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "q-future/Q-ReAlign-Mini-0.8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "q-future/Q-ReAlign-Mini-0.8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use q-future/Q-ReAlign-Mini-0.8B with Docker Model Runner:
```
docker model run hf.co/q-future/Q-ReAlign-Mini-0.8B
```

Q-ReAlign — Mini (0.8B)

Lightweight, human-aligned multimodal quality judge built on a modern Qwen3.5 vision-language backbone.

Q-Align-level quality at 0.8B — the fast, tiny variant: 26.7 img/s on an RTX 4090, and still beats the original Q-Align on average.


What this is

Q-ReAlign scores the perceptual quality / aesthetic appeal of an image or video the way Q-Align does: the model is asked to rate quality, and the probability mass it places on the discrete words excellent / good / fair / poor / bad is collapsed — via a fixed weighting [1.0, 0.75, 0.5, 0.25, 0.0] — into a single scalar in [0, 1].
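
Written as a formula, the collapse described above is a softmax restricted to the five level words, followed by a weighted sum. The symbols z_i, p_i, w_i, and s are notation introduced here for illustration and do not come from the Q-ReAlign code; z_i denotes the next-token logit of the i-th level word, in the order excellent, good, fair, poor, bad.

```latex
p_i = \frac{\exp(z_i)}{\sum_{j=1}^{5} \exp(z_j)}, \qquad
s = \sum_{i=1}^{5} w_i \, p_i, \qquad
(w_1, \dots, w_5) = (1.0,\ 0.75,\ 0.5,\ 0.25,\ 0.0)
```

As a worked example with made-up probabilities (0.1, 0.6, 0.2, 0.1, 0.0), the score is 0.1·1.0 + 0.6·0.75 + 0.2·0.5 + 0.1·0.25 + 0.0·0.0 = 0.675.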

Mini (0.8B) is the smallest of three sizes (Mini 0.8B · Lite 4B · Pro 9B) and the fastest. Despite its size, it edges out the original Q-Align on the seven-benchmark average (SRCC 0.879 vs. 0.869): the largest gain is on AGI, while it trails slightly on KonIQ, SPAQ, KADID, and AVA. That makes it a good fit for high-volume scoring, edge and consumer GPUs, and reward-model loops.

Backbone: Qwen3.5-VL (model_type: qwen3_5), hybrid linear/full attention text tower + SigLIP-style vision encoder
Tasks: IQA (image quality) · IAA (image aesthetics) · VQA (video quality) — the unified ONE-ALIGN setting
Training: full-parameter SFT in bf16 via ms-swift, vision tower + projector trainable
Precision: bfloat16 · dtype auto

Results

Per-dataset SRCC / PLCC on seven QA benchmarks. Mini (0.8B) reaches avg SRCC 0.879 vs. Q-Align's 0.869.

| Model | KonIQ | SPAQ | KADID | AGI | LIVE | AVA | LSVQ | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Q-Align | 0.942 / 0.944 | 0.932 / 0.933 | 0.912 / 0.920 | 0.738 / 0.781 | 0.897 / 0.870 | 0.798 / 0.796 | 0.867 / 0.866 | 0.869 / 0.873 |
| Mini (0.8B) | 0.935 / 0.938 | 0.931 / 0.933 | 0.903 / 0.907 | 0.811 / 0.848 | 0.907 / 0.873 | 0.797 / 0.794 | 0.869 / 0.869 | 0.879 / 0.880 |

Each cell is SRCC / PLCC, computed on the full evaluation sets (KonIQ, SPAQ, KADID, AGI, LIVE, AVA, LSVQ).
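
For readers new to the two metrics: SRCC is the Spearman rank-order correlation between predicted scores and human mean opinion scores (MOS), and PLCC is the Pearson linear correlation between the same two lists. The sketch below computes both in plain Python. It is an illustration only, not the evaluation script behind the table, and the `predicted` and `mos` values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """PLCC: linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """SRCC: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

# Hypothetical model scores in [0, 1] and MOS labels on a 1-5 scale.
predicted = [0.91, 0.40, 0.65, 0.12, 0.78]
mos = [4.6, 2.9, 3.1, 1.2, 4.0]
print(f"SRCC={spearman(predicted, mos):.3f}  PLCC={pearson(predicted, mos):.3f}")
```

Because SRCC only looks at ordering, it is unaffected by any monotonic rescaling of the scores, whereas PLCC rewards a linear relationship with the MOS values.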

Speed

On the SPAQ dataset, Mini tops out at 26.7 img/s @ batch size 4 on a consumer RTX 4090 and 61.1 img/s @ batch size 14 on an H200 141GB.
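
To turn these throughput figures into a rough planning number, the arithmetic can be sketched as below. The one-million-image workload is a hypothetical example, and the estimate assumes the peak SPAQ throughput is sustained, which may not hold for other image resolutions or batch sizes.

```python
def scoring_hours(num_images: int, images_per_second: float) -> float:
    """Wall-clock hours needed to score num_images at a sustained throughput."""
    return num_images / images_per_second / 3600

# Throughput figures from the text; the workload size is hypothetical.
for gpu, ips in [("RTX 4090", 26.7), ("H200 141GB", 61.1)]:
    print(f"{gpu}: {scoring_hours(1_000_000, ips):.1f} h per million images")
```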

Quick start

import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor
# transformers >= 5.2.0 for Qwen3.5 support

CKPT, IMAGE = "q-future/Q-ReAlign-Mini-0.8B", "photo.jpg"
LEVELS  = ["excellent", "good", "fair", "poor", "bad"]
WEIGHTS = [1.0, 0.75, 0.5, 0.25, 0.0]

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(CKPT)
model = AutoModelForImageTextToText.from_pretrained(CKPT, dtype="auto").to(device).eval()

# Build the chat prompt, then append the answer stem so the next token is a level word.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "How would you rate the quality of this image?"},
]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True) + "The quality of the image is"
inputs = processor(text=[text], images=[Image.open(IMAGE).convert("RGB")], return_tensors="pt").to(device)

# First token id of each level word, encoded with its leading space.
ids = [processor.tokenizer(" " + w, add_special_tokens=False).input_ids[0] for w in LEVELS]
# Inference only: no gradients are needed for the single forward pass.
with torch.no_grad():
    # Softmax over the five level logits at the last position.
    probs = model(**inputs).logits[0, -1, ids].softmax(-1)
# Expected value of the level weights under that distribution.
score = (probs * torch.tensor(WEIGHTS, device=device)).sum().item()
print(f"quality score: {score:.4f}")   # 0 (worst) .. 1 (best)

The score is the expected value of the level weights under the model's next-token distribution over the five level words — no sampling, one forward pass.

Aesthetics or video

Swap the prompt for the task:

Aesthetics (IAA): "How would you rate the aesthetics of this image?" → stem "The aesthetics of the image is"
Video (VQA): sample N frames (default 8) and pass them as the image sequence; prompt "How would you rate the quality of this video?" → stem "The quality of the video is"
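
The task switch and the frame sampling can be sketched as follows. The prompts and stems are copied from this card; `TASKS` and `uniform_frame_indices` are hypothetical names, and evenly spaced sampling is an assumption, since the card only says to sample N frames.

```python
# Prompt and answer stem per task, copied from the model card text.
TASKS = {
    "iqa": ("How would you rate the quality of this image?", "The quality of the image is"),
    "iaa": ("How would you rate the aesthetics of this image?", "The aesthetics of the image is"),
    "vqa": ("How would you rate the quality of this video?", "The quality of the video is"),
}

def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick indices spread evenly over a clip (the centre of each equal segment).

    Uniform spacing is an assumption made for this sketch.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    n = min(num_frames, total_frames)  # short clips: use every frame once
    return [int((i + 0.5) * total_frames / n) for i in range(n)]

prompt, stem = TASKS["vqa"]
print(prompt, "->", stem)
print(uniform_frame_indices(240))
```

The selected frames would then be decoded with a video reader of your choice and passed to the processor in place of the single image.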

Model details

| | Mini (0.8B) |
| --- | --- |
| Architecture | `Qwen3_5ForConditionalGeneration` |
| Text hidden size | 1024 |
| Text layers | 24 (linear attention with full-attention every 4th layer) |
| Vision encoder depth | 12, hidden 768, patch 16, spatial merge 2 |
| Vocab | 248320 |
| Context length | up to 262144 |
| Tied embeddings | yes |
| Tensor dtype | bfloat16 |
| Weights | single safetensors (~2.2 GB) |
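
To make the "full-attention every 4th layer" entry concrete, the layer layout can be enumerated as below. The exact position of the full-attention layer within each group of four and the label strings are assumptions made for this sketch; the checkpoint's `config.json` is the authoritative source.

```python
NUM_LAYERS = 24           # text layers, from the table
FULL_ATTENTION_EVERY = 4  # full attention every 4th layer, from the table

# Assumption: the full-attention layer is the last in each group of four,
# i.e. layers 4, 8, ..., 24 when counting from 1.
layer_types = [
    "full_attention" if (i + 1) % FULL_ATTENTION_EVERY == 0 else "linear_attention"
    for i in range(NUM_LAYERS)
]

num_full = layer_types.count("full_attention")
num_linear = layer_types.count("linear_attention")
print(f"{num_linear} linear-attention layers, {num_full} full-attention layers")
```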

Scoring contract

Level vocabulary: excellent, good, fair, poor, bad
Weights: [1.0, 0.75, 0.5, 0.25, 0.0]
Output: scalar in [0, 1], higher = better
The five level tokens are matched with a leading space (" excellent", …); keep that when porting to other tokenizers.
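
When porting the contract to another runtime or tokenizer, a small framework-free reference can help check the port. The sketch below is an assumption-laden illustration: `encode` is a hypothetical stand-in for a tokenizer call, the toy vocabulary ids are made up, and treating a multi-token level word as an error is a choice made here (the Quick start simply takes the first token).

```python
from math import exp

LEVELS = ["excellent", "good", "fair", "poor", "bad"]
WEIGHTS = [1.0, 0.75, 0.5, 0.25, 0.0]

def level_token_ids(encode, levels=LEVELS):
    """Look up one token id per level word, keeping the leading space.

    `encode` is a hypothetical callable mapping a string to a list of token
    ids without special tokens.
    """
    ids = []
    for word in levels:
        pieces = encode(" " + word)
        if len(pieces) != 1:
            raise ValueError(f"{word!r} maps to {len(pieces)} tokens, expected 1")
        ids.append(pieces[0])
    if len(set(ids)) != len(ids):
        raise ValueError("two level words map to the same token id")
    return ids

def score_from_level_logits(level_logits, weights=WEIGHTS):
    """Softmax over the five level logits, then the weighted sum of the contract."""
    if len(level_logits) != len(weights):
        raise ValueError("expected one logit per level")
    top = max(level_logits)  # subtract the max for numerical stability
    exps = [exp(z - top) for z in level_logits]
    total = sum(exps)
    return sum(w * e / total for w, e in zip(weights, exps))

# Toy vocabulary with made-up ids, for illustration only.
toy_vocab = {" excellent": 11, " good": 12, " fair": 13, " poor": 14, " bad": 15}

def toy_encode(text):
    return [toy_vocab[text]]

ids = level_token_ids(toy_encode)
print(ids, round(score_from_level_logits([2.0, 3.0, 1.0, -1.0, -2.0]), 4))
```

In a real port, the logits passed to `score_from_level_logits` would be the model's last-position logits gathered at `ids`, in the same order as `LEVELS`.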

Intended use & limitations

Use: no-reference image/video quality assessment, aesthetic scoring, dataset curation, ranking and filtering generated media, reward signals for generative pipelines — especially where throughput matters.
Out of scope: safety/content moderation, factual or identity judgments, medical/forensic grading. Quality is perceptual and dataset-conditioned.
Scores are calibrated to the training MOS distribution; absolute values are most meaningful relative to one another. Re-calibrate before mixing with other scales.
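
One simple way to re-calibrate is a least-squares linear map fitted on a small set of images that carry labels on the target scale. The sketch below is a minimal illustration under that assumption: the function name and the sample values are hypothetical, and a monotonic nonlinear fit may suit some datasets better.

```python
def fit_linear_calibration(scores, targets):
    """Least-squares fit of targets ~ a * score + b; returns (a, b)."""
    n = len(scores)
    if n < 2 or n != len(targets):
        raise ValueError("need at least two paired samples")
    mean_s = sum(scores) / n
    mean_t = sum(targets) / n
    var_s = sum((s - mean_s) ** 2 for s in scores)
    if var_s == 0:
        raise ValueError("scores are constant; the slope is undefined")
    a = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, targets)) / var_s
    return a, mean_t - a * mean_s

# Hypothetical model scores in [0, 1] and labels on a 1-5 MOS scale.
model_scores = [0.2, 0.5, 0.8]
target_mos = [1.8, 3.0, 4.2]
a, b = fit_linear_calibration(model_scores, target_mos)
print(f"calibrated = {a:.3f} * score + {b:.3f}")
```

A linear map leaves the ranking of images unchanged, so it affects absolute values only and not rank-based metrics such as SRCC.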

Acknowledgements & citation

Built on the shoulders of Q-Align (the discrete text-defined-levels method and ONE-ALIGN), ms-swift (training/inference backbone), and Qwen3.5-VL (the vision-language backbone). If you use this model, please also cite the originals:

@inproceedings{wu2024qalign,
  title     = {Q-Align: Teaching {LMM}s for Visual Scoring via Discrete Text-Defined Levels},
  author    = {Wu, Haoning and Zhang, Zicheng and Zhang, Weixia and Chen, Chaofeng and
               Liao, Liang and Li, Chunyi and Gao, Yixuan and Wang, Annan and Zhang, Erli and
               Sun, Wenxiu and Yan, Qiong and Min, Xiongkuo and Zhai, Guangtao and Lin, Weisi},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning (ICML)},
  year      = {2024}
}

@inproceedings{swift2025,
  title     = {{SWIFT}: A Scalable Lightweight Infrastructure for Fine-Tuning},
  author    = {ModelScope Team},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2025},
  note      = {\url{https://github.com/modelscope/ms-swift}}
}

@misc{qwen3_5,
  title        = {Qwen3.5: Towards Native Multimodal Agents},
  author       = {Qwen Team},
  year         = {2025},
  howpublished = {\url{https://github.com/QwenLM/Qwen3-VL}}
}
