wudq/EmoCaliber

	---
	license: cc-by-4.0
	pipeline_tag: image-text-to-text
	library_name: transformers
	---

	Welcome to EmoCaliber, a multimodal large language model (MLLM) for reliable visual emotion comprehension.

	Paper: [EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration](https://huggingface.co/papers/2512.15528)
	Code / Project Page: [https://github.com/wdqqdw/EmoCaliber](https://github.com/wdqqdw/EmoCaliber)

	Given an image, EmoCaliber is trained to produce structured affective reasoning in five steps: (1) identifying prominent visual elements in the image; (2) describing human subjects in detail, if present; (3) describing contextual elements beyond the subjects; (4) discussing how these elements interact; and (5) deriving an emotional conclusion from the preceding observations. The final emotion prediction integrates these visual cues. After outputting the prediction, EmoCaliber also emits a confidence score wrapped in a `<confidence>` tag, which reflects the model's self-assessed certainty about its answer.
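Downstream code usually needs the reasoning, the answer, and the confidence as separate fields. The following is a minimal sketch of one way to split a response, not the authors' own parser. It assumes the confidence is closed by a matching `</confidence>` tag and keeps its value as a raw string, because this card does not specify the score's format. The example response is made up for illustration, and `parse_response` is a hypothetical helper name.

```python
import re
from typing import Optional, TypedDict


class ParsedResponse(TypedDict):
    reasoning: Optional[str]
    answer: Optional[str]
    confidence: Optional[str]


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text: str) -> ParsedResponse:
    """Split a model response into its tagged parts (hypothetical helper)."""
    return {
        "reasoning": _extract("think", text),
        "answer": _extract("answer", text),
        # Kept as a string: the numeric format of the score is an assumption
        # this sketch deliberately does not make.
        "confidence": _extract("confidence", text),
    }


# Made-up response, used only to exercise the parser.
example = (
    "<think>A child laughs on a sunny beach.</think>"
    "<answer>amusement</answer><confidence>85</confidence>"
)
parsed = parse_response(example)
print(parsed["answer"], parsed["confidence"])
```

Missing tags yield `None` rather than raising, so callers can decide how to handle malformed generations.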

	EmoCaliber is built on Qwen2.5-VL-7B, so inference and training work the same way as for that base model.
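Because the model follows Qwen2.5-VL, inference can plausibly go through the Qwen2.5-VL classes in `transformers`. The sketch below is an assumption-laden example, not an official usage snippet: it assumes the checkpoint loads with `Qwen2_5_VLForConditionalGeneration` under the repository id `wudq/EmoCaliber`, that `torch`, `Pillow`, and `accelerate` (needed for `device_map="auto"`) are installed, and that `messages` is the `conversations` list from one of the standard prompt templates in this card with its placeholders filled in. The function name `run_emocaliber` and the `max_new_tokens` default are illustrative.

```python
from typing import Any, Dict, List


def run_emocaliber(
    messages: List[Dict[str, Any]],
    image_path: str,
    model_id: str = "wudq/EmoCaliber",
    max_new_tokens: int = 512,
) -> str:
    """Generate one response for a single image (illustrative sketch)."""
    # Heavy imports are deferred so that defining the function is cheap.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    # Assumption: the checkpoint is compatible with the Qwen2.5-VL classes.
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # Render the chat template to text, then pair it with the loaded image.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[text], images=[image], return_tensors="pt").to(
        model.device
    )

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

The function reloads the model on every call for brevity; for repeated use, load the model and processor once and reuse them.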

	Standard prompt templates:

	For emotion recognition:

	```json
	{
	  "conversations": [
	    {
	      "role": "user",
	      "content": [
	        {"type": "image", "image": "IMAGE_PATH"},
	        {
	          "type": "text",
	          "text": "Which emotion might this image evoke? Choose the most likely one from ['EMOTION_CATEGORIES']. Think step by step. Respond in the format: <think>{your reasoning}</think><answer>{your final answer}</answer>."
	        }
	      ]
	    }
	  ]
	}
	```

	For sentiment analysis:

	```json
	{
	  "conversations": [
	    {
	      "role": "user",
	      "content": [
	        {"type": "image", "image": "IMAGE_PATH"},
	        {
	          "type": "text",
	          "text": "What sentiment might this image evoke? Choose the most likely one from ['positive', 'negative']. Think step by step. Respond in the format: <think>{your reasoning}</think><answer>{your final answer}</answer>."
	        }
	      ]
	    }
	  ]
	}
	```
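The two templates differ only in the question wording and the option list, so they can be generated from one helper. The sketch below is one possible way to do that. It assumes the `['EMOTION_CATEGORIES']` placeholder is meant to be replaced by a Python-style list of labels, as the sentiment template suggests. `build_conversation` is a hypothetical name, and the emotion labels in the usage lines are illustrative, not the label set the model was trained on.

```python
from typing import Any, Dict, List

# Question openers copied from the two templates above.
QUESTIONS = {"emotion": "Which emotion", "sentiment": "What sentiment"}

RESPONSE_FORMAT = (
    "Think step by step. Respond in the format: "
    "<think>{your reasoning}</think><answer>{your final answer}</answer>."
)


def build_conversation(
    image_path: str, options: List[str], task: str = "emotion"
) -> Dict[str, Any]:
    """Fill a prompt template with an image path and candidate labels."""
    text = (
        f"{QUESTIONS[task]} might this image evoke? "
        # A Python list renders as ['a', 'b'], matching the sentiment template.
        f"Choose the most likely one from {list(options)}. "
        f"{RESPONSE_FORMAT}"
    )
    return {
        "conversations": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},
                    {"type": "text", "text": text},
                ],
            }
        ]
    }


# Illustrative labels only; substitute the categories of your dataset.
emotion_prompt = build_conversation("photo.jpg", ["amusement", "anger", "sadness"])
sentiment_prompt = build_conversation(
    "photo.jpg", ["positive", "negative"], task="sentiment"
)
print(sentiment_prompt["conversations"][0]["content"][1]["text"])
```

With `task="sentiment"` and the options `positive` and `negative`, the generated text matches the sentiment template character for character.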