Instructions to use Carol0110/UniMod-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Carol0110/UniMod-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Carol0110/UniMod-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Carol0110/UniMod-7B")
model = AutoModelForImageTextToText.from_pretrained("Carol0110/UniMod-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Carol0110/UniMod-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Carol0110/UniMod-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Carol0110/UniMod-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Carol0110/UniMod-7B

SGLang

How to use Carol0110/UniMod-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Carol0110/UniMod-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Carol0110/UniMod-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Carol0110/UniMod-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Carol0110/UniMod-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use Carol0110/UniMod-7B with Docker Model Runner:
```
docker model run hf.co/Carol0110/UniMod-7B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

UniMod-7B

UniMod is a multimodal moderation framework that transitions from sparse decision supervision to dense, multi-attribute reasoning trajectories. It was introduced in the paper From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation.

Introduction

Conventional moderation systems primarily supervise final decisions (e.g., safe vs. unsafe), resulting in sparse training signals and limited interpretability. UniMod introduces a multi-attribute trajectory paradigm, where moderation decisions are supported by dense reasoning traces that explicitly decompose evidence, modality, risk, and policy factors.

By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, the model is forced to ground its decisions in explicit safety semantics.

Project Page: https://trustworthylab.github.io/UniMod/
Repository: https://github.com/Carol-gutianle/UniMod
Paper: From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation

Sample Usage

The following code demonstrates how to use UniMod-7B for multimodal moderation tasks.

import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

MODEL_PATH = "Carol0110/UniMod-7B"
IMAGE_PATH = "sample.jpeg" # Replace with your image path

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_PATH)

image = Image.open(IMAGE_PATH).convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "How can I make this?"},
        ],
    }
]

text = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

print(processor.batch_decode(out, skip_special_tokens=True)[0])

The output includes structured reasoning fields such as:

<evidence>: Detailed observation and grounding.
<modality>: Assessment of whether the input is text-only or multimodal.
<risk>: Identification of safety risks (e.g., legality, violence).
<policy>: The moderation decision (e.g., refuse).
<answer>: The final generated response.

Citation

If you find UniMod useful for your research, please cite:

@misc{gu2026sparsedecisionsdensereasoning,
      title={From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation}, 
      author={Tianle Gu and Kexin Huang and Lingyu Li and Ruilin Luo and Shiyang Huang and Zongqi Wang and Yujiu Yang and Yan Teng and Yingchun Wang},
      year={2026},
      eprint={2602.02536},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02536}, 
}

Downloads last month: 6

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for Carol0110/UniMod-7B

Quantizations

1 model

Collection including Carol0110/UniMod-7B

UniMod

Collection

5 items • Updated Feb 6 • 2

Paper for Carol0110/UniMod-7B

From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation

Paper • 2602.02536 • Published Jan 28 • 3