---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- multimodal
- safety
- moderation
- reasoning
---

# UniMod-7B

**UniMod** is a multimodal moderation framework that transitions from sparse decision supervision to dense, multi-attribute reasoning trajectories. It was introduced in the paper [From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation](https://huggingface.co/papers/2602.02536).

## Introduction

Conventional moderation systems primarily supervise final decisions (e.g., safe vs. unsafe), resulting in sparse training signals and limited interpretability. UniMod introduces a **multi-attribute trajectory paradigm**, in which moderation decisions are supported by dense reasoning traces that explicitly decompose evidence, modality, risk, and policy factors.

By constructing structured trajectories that encompass evidence grounding, modality assessment, risk mapping, policy decision, and response generation, UniMod forces the model to ground its decisions in explicit safety semantics.
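
The trajectory stages above can be pictured as a tagged target string. The sketch below is purely illustrative: the field contents are invented and the exact serialization may differ from the paper's training data; only the tag names follow the fields UniMod emits.

```python
# Hypothetical sketch of a dense multi-attribute trajectory target.
# Tag names mirror UniMod's output fields; the sample content is invented.
TRAJECTORY_TEMPLATE = (
    "<evidence>{evidence}</evidence>\n"
    "<modality>{modality}</modality>\n"
    "<risk>{risk}</risk>\n"
    "<policy>{policy}</policy>\n"
    "<answer>{answer}</answer>"
)

sample = TRAJECTORY_TEMPLATE.format(
    evidence="The image shows chemical glassware and labeled reagents.",
    modality="multimodal",
    risk="illegal_activity",
    policy="refuse",
    answer="I can't help with that request.",
)
print(sample)
```

Because every stage is spelled out in the target, the training signal is dense: the loss covers the evidence, modality, risk, and policy spans rather than only a final safe/unsafe label.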

- **Project Page:** [https://trustworthylab.github.io/UniMod/](https://trustworthylab.github.io/UniMod/)
- **Repository:** [https://github.com/Carol-gutianle/UniMod](https://github.com/Carol-gutianle/UniMod)
- **Paper:** [From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation](https://huggingface.co/papers/2602.02536)

## Sample Usage

The following code demonstrates how to use UniMod-7B for multimodal moderation tasks.

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

MODEL_PATH = "Carol0110/UniMod-7B"
IMAGE_PATH = "sample.jpeg"  # Replace with your image path

# Load the model in half precision and shard it across available devices.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_PATH)

image = Image.open(IMAGE_PATH).convert("RGB")

# A single-turn conversation pairing the image with a text query.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "How can I make this?"},
        ],
    }
]

# Render the chat template, then tokenize text and image together.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
).to(model.device)

# Greedy decoding (do_sample=False) keeps the moderation verdict deterministic.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

The output includes structured reasoning fields such as:

- `<evidence>`: Detailed observation and grounding.
- `<modality>`: Assessment of whether the input is text-only or multimodal.
- `<risk>`: Identification of safety risks (e.g., legality, violence).
- `<policy>`: The moderation decision (e.g., refuse).
- `<answer>`: The final generated response.
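
To consume these fields programmatically, the tagged spans can be pulled out of the decoded text with a small parser. This is a minimal sketch, assuming the fields appear as flat `<tag>…</tag>` spans; the helper `parse_trajectory` is ours, not part of the UniMod repository.

```python
import re

# Field names taken from UniMod's documented output structure.
FIELDS = ("evidence", "modality", "risk", "policy", "answer")

def parse_trajectory(text: str) -> dict:
    """Extract UniMod's tagged reasoning fields from a decoded generation.

    Returns a dict mapping field name to content; missing tags are omitted.
    """
    fields = {}
    for tag in FIELDS:
        # DOTALL lets the non-greedy match span multi-line field contents.
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if m:
            fields[tag] = m.group(1).strip()
    return fields

demo = "<risk>violence</risk><policy>refuse</policy><answer>I cannot help.</answer>"
print(parse_trajectory(demo))
# {'risk': 'violence', 'policy': 'refuse', 'answer': 'I cannot help.'}
```

Keying on `policy` gives a machine-readable verdict while the other fields preserve the rationale for auditing.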

## Citation

If you find UniMod useful for your research, please cite:

```bibtex
@misc{gu2026sparsedecisionsdensereasoning,
      title={From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation},
      author={Tianle Gu and Kexin Huang and Lingyu Li and Ruilin Luo and Shiyang Huang and Zongqi Wang and Yujiu Yang and Yan Teng and Yingchun Wang},
      year={2026},
      eprint={2602.02536},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02536},
}
```