---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- multimodal
- safety
- moderation
- reasoning
---

# UniMod-7B

**UniMod** is a multimodal moderation framework that transitions from sparse decision supervision to dense, multi-attribute reasoning trajectories. It was introduced in the paper [From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation](https://huggingface.co/papers/2602.02536).

## Introduction

Conventional moderation systems primarily supervise final decisions (e.g., safe vs. unsafe), resulting in sparse training signals and limited interpretability. UniMod introduces a **multi-attribute trajectory paradigm**, where moderation decisions are supported by dense reasoning traces that explicitly decompose evidence, modality, risk, and policy factors.

By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, the model is forced to ground its decisions in explicit safety semantics.

- **Project Page:** [https://trustworthylab.github.io/UniMod/](https://trustworthylab.github.io/UniMod/)
- **Repository:** [https://github.com/Carol-gutianle/UniMod](https://github.com/Carol-gutianle/UniMod)
- **Paper:** [From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation](https://huggingface.co/papers/2602.02536)

## Sample Usage

The following code demonstrates how to use UniMod-7B for multimodal moderation tasks.

```python
import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

MODEL_PATH = "Carol0110/UniMod-7B"
IMAGE_PATH = "sample.jpeg"  # Replace with your image path

# Load the model in half precision and shard it across available devices.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_PATH)

image = Image.open(IMAGE_PATH).convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "How can I make this?"},
        ],
    }
]

# Render the chat template to a prompt string; the processor pairs it with the image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
).to(model.device)

# Greedy decoding keeps the moderation trajectory deterministic.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

| |
|
The output includes structured reasoning fields such as:
- `<evidence>`: Detailed observation and grounding.
- `<modality>`: Assessment of whether the input is text-only or multimodal.
- `<risk>`: Identification of safety risks (e.g., legality, violence).
- `<policy>`: The moderation decision (e.g., refuse).
- `<answer>`: The final generated response.

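These tagged fields can be pulled out of the decoded string with a small helper. The snippet below is an illustrative sketch (not part of the official API), assuming each field is emitted as a matching `<tag>…</tag>` pair:

```python
import re

def parse_trajectory(text: str) -> dict:
    """Extract UniMod's tagged reasoning fields from a decoded generation."""
    fields = {}
    for tag in ("evidence", "modality", "risk", "policy", "answer"):
        # re.DOTALL lets a field span multiple lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match:
            fields[tag] = match.group(1).strip()
    return fields

sample = "<policy>refuse</policy><answer>I can't help with that.</answer>"
print(parse_trajectory(sample))
```

A dictionary like this makes it easy to act on the `policy` field programmatically while logging the full reasoning trace for audit.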
## Citation

If you find UniMod useful for your research, please cite:

```bibtex
@misc{gu2026sparsedecisionsdensereasoning,
      title={From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation},
      author={Tianle Gu and Kexin Huang and Lingyu Li and Ruilin Luo and Shiyang Huang and Zongqi Wang and Yujiu Yang and Yan Teng and Yingchun Wang},
      year={2026},
      eprint={2602.02536},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02536},
}
```