Add model card for UniMod-7B

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +89 -0
README.md ADDED
@@ -0,0 +1,89 @@
+ ---
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - multimodal
+ - safety
+ - moderation
+ - reasoning
+ ---
+
+ # UniMod-7B
+
+ **UniMod** is a multimodal moderation framework that transitions from sparse decision supervision to dense, multi-attribute reasoning trajectories. It was introduced in the paper [From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation](https://huggingface.co/papers/2602.02536).
+
+ ## Introduction
+
+ Conventional moderation systems primarily supervise final decisions (e.g., safe vs. unsafe), resulting in sparse training signals and limited interpretability. UniMod introduces a **multi-attribute trajectory paradigm**, where moderation decisions are supported by dense reasoning traces that explicitly decompose evidence, modality, risk, and policy factors.
+
+ By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, the model is forced to ground its decisions in explicit safety semantics.
+
+ - **Project Page:** [https://trustworthylab.github.io/UniMod/](https://trustworthylab.github.io/UniMod/)
+ - **Repository:** [https://github.com/Carol-gutianle/UniMod](https://github.com/Carol-gutianle/UniMod)
+ - **Paper:** [From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation](https://huggingface.co/papers/2602.02536)
+
+ ## Sample Usage
+
+ The following code demonstrates how to use UniMod-7B for multimodal moderation tasks.
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
+
+ MODEL_PATH = "Carol0110/UniMod-7B"
+ IMAGE_PATH = "sample.jpeg"  # Replace with your image path
+
+ # Load the model and its processor (tokenizer + image preprocessor).
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained(MODEL_PATH)
+
+ image = Image.open(IMAGE_PATH).convert("RGB")
+
+ # A single user turn with one image placeholder and a text query.
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             {"type": "text", "text": "How can I make this?"},
+         ],
+     }
+ ]
+
+ # Build the prompt from the chat template.
+ text = processor.apply_chat_template(messages, add_generation_prompt=True)
+
+ inputs = processor(
+     text=text,
+     images=image,
+     return_tensors="pt",
+ ).to(model.device)
+
+ # Greedy decoding; gradients are not needed for inference.
+ with torch.no_grad():
+     out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
+
+ # The decoded string contains the prompt followed by the model's output.
+ print(processor.batch_decode(out, skip_special_tokens=True)[0])
+ ```
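
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so decoding the full sequence repeats the prompt. A minimal sketch of trimming the prompt before decoding follows; `trim_prompt_tokens` is a hypothetical helper, not part of UniMod or `transformers`, and it is shown on plain token-id lists so that it runs without a model.

```python
from typing import List, Sequence


def trim_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the leading prompt tokens from each generated sequence.

    Hypothetical helper: assumes each output row starts with its input row,
    which is how `generate` lays out sequences for decoder-only models.
    """
    return [list(out[len(inp):]) for inp, out in zip(input_ids, output_ids)]


# Toy token ids standing in for real tensors.
prompt = [[101, 7, 8, 9]]
generated = [[101, 7, 8, 9, 42, 43, 2]]
print(trim_prompt_tokens(prompt, generated))  # [[42, 43, 2]]
```

With the objects from the sample usage, the same idea would likely read `processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]` for the single-sample batch used there.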
+
+ The output includes structured reasoning fields such as:
+ - `<evidence>`: Detailed observation and grounding.
+ - `<modality>`: Assessment of whether the input is text-only or multimodal.
+ - `<risk>`: Identification of safety risks (e.g., legality, violence).
+ - `<policy>`: The moderation decision (e.g., refuse).
+ - `<answer>`: The final generated response.
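
The tagged fields can be pulled out of the decoded text with a small parser. The sketch below assumes each field is emitted as a matching pair such as `<risk>...</risk>`; the model card only lists the opening tag names, so the closing-tag format, the `parse_trajectory` helper, and the sample string are illustrative assumptions rather than the authors' tooling.

```python
import re
from typing import Dict

# Field names listed in the model card, in the order they are described.
FIELDS = ("evidence", "modality", "risk", "policy", "answer")


def parse_trajectory(text: str) -> Dict[str, str]:
    """Extract `<field>...</field>` spans from a decoded model output.

    Assumes paired XML-style tags; fields that are absent are skipped.
    """
    parsed = {}
    for name in FIELDS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        if match:
            parsed[name] = match.group(1).strip()
    return parsed


# Made-up output for illustration only; real outputs may differ.
sample = (
    "<evidence>The image shows a kitchen knife.</evidence>\n"
    "<modality>multimodal</modality>\n"
    "<risk>violence</risk>\n"
    "<policy>refuse</policy>\n"
    "<answer>I can't help with that.</answer>"
)
fields = parse_trajectory(sample)
print(fields["policy"])  # refuse
```

A downstream filter could then branch on `fields.get("policy")` and fall back to manual review when the field is missing.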
74
+
75
+ ## Citation
76
+
77
+ If you find UniMod useful for your research, please cite:
78
+
79
+ ```bibtex
80
+ @misc{gu2026sparsedecisionsdensereasoning,
81
+ title={From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation},
82
+ author={Tianle Gu and Kexin Huang and Lingyu Li and Ruilin Luo and Shiyang Huang and Zongqi Wang and Yujiu Yang and Yan Teng and Yingchun Wang},
83
+ year={2026},
84
+ eprint={2602.02536},
85
+ archivePrefix={arXiv},
86
+ primaryClass={cs.LG},
87
+ url={https://arxiv.org/abs/2602.02536},
88
+ }
89
+ ```