tiantiaf committed on
Commit 773e867 · verified · 1 parent: dbc338c

Update README.md

Files changed (1): README.md (+149 -3)
---
base_model:
- Qwen/Qwen3-VL-2B-Instruct
license: openrail
pipeline_tag: image-text-to-text
tags:
- image
- privacy
library_name: transformers
datasets:
- timtsapras23/CPRT-Bench
metrics:
- mae
---

# CPRT-Qwen3-VL-2B-Instruct for Compositional Privacy Risk Evaluation

## Model Description
This model implements the compositional privacy risk evaluation described in *Rethinking Visual Privacy: A Compositional Privacy Risk Framework for Severity Assessment with VLMs* (https://arxiv.org/pdf/2603.21573). It is released as a PEFT adapter for Qwen/Qwen3-VL-2B-Instruct. Given an image, it returns a privacy score between 0.0 and 1.0 as JSON.

## Prompt Used to Generate Privacy Scores (0-1)
```python
instruction = """
You are an expert privacy risk analyst. Analyze the image and assign a privacy score between 0.0 and 1.0.
PRIVACY TAXONOMY (ordered by severity,):
LEVEL 1 — Severe privacy risk (critical on its own):
- Biometric data (identifiable faces, fingerprints, iris scans)
- Government identifiers (passport, driver's license, SSN, national ID)
LEVEL 2 — High privacy risk (sensitive alone, severe when combined):
- Non-unique identifiers (names, usernames, workplace ID, date of birth)
- Medical data (prescriptions, health records, medical imagery)
- Financial data (credit cards, bank statements, account numbers)
- Nudity or intimate content
- Sensitive inferences (race/ethnicity, religion, politics, sexual orientation, disabilities, mental health)
LEVEL 3 — Moderate privacy risk (becomes risky when combined):
- Personal attributes (age, gender, physical appearance)
- Location information (addresses, GPS, identifiable landmarks)
- Activities & behaviors (habits, routines, consumption patterns)
LEVEL 4 — Low privacy risk (context-dependent, minor concerns):
- Property & assets (vehicles, homes without addresses)
- Generic documents or digital artifacts
- Metadata or background information
- Crowds or distant unidentifiable individuals
SCORING GUIDANCE:
- Higher taxonomy levels should result in HIGHER scores
- Level 1 elements indicate scores in the upper range (closer to 1.0)
- Level 4 elements indicate scores in the lower range (closer to 0.0)
- NO privacy elements detected then score should be 0.0
- Multiple elements or combinations then consider cumulative risk
- Use your judgment to assign the appropriate score based on severity
If you detect ANY privacy-relevant element, the score MUST be > 0.0
Output only valid JSON:
{"privacy_score": <float between 0.0 and 1.0>}
"""
```
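The prompt asks the model to reply with only `{"privacy_score": <float>}`. Generated text can still arrive wrapped in a code fence, preceded by stray words, or cut off by the token limit. A small defensive parser can handle these cases. The sketch below uses a hypothetical helper, `parse_privacy_score`, which is not part of the released code. It first tries strict JSON, then falls back to a regex on the `privacy_score` key, and clamps the result to the 0.0-1.0 range the prompt specifies.

```python
import json
import math
import re
from typing import Optional


def parse_privacy_score(text: str) -> Optional[float]:
    """Recover the privacy score from raw model output (hypothetical helper).

    Returns a float clamped to [0.0, 1.0], or None if no usable score is found.
    """
    text = text.strip()
    try:
        # Preferred path: the whole reply is the JSON object the prompt asks for.
        value = json.loads(text)["privacy_score"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Fallback: grab the number after the "privacy_score" key, which also
        # covers code-fenced, prefixed, or truncated replies.
        match = re.search(r'"privacy_score"\s*:\s*(-?\d+(?:\.\d+)?)', text)
        if match is None:
            return None
        value = match.group(1)
    try:
        score = float(value)
    except (TypeError, ValueError):
        return None
    if math.isnan(score):
        return None
    # Clamp into the 0.0-1.0 range specified by the prompt.
    return min(max(score, 0.0), 1.0)
```

Once `privacy_prediction` has been decoded as shown in the evaluation section, `parse_privacy_score(privacy_prediction)` turns it into a float. A `None` result flags a reply the model did not format as requested.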

## Load the Model
```python
import torch
from peft import PeftModel
from transformers import AutoProcessor, AutoTokenizer, Qwen3VLForConditionalGeneration

adapter_id = "tiantiaf/CPRT-Qwen3-VL-2B-Instruct"
base_model = "Qwen/Qwen3-VL-2B-Instruct"
device_map = "auto"  # needs `accelerate`; or e.g. {"": 0} for a single GPU

# Load the base vision-language model, then attach the CPRT adapter weights.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(base_model)
processor = AutoProcessor.from_pretrained(
    base_model,
    trust_remote_code=True
)

# Pad with the EOS token and bound the image resolution (in pixels).
processor.tokenizer.pad_token = processor.tokenizer.eos_token
processor.image_processor.max_pixels = 2048 * 16 * 16
processor.image_processor.min_pixels = 3136
tokenizer.pad_token = tokenizer.eos_token

# Stop generation at either end-of-turn or end-of-text.
terminators = [
    processor.tokenizer.convert_tokens_to_ids("<|im_end|>"),
    processor.tokenizer.convert_tokens_to_ids("<|endoftext|>")
]
```
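The `min_pixels` and `max_pixels` settings bound how large each image becomes before it is split into patches. Qwen-VL image processors resize inputs so that each side is a multiple of a fixed factor and the total pixel count stays within those bounds. The sketch below is modeled on the `smart_resize` rule from the Qwen-VL utilities and can help estimate what these settings mean for a given image. It is an approximation, not the processor's exact code path. It assumes a factor of 32, taken to be 16-pixel patches merged 2×2 for Qwen3-VL (Qwen2/2.5-VL use 28). It also omits the extreme-aspect-ratio check of the original rule.

```python
import math
from typing import Tuple


def smart_resize(
    height: int,
    width: int,
    factor: int = 32,  # assumption for Qwen3-VL: 16-px patches merged 2x2
    min_pixels: int = 3136,
    max_pixels: int = 2048 * 16 * 16,
) -> Tuple[int, int]:
    """Approximate the (height, width) a Qwen-VL image processor resizes to."""
    # Snap each side to the nearest multiple of `factor` (at least one block).
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Too many pixels: shrink both sides by one ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h = max(factor, math.floor(height / beta / factor) * factor)
        w = max(factor, math.floor(width / beta / factor) * factor)
    elif h * w < min_pixels:
        # Too few pixels: enlarge both sides by one ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w


h, w = smart_resize(1080, 1920)
merged_tokens = (h // 32) * (w // 32)  # one visual token per 32x32 block (assumed)
print(h, w, merged_tokens)  # prints: 512 960 480
```

Under these assumptions, `max_pixels = 2048 * 16 * 16` (524,288 pixels) caps an image at 512 merged visual tokens. A 1080p frame is therefore downscaled to 512×960, or 480 tokens. `min_pixels = 3136` (56×56) upscales very small crops.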

## Compositional Privacy Risk Evaluation
```python
from PIL import Image

# Assumes `instruction` (prompt section) and `model`, `processor`, `tokenizer`,
# and `terminators` (loading section) are already defined.
img = Image.open("YOUR PATH").convert('RGB')
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]
    }
]

# Render the chat template into a prompt string containing the image placeholder.
prompt = processor.apply_chat_template(
    messages,
    add_generation_prompt=True
)

inputs = processor(
    text=prompt,
    images=img,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    eos_token_id=terminators,
    pad_token_id=tokenizer.pad_token_id,
)

# Drop the prompt tokens and decode only the newly generated text,
# which should be a JSON string of the form {"privacy_score": ...}.
prompt_length = inputs["input_ids"].shape[-1]
response = outputs[0][prompt_length:]
privacy_prediction = tokenizer.decode(response, skip_special_tokens=True)
```
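The model card lists mean absolute error (MAE) as its metric. The sketch below compares predicted scores against reference scores, for example labels from a benchmark such as CPRT-Bench. The benchmark's label format is not described here, so plain lists of floats are assumed. `mean_absolute_error` is a hypothetical helper. It skips predictions that could not be parsed (`None`) and reports how many were skipped, so formatting failures are not hidden inside the average.

```python
from typing import List, Optional, Sequence, Tuple


def mean_absolute_error(
    predicted: Sequence[Optional[float]],
    reference: Sequence[float],
) -> Tuple[float, int]:
    """Return (MAE over parsed predictions, number of unparsed predictions)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    # Keep only pairs whose prediction was successfully parsed.
    pairs: List[Tuple[float, float]] = [
        (p, r) for p, r in zip(predicted, reference) if p is not None
    ]
    if not pairs:
        raise ValueError("no parsed predictions to score")
    mae = sum(abs(p - r) for p, r in pairs) / len(pairs)
    return mae, len(predicted) - len(pairs)


mae, skipped = mean_absolute_error([0.2, 0.9, None], [0.0, 1.0, 0.5])
print(f"MAE = {mae:.3f} ({skipped} unparsed)")  # prints: MAE = 0.150 (1 unparsed)
```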

## Contact
If you have any questions, please contact Tiantian Feng (tiantiaf@usc.edu).

## Citation
If you use our model or find it useful in your work, please cite our paper:
```bibtex
@misc{tsaprazlis2026rethinkingvisualprivacycompositional,
  title={Rethinking Visual Privacy: A Compositional Privacy Risk Framework for Severity Assessment with VLMs},
  author={Efthymios Tsaprazlis and Tiantian Feng and Anil Ramakrishna and Sai Praneeth Karimireddy and Rahul Gupta and Shrikanth Narayanan},
  year={2026},
  eprint={2603.21573},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.21573},
}
```

Responsible use of the model: the model is released under the OpenRAIL license. Users should respect the privacy and consent of data subjects and comply with the relevant laws and regulations in their jurisdictions when using it.

❌ **Out-of-Scope Use**
- Clinical or diagnostic applications
- Surveillance
- Privacy-invasive applications
- Commercial use