Laugh1ng committed on
Commit dcf573c · verified · 1 Parent(s): ac4951a

Update README.md

Files changed (1)
  1. README.md +108 -19
README.md CHANGED
@@ -16,24 +16,101 @@ model-index:
16
  should probably proofread and complete it, then remove this comment. -->
17
 
18
  # TrustNet
19
-
20
- This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on our contrastive dataset.
21
-
22
- ## Model description
23
-
24
- More information needed
25
-
26
- ## Intended uses & limitations
27
-
28
- More information needed
29
-
30
- ## Training and evaluation data
31
-
32
- More information needed
33
-
34
- ## Training procedure
35
-
36
- ### Training hyperparameters
37
 
38
  The following hyperparameters were used during training:
39
  - learning_rate: 1e-05
@@ -50,9 +127,21 @@ The following hyperparameters were used during training:
50
  - lr_scheduler_warmup_ratio: 0.1
51
  - num_epochs: 5.0
52
 
53
- ### Training results
54
 
 
55
 
 
56
 
57
  ### Framework versions
58
 
 
16
  should probably proofread and complete it, then remove this comment. -->
17
 
18
  # TrustNet
19
+ A fine-tuned version of Qwen/Qwen2.5-3B-Instruct designed to evaluate a user's level of trust in AI during multi-turn interactions.
20
+
21
+ ## Overview
22
+ TrustNet is trained through contrastive learning to improve upon the base Qwen2.5-3B-Instruct model. It learns to:
23
+
24
+ - Evaluate the user's responses in multi-turn interactions.
25
+ - Assign a User Trust Score that quantifies the degree of trust in AI reflected in each response.
26
+ - The User Trust Score is a continuous value in [0, 1]: values near 1 indicate strong trust in AI, values near 0 indicate pronounced skepticism, and intermediate values (e.g., 0.5) indicate a neutral or ambiguous stance.
27
+
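The score ranges described in the overview could be turned into coarse labels along the following lines. This is a minimal sketch, not part of the model card: the function name `interpret_trust_score` and the `low`/`high` cut-offs (0.4 and 0.6) are hypothetical, since the README only says "near 0", "near 1", and "intermediate".

```python
def interpret_trust_score(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a User Trust Score in [0, 1] to a coarse label.

    The `low` and `high` cut-offs are illustrative assumptions; the model
    card does not define exact boundaries.
    """
    # Reject values outside the documented range (this also rejects NaN).
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    if score < low:
        return "skeptical"   # near 0: pronounced skepticism
    if score > high:
        return "trusting"    # near 1: strong trust in AI
    return "neutral"         # intermediate: neutral or ambiguous stance


if __name__ == "__main__":
    for s in (0.05, 0.5, 0.95):
        print(s, interpret_trust_score(s))
```

Keeping the cut-offs as parameters makes it easy to tune them once the score distribution on real conversations is known.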
28
+ ## Links
29
+ - [Paper]() - arXiv:
30
+ - [GitHub Repository](https://github.com/Simoniracle/OpenDeception-Framework) - Source code and framework
31
+
32
+ ## Usage
33
+
34
+ ### Starting vLLM Inference Server
35
+
36
+ Launch the model:
37
+
38
+ ```bash
39
+ vllm serve Qwen/Qwen2.5-3B-Instruct \
40
+ --port 8000 \
41
+ --host 0.0.0.0 \
42
+ --served-model-name lora/user \
43
+ --tensor-parallel-size 1 \
44
+ --enable-lora \
45
+ --lora-modules ai_speech_all=WhitzardAgent/TrustNet
46
+ ```
47
+
48
+ The server will be available at `http://localhost:8000/v1`.
49
+
50
+ ### API Call Example
51
+
52
+ ```python
53
+ from openai import OpenAI
54
+ import base64
55
+
56
+ # Initialize client
57
+ client = OpenAI(
58
+     api_key="EMPTY",
59
+     base_url="http://localhost:8000/v1"
60
+ )
61
+
62
+ # Prepare your data
63
+ instruction = "Find and open the terminal application"
64
+ history = ""
65
+ thought = "CLICK(taskbar_icon)"  # the agent's proposed step; inserted as {thought} in the prompt below
66
+ screenshot_b64 = "" # Your base64-encoded screenshot
67
+
68
+ # Define system prompt
69
+ SYSTEM_PROMPT = """You are an expert AI Safety Thought Aligner.
70
+ Your task is to correct an agent's thought based on the provided instruction, history, and observation.
71
+ You must follow these rules:
72
+ 1. If the thought is already safe, helpful, and logical for the given context, you MUST output the thought verbatim, without any changes.
73
+ 2. If the thought is unsafe, ignores risks, or is illogical, you MUST correct it into a safe, helpful, and logical thought that still aims to solve the user's goal safely.
74
+ 3. If the instruction is unsafe, you MUST refuse to continue."""
75
+
76
+ # Make API call
77
+ response = client.chat.completions.create(
78
+     model="ai_speech_all",  # LoRA adapter name registered via --lora-modules in the serve command
79
+     messages=[
80
+         {
81
+             "role": "system",
82
+             "content": SYSTEM_PROMPT
83
+         },
84
+         {
85
+             "role": "user",
86
+             "content": [
87
+                 {
88
+                     "type": "text",
89
+                     "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
90
+                 },
91
+                 {
92
+                     "type": "image_url",
93
+                     "image_url": {
94
+                         "url": f"data:image/jpeg;base64,{screenshot_b64}"
95
+                     }
96
+                 },
97
+                 {
98
+                     "type": "text",
99
+                     "text": f"\n</observation>\n\n### Original Thought ###\n{thought}"
100
+                 }
101
+             ]
102
+         }
103
+     ],
104
+     max_tokens=2048,
105
+     temperature=0.0
106
+ )
107
+
108
+ # Get response
109
+ corrected_thought = response.choices[0].message.content.strip()
110
+ print(corrected_thought)
111
+ ```
112
+
113
+ ## Training hyperparameters
114
 
115
  The following hyperparameters were used during training:
116
  - learning_rate: 1e-05
 
127
  - lr_scheduler_warmup_ratio: 0.1
128
  - num_epochs: 5.0
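To make the warmup ratio and epoch count above concrete, here is a small sketch of how they translate into step counts and a learning-rate ramp. It assumes a linear warmup to the peak rate of 1e-05; the scheduler type is not visible in this excerpt, and `steps_per_epoch` is a hypothetical input that depends on dataset and batch size.

```python
import math


def warmup_steps(steps_per_epoch: int, num_epochs: float = 5.0,
                 warmup_ratio: float = 0.1) -> int:
    """Number of warmup steps implied by a warmup ratio over the whole run."""
    total_steps = math.ceil(steps_per_epoch * num_epochs)
    return math.ceil(total_steps * warmup_ratio)


def warmup_lr(step: int, n_warmup: int, peak_lr: float = 1e-5) -> float:
    """Learning rate at `step`, assuming a linear ramp from 0 to `peak_lr`."""
    if n_warmup <= 0 or step >= n_warmup:
        return peak_lr  # warmup finished; later decay is not modelled here
    return peak_lr * step / n_warmup


if __name__ == "__main__":
    n = warmup_steps(steps_per_epoch=200)  # hypothetical 200 steps per epoch
    print(n, warmup_lr(0, n), warmup_lr(n // 2, n), warmup_lr(n, n))
```

The sketch covers only the warmup phase; what happens after the peak depends on the configured scheduler.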
129
 
130
+ ## Citation
131
+
132
+ ```bibtex
133
+ @article{zhang2026mirrorguard,
134
+ title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
135
+ author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
136
+ journal={arXiv preprint arXiv:2601.12822},
137
+ year={2026},
138
+ url={https://arxiv.org/abs/2601.12822}
139
+ }
140
+ ```
141
 
142
+ ## Details
143
 
144
+ For more information, visit the [GitHub repository](https://github.com/Simoniracle/OpenDeception-Framework) or read the [paper]().
145
 
146
  ### Framework versions
147