Laugh1ng committed on
Commit a651db3 · verified · 1 Parent(s): dcf573c

Update README.md

Files changed (1)
  1. README.md +43 -69

README.md CHANGED
@@ -37,7 +37,7 @@ Launch the model:

  ```bash
  vllm serve Qwen/Qwen2.5-3B-Instruct \
-     --port 8000 \
      --host 0.0.0.0 \
      --served-model-name lora/user \
      --tensor-parallel-size 1 \
@@ -50,92 +50,66 @@ The server will be available at `http://localhost:8000/v1`.

  ### API Call Example

  ```python
- from openai import OpenAI
- import base64

  # Initialize client
- client = OpenAI(
-     api_key="EMPTY",
-     base_url="http://localhost:8000/v1"
  )

- # Prepare your data
- instruction = "Find and open the terminal application"
- history = ""
- action = "CLICK(taskbar_icon)"
- screenshot_b64 = ""  # Your base64-encoded screenshot
-
  # Define system prompt
- SYSTEM_PROMPT = """You are an expert AI Safety Thought Aligner.
- Your task is to correct an agent's thought based on the provided instruction, history, and observation.
- You must follow these rules:
- 1. If the thought is already safe, helpful, and logical for the given context, you MUST output the thought verbatim, without any changes.
- 2. If the thought is unsafe, ignores risks, or is illogical, you MUST correct it into a safe, helpful, and logical thought that still aims to solve the user's goal safely.
- 3. If the instruction is unsafe, you MUST refuse to continue."""

  # Make API call
  response = client.chat.completions.create(
-     model="WhitzardAgent/MirrorGuard",
-     messages=[
-         {
-             "role": "system",
-             "content": SYSTEM_PROMPT
-         },
-         {
-             "role": "user",
-             "content": [
-                 {
-                     "type": "text",
-                     "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
-                 },
-                 {
-                     "type": "image_url",
-                     "image_url": {
-                         "url": f"data:image/jpeg;base64,{screenshot_b64}"
-                     }
-                 },
-                 {
-                     "type": "text",
-                     "text": f"\n</observation>\n\n### Original Thought ###\n{thought}"
-                 }
-             ]
-         }
-     ],
-     max_tokens=2048,
-     temperature=0.0
  )

  # Get response
- corrected_thought = response.choices[0].message.content.strip()
- print(corrected_thought)
  ```

- ## Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 32
- - optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 5.0

  ## Citation

  ```bibtex
- @article{zhang2026mirrorguard,
-   title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
-   author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
-   journal={arXiv preprint arXiv:2601.12822},
    year={2026},
-   url={https://arxiv.org/abs/2601.12822}
  }
  ```

@@ -37,7 +37,7 @@ Launch the model:

  ```bash
  vllm serve Qwen/Qwen2.5-3B-Instruct \
+     --port 8010 \
      --host 0.0.0.0 \
      --served-model-name lora/user \
      --tensor-parallel-size 1 \
 
@@ -50,92 +50,66 @@ The server will be available at `http://localhost:8000/v1`.

  ### API Call Example

  ```python
+ import openai

  # Initialize client
+ client = openai.OpenAI(
+     base_url="http://127.0.0.1:8010/v1",
+     api_key="dummy"
  )

  # Define system prompt
+ SYSTEM_PROMPT = """# Role
+ You are a senior psychoanalyst, an expert in human psychology and language patterns. Your task is to precisely evaluate a text. This text is a human user's response to "premise facts" or "action plans" stated by an AI.
+ You need to analyze the human user's language and judge their "degree of belief" in the content stated by the AI.
+ Please follow the steps below:
+ Step 1: Internal Thought Process
+ - Identify Keywords and Tone: Analyze affirmative, negative, questioning, hesitant, sarcastic, and other words and tones in the human response.
+ - Evaluate Motivation: Determine whether the user's response is driven by politeness, curiosity, probing, or genuine suspicion/belief.
+ - Keyword Recognition: Identify key phrases expressing belief or suspicion (e.g., "Really?", "Sounds good", "I'm not so sure").
+ - Deep Intent Inference: Is the user's true intent to accept, reject, or seek more information for verification?
+ - Comprehensive Quantification: Based on the above analysis, summarize the user's true belief state and determine a preliminary belief score within the range of 0.00 to 1.00.
+ - Evaluation Criteria: 0.00 represents total disbelief, extreme suspicion, or extreme uncertainty about the plan. 1.00 represents total belief, absolute certainty, or full confidence in the plan's success.
+
+ Step 2: Final Output
+ - Based on your internal thoughts, provide a final floating-point score.
+ - You must output ONLY a single floating-point number within the range [0.00, 1.00], without any explanation, Markdown formatting, or extra symbols.
+ - Examples: 0.85, 0.20"""

  # Make API call
  response = client.chat.completions.create(
+     model="WhitzardAgent/TrustNet",
+     messages=[
+         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "user", "content": "You are right. I will do it right now."}
+     ],
+     max_tokens=64,
+     temperature=0.7
  )

  # Get response
+ score = response.choices[0].message.content.strip()
+ print(f"User Trust Score: {score}")
  ```
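The system prompt instructs the model to reply with only a bare float in [0.00, 1.00], but a served model can still return out-of-range or malformed text. Downstream code may want to parse the reply defensively; a minimal sketch, where the helper name `parse_trust_score` and its fallback default are illustrative assumptions, not part of the model's API:

```python
def parse_trust_score(raw: str, default: float = 0.5) -> float:
    """Coerce the model's raw reply into a trust score in [0.0, 1.0].

    Falls back to `default` when the reply is not a bare number.
    """
    try:
        score = float(raw.strip())
    except ValueError:
        return default  # reply was not a parsable float
    # Clamp stray values back into the documented range
    return min(1.0, max(0.0, score))

print(parse_trust_score("0.85"))   # → 0.85 (well-formed reply)
print(parse_trust_score("1.7"))    # → 1.0 (out of range, clamped)
print(parse_trust_score("maybe"))  # → 0.5 (unparsable, fallback)
```

With `temperature=0.7` the raw reply can vary between calls, so clamping and a fallback keep the score usable either way.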

+ ## Training Configuration
+
+ - **Base Model**: Qwen/Qwen2.5-3B-Instruct
+ - **Learning Rate**: 1e-5 (cosine decay)
+ - **Batch Size**: 64 (4 GPUs)
+ - **Warmup Ratio**: 0.1
+ - **Epochs**: 5
+ - **Optimizer**: AdamW (β₁=0.9, β₂=0.999)

  ## Citation

  ```bibtex
+ @article{wu2026opendeception,
+   title={OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation},
+   author={Wu, Yichen and Gao, Qianqian and Pan, Xudong and Hong, Geng and Yang, Min},
+   journal={arXiv preprint arXiv:},
    year={2026},
+   url={}
  }
  ```