bmzq committed on
Commit 090f5fe · verified · 1 Parent(s): c0f7c43

Update README.md

Files changed (1)
  1. README.md +131 -42
README.md CHANGED
@@ -16,46 +16,135 @@ should probably proofread and complete it, then remove this comment. -->
 
  # MirrorGuard
 
- This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on the MirrorGuard dataset.
 
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 128
- - total_eval_batch_size: 32
- - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 6.0
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.57.1
- - Pytorch 2.9.0+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.1
+ A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.
+
+ ## Overview
+
+ MirrorGuard is trained through simulation-based learning to improve upon the base Qwen2.5-VL-7B-Instruct model. It learns to:
+
+ - Recognize security risks and unsafe UI patterns
+ - Intercept and correct unsafe reasoning chains before executing harmful operations
+ - Complete legitimate tasks while maintaining high accuracy
+
+ ## Links
+
+ - [Paper](https://arxiv.org/abs/2601.12822) - arXiv:2601.12822
+ - [GitHub Repository](https://github.com/bmz-q-q/MirrorGuard) - Source code and framework
+ - [Project Homepage](https://bmz-q-q.github.io/MirrorGuard/) - Documentation
+
+ ## Installation
+
+ For vLLM inference server:
+ ```bash
+ pip install vllm
+ ```
+
+ ## Usage
+
+ ### Starting vLLM Inference Server
+
+ Launch the model as an OpenAI-compatible API server:
+
+ ```bash
+ vllm serve WhitzardAgent/MirrorGuard \
+   --tensor-parallel-size 1 \
+   --max-model-len 2048 \
+   --dtype float16 \
+   --port 8000
+ ```
+
+ The server will be available at `http://localhost:8000/v1`.
+
+ ### API Call Example
+
+ ```python
+ from openai import OpenAI
+ import base64
+
+ # Initialize client
+ client = OpenAI(
+     api_key="not-needed", # vLLM doesn't require a real key
+     base_url="http://localhost:8000/v1"
+ )
+
+ # Prepare your data
+ instruction = "Find and open the terminal application"
+ history = ""
+ action = "CLICK(taskbar_icon)"
+ screenshot_b64 = "" # Your base64-encoded screenshot
+
+ # Define system prompt
+ SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
+ Analyze the provided instruction, history, and screenshot.
+ Evaluate the safety and feasibility of executing the given action.
+ If unsafe patterns are detected, recommend a corrected approach.
+
+ Rules:
+ 1. If the action is safe and logical, confirm it verbatim.
+ 2. If unsafe or problematic, provide a corrected approach that maintains utility.
+ 3. If the instruction itself is unsafe, refuse to continue."""
+
+ # Make API call
+ response = client.chat.completions.create(
+     model="WhitzardAgent/MirrorGuard",
+     messages=[
+         {
+             "role": "system",
+             "content": SYSTEM_PROMPT
+         },
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "text",
+                     "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
+                 },
+                 {
+                     "type": "image_url",
+                     "image_url": {
+                         "url": f"data:image/jpeg;base64,{screenshot_b64}"
+                     }
+                 },
+                 {
+                     "type": "text",
+                     "text": f"\n</observation>\n\n### Proposed Action ###\n{action}"
+                 }
+             ]
+         }
+     ],
+     max_tokens=256,
+     temperature=0.0
+ )
+
+ # Get response
+ evaluation = response.choices[0].message.content.strip()
+ print(evaluation)
+ ```
123
+
124
+ ## Training Configuration
125
+
126
+ - **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
127
+ - **Learning Rate**: 1e-5 (cosine decay)
128
+ - **Batch Size**: 128 (4 GPUs)
129
+ - **Warmup Steps**: 100
130
+ - **Epochs**: 6
131
+ - **Optimizer**: AdamW (β₁=0.9, β₂=0.999)
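The batch size of 128 follows from the per-device settings listed in the auto-generated card that this commit removes: 4 samples per device × 4 GPUs × 8 gradient-accumulation steps. The sketch below checks that arithmetic and illustrates a cosine schedule with warmup. The linear warmup shape, the decay to zero, and the function name `lr_at` are assumptions for illustration; the README does not state the total number of optimizer steps, so `total_steps` is left as a parameter.

```python
import math

PER_DEVICE_BATCH = 4   # train_batch_size in the removed card
NUM_DEVICES = 4        # num_devices
GRAD_ACCUM_STEPS = 8   # gradient_accumulation_steps

# Samples contributing to each optimizer step.
effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def lr_at(step: int, total_steps: int,
          base_lr: float = 1e-5, warmup_steps: int = 100) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate rises from zero to 1e-5 over the first 100 steps and then falls smoothly toward zero by `total_steps`.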
132
+
133
+ ## Citation
134
+
135
+ ```bibtex
136
+ @article{zhang2026mirrorguard,
137
+ title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
138
+ author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
139
+ journal={arXiv preprint arXiv:2601.12822},
140
+ year={2026},
141
+ url={https://arxiv.org/abs/2601.12822}
142
+ }
143
+ ```
144
+
145
+ ## License
146
+
147
+ See [LICENSE](https://github.com/bmz-q-q/MirrorGuard/blob/main/LICENSE) for details.
148
+
149
+ For more information, visit the [GitHub repository](https://github.com/bmz-q-q/MirrorGuard) or read the [paper](https://arxiv.org/abs/2601.12822).
150