---
library_name: transformers
license: other
base_model: Qwen/Qwen2.5-VL-7B-Instruct
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: mirrorguard
  results: []
---


# MirrorGuard

A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.

## Overview

MirrorGuard is trained through simulation-based learning to improve upon the base Qwen2.5-VL-7B-Instruct model. It learns to:

- Recognize security risks and unsafe UI patterns
- Intercept and correct unsafe reasoning chains before executing harmful operations
- Complete legitimate tasks while maintaining high accuracy

## Links

- [Paper](https://arxiv.org/abs/2601.12822) - arXiv:2601.12822
- [GitHub Repository](https://github.com/bmz-q-q/MirrorGuard) - Source code and framework
- [Project Homepage](https://bmz-q-q.github.io/MirrorGuard/) - Documentation

## Installation

Install vLLM to run the inference server:
```bash
pip install vllm
```

## Usage

### Starting vLLM Inference Server

Launch the model as an OpenAI-compatible API server:

```bash
vllm serve WhitzardAgent/MirrorGuard \
    --tensor-parallel-size 1 \
    --max-model-len 2048 \
    --dtype float16 \
    --port 8000
```

The server will be available at `http://localhost:8000/v1`.
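
Before sending requests, it can help to confirm that the server is up and serving the expected model. The sketch below queries the OpenAI-compatible `/v1/models` endpoint using only the Python standard library. The helper names `extract_model_ids` and `list_served_models` are illustrative and are not part of MirrorGuard or vLLM.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(
    base_url: str = "http://localhost:8000/v1", timeout: float = 5.0
) -> list[str]:
    """Query the running server and return the ids of the models it serves."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

With the server running, `list_served_models()` should include `WhitzardAgent/MirrorGuard`, assuming the served model name was not overridden at launch.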

### API Call Example

```python
from openai import OpenAI
import base64

# Initialize client
client = OpenAI(
    api_key="not-needed",  # vLLM doesn't require a real key
    base_url="http://localhost:8000/v1"
)

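# Prepare your data
instruction = "Find and open the terminal application"
history = ""
action = "CLICK(taskbar_icon)"
# Base64-encode the current screenshot ("screenshot.jpg" is a placeholder path)
with open("screenshot.jpg", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")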

# Define system prompt
SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
Analyze the provided instruction, history, and screenshot.
Evaluate the safety and feasibility of executing the given action.
If unsafe patterns are detected, recommend a corrected approach.

Rules:
1. If the action is safe and logical, confirm it verbatim.
2. If unsafe or problematic, provide a corrected approach that maintains utility.
3. If the instruction itself is unsafe, refuse to continue."""

# Make API call
response = client.chat.completions.create(
    model="WhitzardAgent/MirrorGuard",
    messages=[
        {
            "role": "system",
            "content": SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{screenshot_b64}"
                    }
                },
                {
                    "type": "text",
                    "text": f"\n</observation>\n\n### Proposed Action ###\n{action}"
                }
            ]
        }
    ],
    max_tokens=256,
    temperature=0.0
)

# Get response
evaluation = response.choices[0].message.content.strip()
print(evaluation)
```
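
Rule 1 of the system prompt asks the model to confirm a safe action verbatim, so a caller can treat an exact match as approval and anything else as a correction or refusal. The following is a minimal sketch under that assumption. `is_confirmed` is a hypothetical helper, and the exact output format of the model is not specified here, so adjust the comparison to what you observe in practice.

```python
def _normalize(text: str) -> str:
    """Collapse whitespace so formatting differences do not affect the comparison."""
    return " ".join(text.split())


def is_confirmed(action: str, evaluation: str) -> bool:
    """Return True when the reply repeats the proposed action verbatim."""
    return _normalize(action) == _normalize(evaluation)
```

Calling `is_confirmed(action, evaluation)` with the variables from the example above indicates whether the agent can execute `action` unchanged. When it returns `False`, the reply should be handled as a corrected approach or a refusal rather than executed as-is.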

## Training Configuration

- **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
- **Learning Rate**: 1e-5 (cosine decay)
- **Batch Size**: 128 (4 GPUs)
- **Warmup Steps**: 100
- **Epochs**: 6
- **Optimizer**: AdamW (β₁=0.9, β₂=0.999)
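
For intuition about the learning-rate settings above, the following sketch computes a linear-warmup, cosine-decay schedule with a peak of 1e-5 and 100 warmup steps. The total step count (`total_steps=1000`) is a made-up value for illustration, and the linear warmup followed by decay to zero is an assumption based on the commonly used cosine schedule, not something stated in the training configuration.

```python
import math


def lr_at_step(
    step: int,
    peak_lr: float = 1e-5,
    warmup_steps: int = 100,
    total_steps: int = 1000,  # hypothetical; the real step count is not given
) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

Under these assumptions the rate rises linearly to 1e-5 at step 100, falls to half of that midway through the decay phase, and reaches zero at the final step.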

## Citation

```bibtex
@article{zhang2026mirrorguard,
  title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
  author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
  journal={arXiv preprint arXiv:2601.12822},
  year={2026},
  url={https://arxiv.org/abs/2601.12822}
}
```

## License

See [LICENSE](https://github.com/bmz-q-q/MirrorGuard/blob/main/LICENSE) for details.

For more information, visit the [GitHub repository](https://github.com/bmz-q-q/MirrorGuard) or read the [paper](https://arxiv.org/abs/2601.12822).