---
library_name: transformers
license: other
base_model: Qwen/Qwen2.5-VL-7B-Instruct
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: mirrorguard
results: []
---
# MirrorGuard
A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.
## Overview
MirrorGuard is trained through simulation-based learning to improve upon the base Qwen2.5-VL-7B-Instruct model. It learns to:
- Recognize security risks and unsafe UI patterns
- Intercept and correct unsafe reasoning chains before executing harmful operations
- Complete legitimate tasks while maintaining high accuracy
## Links
- [Paper](https://arxiv.org/abs/2601.12822) - arXiv:2601.12822
- [GitHub Repository](https://github.com/bmz-q-q/MirrorGuard) - Source code and framework
- [Project Homepage](https://bmz-q-q.github.io/MirrorGuard/) - Documentation
## Installation
To run the model behind a vLLM inference server, install vLLM:
```bash
pip install vllm
```
## Usage
### Starting vLLM Inference Server
Launch the model as an OpenAI-compatible API server:
```bash
vllm serve WhitzardAgent/MirrorGuard \
  --tensor-parallel-size 1 \
  --max-model-len 2048 \
  --dtype float16 \
  --port 8000
```
The server will be available at `http://localhost:8000/v1`.
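Before sending requests, you can check that the server is reachable. A minimal sketch using only the standard library; it queries the standard OpenAI-compatible `/models` route (the helper name is illustrative):

```python
import urllib.request
import urllib.error

def server_is_up(base_url="http://localhost:8000/v1", timeout=5):
    """Return True if the vLLM server answers on the /models endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns `False`, verify that the `vllm serve` process has finished loading the model weights before retrying.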
### API Call Example
```python
from openai import OpenAI
import base64  # used when base64-encoding a screenshot

# Initialize client
client = OpenAI(
    api_key="not-needed",  # vLLM doesn't require a real key
    base_url="http://localhost:8000/v1",
)

# Prepare your data
instruction = "Find and open the terminal application"
history = ""
action = "CLICK(taskbar_icon)"
screenshot_b64 = ""  # Your base64-encoded screenshot

# Define system prompt
SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
Analyze the provided instruction, history, and screenshot.
Evaluate the safety and feasibility of executing the given action.
If unsafe patterns are detected, recommend a corrected approach.
Rules:
1. If the action is safe and logical, confirm it verbatim.
2. If unsafe or problematic, provide a corrected approach that maintains utility.
3. If the instruction itself is unsafe, refuse to continue."""

# Make API call
response = client.chat.completions.create(
    model="WhitzardAgent/MirrorGuard",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{screenshot_b64}"},
                },
                {
                    "type": "text",
                    "text": f"\n</observation>\n\n### Proposed Action ###\n{action}",
                },
            ],
        },
    ],
    max_tokens=256,
    temperature=0.0,
)

# Get response
evaluation = response.choices[0].message.content.strip()
print(evaluation)
```
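The `screenshot_b64` placeholder above can be filled by base64-encoding a screenshot file. A minimal sketch (the helper name and file path are illustrative):

```python
import base64

def encode_screenshot(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Example: screenshot_b64 = encode_screenshot("screenshot.jpg")
```

The resulting string drops directly into the `data:image/jpeg;base64,...` URL in the request payload.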
## Training Configuration
- **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
- **Learning Rate**: 1e-5 (cosine decay)
- **Batch Size**: 128 (global, across 4 GPUs)
- **Warmup Steps**: 100
- **Epochs**: 6
- **Optimizer**: AdamW (β₁=0.9, β₂=0.999)
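The warmup-plus-cosine schedule above can be sketched numerically. This is an illustrative sketch, not training code; the total step count is an assumption, since the model card does not state it:

```python
import math

def lr_at_step(step, peak_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The rate climbs linearly over the first 100 steps, peaks at 1e-5,
# then follows a half-cosine down to zero at total_steps.
```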
## Citation
```bibtex
@article{zhang2026mirrorguard,
title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
journal={arXiv preprint arXiv:2601.12822},
year={2026},
url={https://arxiv.org/abs/2601.12822}
}
```
## License
See [LICENSE](https://github.com/bmz-q-q/MirrorGuard/blob/main/LICENSE) for details.
For more information, visit the [GitHub repository](https://github.com/bmz-q-q/MirrorGuard) or read the [paper](https://arxiv.org/abs/2601.12822).