---
library_name: transformers
license: other
base_model: Qwen/Qwen2.5-VL-7B-Instruct
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: mirrorguard
results: []
---
# MirrorGuard
A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.
## Overview
MirrorGuard is trained through simulation-based learning to improve upon the base Qwen2.5-VL-7B-Instruct model. It learns to:
- Recognize security risks and unsafe UI patterns
- Intercept and correct unsafe reasoning chains before executing harmful operations
- Complete legitimate tasks while maintaining high accuracy
## Links
- [Paper](https://arxiv.org/abs/2601.12822) - arXiv:2601.12822
- [GitHub Repository](https://github.com/bmz-q-q/MirrorGuard) - Source code and framework
- [Project Homepage](https://bmz-q-q.github.io/MirrorGuard/) - Documentation
## Installation
To run the model behind a vLLM inference server, install vLLM:
```bash
pip install vllm
```
## Usage
### Starting vLLM Inference Server
Launch the model as an OpenAI-compatible API server:
```bash
vllm serve WhitzardAgent/MirrorGuard \
  --tensor-parallel-size 1 \
  --max-model-len 2048 \
  --dtype float16 \
  --port 8000
```
The server will be available at `http://localhost:8000/v1`.
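Before sending requests, you can check that the server is reachable. A minimal sketch using only the standard library; it queries the standard OpenAI-compatible `/models` route (the helper name is illustrative):

```python
import urllib.request
import urllib.error

def server_is_up(base_url="http://localhost:8000/v1", timeout=5):
    """Return True if the vLLM server answers on the /models endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns `False`, verify that the `vllm serve` process has finished loading the model weights before retrying.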
### API Call Example
```python
from openai import OpenAI
import base64  # used when base64-encoding a screenshot

# Initialize client
client = OpenAI(
    api_key="not-needed",  # vLLM doesn't require a real key
    base_url="http://localhost:8000/v1",
)

# Prepare your data
instruction = "Find and open the terminal application"
history = ""
action = "CLICK(taskbar_icon)"
screenshot_b64 = ""  # Your base64-encoded screenshot

# Define system prompt
SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
Analyze the provided instruction, history, and screenshot.
Evaluate the safety and feasibility of executing the given action.
If unsafe patterns are detected, recommend a corrected approach.
Rules:
1. If the action is safe and logical, confirm it verbatim.
2. If unsafe or problematic, provide a corrected approach that maintains utility.
3. If the instruction itself is unsafe, refuse to continue."""

# Make API call
response = client.chat.completions.create(
    model="WhitzardAgent/MirrorGuard",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{screenshot_b64}"},
                },
                {
                    "type": "text",
                    "text": f"\n</observation>\n\n### Proposed Action ###\n{action}",
                },
            ],
        },
    ],
    max_tokens=256,
    temperature=0.0,
)

# Get response
evaluation = response.choices[0].message.content.strip()
print(evaluation)
```
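The `screenshot_b64` placeholder above can be filled by base64-encoding a screenshot file. A minimal sketch (the helper name and file path are illustrative):

```python
import base64

def encode_screenshot(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Example: screenshot_b64 = encode_screenshot("screenshot.jpg")
```

The resulting string drops directly into the `data:image/jpeg;base64,...` URL in the request payload.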
## Training Configuration
- **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
- **Learning Rate**: 1e-5 (cosine decay)
- **Batch Size**: 128 (global, across 4 GPUs)
- **Warmup Steps**: 100
- **Epochs**: 6
- **Optimizer**: AdamW (β₁=0.9, β₂=0.999)
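The warmup-plus-cosine schedule above can be sketched numerically. This is an illustrative sketch, not training code; the total step count is an assumption, since the model card does not state it:

```python
import math

def lr_at_step(step, peak_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The rate climbs linearly over the first 100 steps, peaks at 1e-5,
# then follows a half-cosine down to zero at total_steps.
```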
## Citation
```bibtex
@article{zhang2026mirrorguard,
title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
journal={arXiv preprint arXiv:2601.12822},
year={2026},
url={https://arxiv.org/abs/2601.12822}
}
```
## License
See [LICENSE](https://github.com/bmz-q-q/MirrorGuard/blob/main/LICENSE) for details.
For more information, visit the [GitHub repository](https://github.com/bmz-q-q/MirrorGuard) or read the [paper](https://arxiv.org/abs/2601.12822).