---
library_name: transformers
license: other
base_model: Qwen/Qwen2.5-VL-7B-Instruct
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: mirrorguard
  results: []
---

# MirrorGuard

A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.

## Overview

MirrorGuard is trained through simulation-based learning to improve upon the base Qwen2.5-VL-7B-Instruct model. It learns to:

- Recognize security risks and unsafe UI patterns
- Intercept and correct unsafe reasoning chains before executing harmful operations
- Complete legitimate tasks while maintaining high accuracy

## Links

- [Paper](https://arxiv.org/abs/2601.12822) - arXiv:2601.12822
- [GitHub Repository](https://github.com/bmz-q-q/MirrorGuard) - Source code and framework
- [Project Homepage](https://bmz-q-q.github.io/MirrorGuard/) - Documentation

## Installation

For vLLM inference server:

```bash
pip install vllm
```

## Usage

### Starting vLLM Inference Server

Launch the model as an OpenAI-compatible API server:

```bash
vllm serve WhitzardAgent/MirrorGuard \
    --tensor-parallel-size 1 \
    --max-model-len 2048 \
    --dtype float16 \
    --port 8000
```

The server will be available at `http://localhost:8000/v1`.

### API Call Example

```python
from openai import OpenAI
import base64

# Initialize client
client = OpenAI(
    api_key="not-needed",  # vLLM doesn't require a real key
    base_url="http://localhost:8000/v1"
)

# Prepare your data
instruction = "Find and open the terminal application"
history = ""
action = "CLICK(taskbar_icon)"
screenshot_b64 = ""  # Your base64-encoded screenshot

# Define system prompt
SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
Analyze the provided instruction, history, and screenshot.
Evaluate the safety and feasibility of executing the given action.
If unsafe patterns are detected, recommend a corrected approach.

Rules:
1. If the action is safe and logical, confirm it verbatim.
2. If unsafe or problematic, provide a corrected approach that maintains utility.
3. If the instruction itself is unsafe, refuse to continue."""

# Make API call
response = client.chat.completions.create(
    model="WhitzardAgent/MirrorGuard",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n\n"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{screenshot_b64}"}
                },
                {
                    "type": "text",
                    "text": f"\n\n\n### Proposed Action ###\n{action}"
                }
            ]
        }
    ],
    max_tokens=256,
    temperature=0.0
)

# Get response
evaluation = response.choices[0].message.content.strip()
print(evaluation)
```

## Training Configuration

- **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
- **Learning Rate**: 1e-5 (cosine decay)
- **Batch Size**: 128 (4 GPUs)
- **Warmup Steps**: 100
- **Epochs**: 6
- **Optimizer**: AdamW (β₁=0.9, β₂=0.999)

## Citation

```bibtex
@article{zhang2026mirrorguard,
  title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
  author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
  journal={arXiv preprint arXiv:2601.12822},
  year={2026},
  url={https://arxiv.org/abs/2601.12822}
}
```

## License

See [LICENSE](https://github.com/bmz-q-q/MirrorGuard/blob/main/LICENSE) for details.

For more information, visit the [GitHub repository](https://github.com/bmz-q-q/MirrorGuard) or read the [paper](https://arxiv.org/abs/2601.12822).