@@ -16,46 +16,135 @@ should probably proofread and complete it, then remove this comment. -->
 # MirrorGuard
-This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on the MirrorGuard dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 4
-- eval_batch_size: 8
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 4
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 128
-- total_eval_batch_size: 32
-- optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 100
-- num_epochs: 6.0
-### Training results
-### Framework versions
-- Transformers 4.57.1
-- Pytorch 2.9.0+cu128
-- Datasets 4.0.0
-- Tokenizers 0.22.1

 # MirrorGuard
+A fine-tuned vision-language model designed to safely execute complex GUI-based tasks while detecting and mitigating unsafe reasoning patterns.
+## Overview
+MirrorGuard is fine-tuned from Qwen2.5-VL-7B-Instruct through simulation-based learning. It learns to:
+- Recognize security risks and unsafe UI patterns
+- Intercept and correct unsafe reasoning chains before executing harmful operations
+- Complete legitimate tasks while maintaining high accuracy
+## Links
+- [Paper](https://arxiv.org/abs/2601.12822) - arXiv:2601.12822
+- [GitHub Repository](https://github.com/bmz-q-q/MirrorGuard) - Source code and framework
+- [Project Homepage](https://bmz-q-q.github.io/MirrorGuard/) - Documentation
+## Installation
+To serve the model with vLLM, install the package first:
+```bash
+pip install vllm
+```
+## Usage
+### Starting vLLM Inference Server
+Launch the model as an OpenAI-compatible API server:
+```bash
+vllm serve WhitzardAgent/MirrorGuard \
+    --tensor-parallel-size 1 \
+    --max-model-len 2048 \
+    --dtype float16 \
+    --port 8000
+```
+The server will be available at `http://localhost:8000/v1`.
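Before sending requests, it can help to confirm the server is reachable. The following is a minimal sketch, assuming the server exposes the OpenAI-compatible `/models` listing route under the base URL above; the helper name `list_served_models` is hypothetical and not part of vLLM or this repository.

```python
import json
import urllib.request


def list_served_models(base_url="http://localhost:8000/v1", timeout=5.0):
    """Return the model IDs the server reports, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return None
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    # Prints None while the server is still starting up or is not running.
    print(list_served_models())
```

If the returned list contains `WhitzardAgent/MirrorGuard`, the `model` field used in the API call should resolve.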
+### API Call Example
+```python
+from openai import OpenAI
+import base64
+# Initialize client
+client = OpenAI(
+    api_key="not-needed",  # vLLM doesn't require a real key
+    base_url="http://localhost:8000/v1"
+)
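+# Prepare your data
+instruction = "Find and open the terminal application"
+history = ""
+action = "CLICK(taskbar_icon)"
+# Base64-encode the screenshot ("screenshot.jpg" is a placeholder path)
+with open("screenshot.jpg", "rb") as f:
+    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")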
+# Define system prompt
+SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
+Analyze the provided instruction, history, and screenshot.
+Evaluate the safety and feasibility of executing the given action.
+If unsafe patterns are detected, recommend a corrected approach.
+Rules:
+1. If the action is safe and logical, confirm it verbatim.
+2. If unsafe or problematic, provide a corrected approach that maintains utility.
+3. If the instruction itself is unsafe, refuse to continue."""
+# Make API call
+response = client.chat.completions.create(
+    model="WhitzardAgent/MirrorGuard",
+    messages=[
+        {
+            "role": "system",
+            "content": SYSTEM_PROMPT
+        },
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": f"data:image/jpeg;base64,{screenshot_b64}"
+                    }
+                },
+                {
+                    "type": "text",
+                    "text": f"\n</observation>\n\n### Proposed Action ###\n{action}"
+                }
+            ]
+        }
+    ],
+    max_tokens=256,
+    temperature=0.0
+)
+# Get response
+evaluation = response.choices[0].message.content.strip()
+print(evaluation)
+```
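The request layout in the example above can be factored into a reusable helper. This is a sketch only: `build_messages` and `is_confirmed` are hypothetical names, not part of the MirrorGuard code base. Because the prompt states only that a safe action is confirmed verbatim, the check below detects that one case and does not try to tell a correction from a refusal.

```python
import base64


def build_messages(system_prompt, instruction, history, action, image_bytes,
                   mime="image/jpeg"):
    """Assemble the chat payload in the same layout as the API call example."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    context = (
        f"### Context ###\nInstruction: {instruction}\n"
        f"History:\n{history}\n<observation>\n"
    )
    proposal = f"\n</observation>\n\n### Proposed Action ###\n{action}"
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": context},
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{b64}"}},
                {"type": "text", "text": proposal},
            ],
        },
    ]


def is_confirmed(evaluation, action):
    """True when the model echoed the proposed action verbatim (rule 1)."""
    return evaluation.strip() == action.strip()
```

A caller would pass the result as `messages=` to `client.chat.completions.create(...)` and execute the action only when `is_confirmed(evaluation, action)` holds. Any other reply should be treated as a correction or refusal and reviewed before the agent proceeds.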
+## Training Configuration
+- **Base Model**: Qwen/Qwen2.5-VL-7B-Instruct
+- **Learning Rate**: 1e-5 (cosine decay)
+- **Batch Size**: 128 effective (4 per device × 4 GPUs × 8 gradient accumulation steps)
+- **Warmup Steps**: 100
+- **Epochs**: 6
+- **Optimizer**: AdamW, fused (β₁=0.9, β₂=0.999, ε=1e-8)
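The figures above can be cross-checked against the auto-generated hyperparameter list earlier in this diff (per-device batch size 4, 4 devices, 8 gradient accumulation steps, 100 warmup steps). The sketch below reproduces the effective batch size and illustrates the learning-rate curve, assuming linear warmup followed by cosine decay to zero. The value of `total_steps` is hypothetical, since the dataset size and total step count are not stated.

```python
import math

# Values taken from the hyperparameter list in this model card.
per_device_batch = 4
num_devices = 4
grad_accum_steps = 8
base_lr = 1e-5
warmup_steps = 100

effective_batch = per_device_batch * num_devices * grad_accum_steps
print(effective_batch)  # 128


def lr_at(step, total_steps):
    """Learning rate at a step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real value depends on the dataset size
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```

The rate peaks at 1e-5 when warmup ends, falls to about half that at the midpoint of the decay phase, and reaches zero at the final step.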
+## Citation
+```bibtex
+@article{zhang2026mirrorguard,
+  title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
+  author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
+  journal={arXiv preprint arXiv:2601.12822},
+  year={2026},
+  url={https://arxiv.org/abs/2601.12822}
+}
+```
+## License
+See [LICENSE](https://github.com/bmz-q-q/MirrorGuard/blob/main/LICENSE) for details.
+For more information, visit the [GitHub repository](https://github.com/bmz-q-q/MirrorGuard) or read the [paper](https://arxiv.org/abs/2601.12822).