WhitzardAgent/TrustNet

@@ -16,24 +16,101 @@ model-index:
 should probably proofread and complete it, then remove this comment. -->
 # TrustNet
-This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on our contrastive dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
@@ -50,9 +127,21 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 5.0
-### Training results
 ### Framework versions

 should probably proofread and complete it, then remove this comment. -->
 # TrustNet
+TrustNet is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct that estimates how much a user trusts the AI in multi-turn interactions.
+## Overview
+TrustNet is trained with contrastive learning on top of the base Qwen2.5-3B-Instruct model. It learns to:
+- Evaluate the user's responses in multi-turn interactions.
+- Assign each response a User Trust Score, which quantifies the degree of trust in AI that the response reflects.
+  - The score is a continuous value in [0, 1]: values near 1 indicate strong trust in AI, values near 0 indicate pronounced skepticism, and intermediate values (e.g., 0.5) represent a neutral or ambiguous stance.
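As a rough illustration of the score range described above, the sketch below maps a score to a coarse label. The cut-offs `low` and `high` and the function name `trust_band` are hypothetical choices for this example; the model card only fixes the meaning of the endpoints and the midpoint.

```python
def trust_band(score: float, low: float = 0.35, high: float = 0.65) -> str:
    """Map a User Trust Score in [0, 1] to a coarse label.

    `low` and `high` are illustrative thresholds (assumptions), not
    values published with TrustNet.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < low:
        return "skeptical"   # near 0: pronounced skepticism
    if score > high:
        return "trusting"    # near 1: strong trust in AI
    return "neutral"         # around 0.5: neutral or ambiguous stance


if __name__ == "__main__":
    for s in (0.05, 0.5, 0.93):
        print(s, trust_band(s))
```

In practice the thresholds would need to be tuned on labelled conversations rather than fixed by hand.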
+## Links
+- [Paper]() - arXiv:
+- [GitHub Repository](https://github.com/Simoniracle/OpenDeception-Framework) - Source code and framework
+## Usage
+### Starting vLLM Inference Server
+Launch the model:
+```bash
+vllm serve Qwen/Qwen2.5-3B-Instruct \
+    --port 8000 \
+    --host 0.0.0.0 \
+    --served-model-name lora/user \
+    --tensor-parallel-size 1 \
+    --enable-lora \
+    --lora-modules ai_speech_all=WhitzardAgent/TrustNet
+```
+The server will be available at `http://localhost:8000/v1`.
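Before sending requests, it can help to confirm which model names the server actually exposes. The sketch below queries the OpenAI-compatible `/models` endpoint using only the standard library. The helper names are hypothetical, and it assumes the server from the command above is running on `localhost:8000`; with that command the list should include the base model name (`lora/user`) and the LoRA adapter name (`ai_speech_all`).

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list:
    """Pull the model ids out of an OpenAI-style /models response body."""
    return [m["id"] for m in payload.get("data", [])]


def list_served_models(base_url: str = "http://localhost:8000/v1") -> list:
    """Fetch the ids of all models the server is currently serving."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_served_models())
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"server not reachable: {exc}")
```

Whichever id is passed as `model` in a chat completion request selects either the base model or the adapter.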
+### API Call Example
+```python
+from openai import OpenAI
+import base64
+# Initialize client
+client = OpenAI(
+    api_key="EMPTY",
+    base_url="http://localhost:8000/v1"
+)
+# Prepare your data
+instruction = "Find and open the terminal application"
+history = ""
+action = "CLICK(taskbar_icon)"
+thought = ""  # The agent's original thought; referenced in the user message below
+screenshot_b64 = ""  # Your base64-encoded screenshot
+# Define system prompt
+SYSTEM_PROMPT = """You are an expert AI Safety Thought Aligner.
+Your task is to correct an agent's thought based on the provided instruction, history, and observation.
+You must follow these rules:
+1. If the thought is already safe, helpful, and logical for the given context, you MUST output the thought verbatim, without any changes.
+2. If the thought is unsafe, ignores risks, or is illogical, you MUST correct it into a safe, helpful, and logical thought that still aims to solve the user's goal safely.
+3. If the instruction is unsafe, you MUST refuse to continue."""
+# Make API call
+response = client.chat.completions.create(
+    model="ai_speech_all",  # LoRA adapter name registered via --lora-modules above
+    messages=[
+        {
+            "role": "system",
+            "content": SYSTEM_PROMPT
+        },
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": f"### Context ###\nInstruction: {instruction}\nHistory:\n{history}\n<observation>\n"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": f"data:image/jpeg;base64,{screenshot_b64}"
+                    }
+                },
+                {
+                    "type": "text",
+                    "text": f"\n</observation>\n\n### Original Thought ###\n{thought}"
+                }
+            ]
+        }
+    ],
+    max_tokens=2048,
+    temperature=0.0
+)
+# Get response
+corrected_thought = response.choices[0].message.content.strip()
+print(corrected_thought)
+```
+## Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 5.0
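To make the warmup ratio concrete, the sketch below converts it into a step count and a learning rate during warmup. It assumes a linear warmup (the scheduler type is not shown in this excerpt) and a hypothetical total of 1000 optimizer steps; only `learning_rate` and `lr_scheduler_warmup_ratio` come from the list above.

```python
import math

PEAK_LR = 1e-5       # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above


def warmup_steps(total_steps: int, warmup_ratio: float = WARMUP_RATIO) -> int:
    """Number of optimizer steps spent warming up."""
    return math.ceil(total_steps * warmup_ratio)


def warmup_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate under an assumed linear warmup.

    The rate is capped at peak_lr once warmup ends; the decay phase that
    follows depends on the scheduler type and is not modelled here.
    """
    n = warmup_steps(total_steps, warmup_ratio)
    if n == 0:
        return peak_lr
    return peak_lr * min(1.0, step / n)


if __name__ == "__main__":
    total = 1000  # hypothetical total number of optimizer steps
    print(warmup_steps(total))
    print(warmup_lr(50, total))
```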
+## Citation
+```bibtex
+@article{zhang2026mirrorguard,
+  title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
+  author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
+  journal={arXiv preprint arXiv:2601.12822},
+  year={2026},
+  url={https://arxiv.org/abs/2601.12822}
+}
+```
+## Details
+For more information, visit the [GitHub repository](https://github.com/Simoniracle/OpenDeception-Framework) or read the [paper]().
 ### Framework versions