WhitzardAgent
/

MirrorGuard

Image-Text-to-Text

Generated from Trainer

text-generation-inference

Model card Files Files and versions

bmzq commited on Feb 2

Commit

638a040

·

verified ·

1 Parent(s): 708949c

Update README.md

Files changed (1) hide show

README.md +11 -14

README.md CHANGED Viewed

@@ -63,7 +63,7 @@ import base64
 # Initialize client
 client = OpenAI(
-    api_key="not-needed",  # vLLM doesn't require a real key
     base_url="http://localhost:8000/v1"
 )
@@ -74,15 +74,12 @@ action = "CLICK(taskbar_icon)"
 screenshot_b64 = ""  # Your base64-encoded screenshot
 # Define system prompt
-SYSTEM_PROMPT = """You are an expert AI assistant for evaluating GUI-based task execution.
-Analyze the provided instruction, history, and screenshot.
-Evaluate the safety and feasibility of executing the given action.
-If unsafe patterns are detected, recommend a corrected approach.
-Rules:
-1. If the action is safe and logical, confirm it verbatim.
-2. If unsafe or problematic, provide a corrected approach that maintains utility.
-3. If the instruction itself is unsafe, refuse to continue."""
 # Make API call
 response = client.chat.completions.create(
@@ -107,18 +104,18 @@ response = client.chat.completions.create(
                 },
                 {
                     "type": "text",
-                    "text": f"\n</observation>\n\n### Proposed Action ###\n{action}"
                 }
             ]
         }
     ],
-    max_tokens=256,
     temperature=0.0
 )
 # Get response
-evaluation = response.choices[0].message.content.strip()
-print(evaluation)
 ```
 ## Training Configuration

 # Initialize client
 client = OpenAI(
+    api_key="EMPTY",
     base_url="http://localhost:8000/v1"
 )
 screenshot_b64 = ""  # Your base64-encoded screenshot
 # Define system prompt
+SYSTEM_PROMPT = """You are an expert AI Safety Thought Aligner.
+Your task is to correct an agent's thought based on the provided instruction, history, and observation.
+You must follow these rules:
+1. If the thought is already safe, helpful, and logical for the given context, you MUST output the thought verbatim, without any changes.
+2. If the thought is unsafe, ignores risks, or is illogical, you MUST correct it into a safe, helpful, and logical thought that still aims to solve the user's goal safely.
+3. If the instruction is unsafe, you MUST refuse to continue."""
 # Make API call
 response = client.chat.completions.create(
                 },
                 {
                     "type": "text",
+                    "text": f"\n</observation>\n\n### Original Thought ###\n{thought}"
                 }
             ]
         }
     ],
+    max_tokens=2048,
     temperature=0.0
 )
 # Get response
+corrected_thought = response.choices[0].message.content.strip()
+print(corrected_thought)
 ```
 ## Training Configuration