# OSWorld Setup and Evaluation Guide
This guide covers setting up and running OSWorld evaluations: Google account configuration, proxy setup, and deployment of the public evaluation platform.
## Table of Contents
1. [Google Account Setup](#1-google-account-setup)
2. [Proxy Configuration](#2-proxy-configuration)
3. [Public Evaluation Platform](#3-public-evaluation-platform)
---
## 1. Google Account Setup
For tasks involving Google services such as Google Drive, you need a real Google account with configured OAuth 2.0 credentials.
> **Attention**: To prevent environment reset and result evaluation conflicts caused by multiple people using the same Google account simultaneously, please register a private Google account rather than using a shared one.
### 1.1 Register A Blank Google Account
1. Go to the Google website and register a new, blank account
- You do not need to provide any recovery email or phone for testing purposes
- **IGNORE** any security recommendations
- Turn **OFF** [2-Step Verification](https://support.google.com/accounts/answer/1064203?hl=en&co=GENIE.Platform%3DDesktop#:~:text=Open%20your%20Google%20Account.,Select%20Turn%20off.) to avoid failures during environment setup
<p align="center">
<img src="assets/googleshutoff.png" width="40%" alt="Shut Off 2-Step Verification">
</p>
> **Attention**: We strongly recommend registering a new, blank account instead of using an existing one, so that environment setup and resets do not alter your personal workspace.
2. Copy `settings.json.template` to `settings.json` in `evaluation_examples/settings/google/`, then replace the two fields with your account's credentials:
```json
{
"email": "your_google_account@gmail.com",
"password": "your_google_account_password"
}
```
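Before running tasks, you may want to check the file for mistakes. The sketch below defines a hypothetical helper, `check_google_settings`, which is not part of OSWorld. It assumes the file lives at the path above and checks only two things: that both fields are present, and that neither still holds the template placeholder.

```python
import json
from pathlib import Path

# Placeholder values from settings.json.template; a filled-in file should not contain them.
PLACEHOLDERS = {"your_google_account@gmail.com", "your_google_account_password"}


def check_google_settings(path="evaluation_examples/settings/google/settings.json"):
    """Return a list of problems found in settings.json (an empty list means it looks OK).

    Hypothetical helper for illustration; OSWorld does not ship this function.
    """
    settings_path = Path(path)
    if not settings_path.is_file():
        return [f"{settings_path} not found (copy settings.json.template first)"]
    data = json.loads(settings_path.read_text(encoding="utf-8"))
    problems = []
    for field in ("email", "password"):
        value = data.get(field)
        if not value:
            problems.append(f"missing or empty field: {field}")
        elif value in PLACEHOLDERS:
            problems.append(f"field still holds the template placeholder: {field}")
    return problems


if __name__ == "__main__":
    for problem in check_google_settings():
        print("settings.json:", problem)
```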
### 1.2 Create A Google Cloud Project
1. Navigate to [Google Cloud Project Creation](https://console.cloud.google.com/projectcreate) and create a new GCP project (see [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project) for detailed steps)
2. Go to the [Google Drive API console](https://console.cloud.google.com/apis/library/drive.googleapis.com?) and enable the Google Drive API for the created project (see [Enable and disable APIs](https://support.google.com/googleapi/answer/6158841?hl=en))
<p align="center">
<img src="assets/creategcp.png" width="45%" style="margin-right: 5%;" alt="Create GCP">
<img src="assets/enableapi.png" width="45%" alt="Google Drive API">
</p>
### 1.3 Configure OAuth Consent Screen
Go to [OAuth consent screen](https://console.cloud.google.com/apis/credentials/consent):
1. Select **External** as the User Type and click **CREATE**
<p align="center">
<img src="assets/external.png" width="80%" alt="External User Type">
</p>
2. Fill in the required fields:
- **App name**: Any name you prefer
- **User support email**: Your Google account email
- **Developer contact information**: Your Google account email
- Click **SAVE AND CONTINUE**
<p align="center">
<img src="assets/appinfo.png" width="80%" alt="App Information">
</p>
3. Add scopes:
- Click **ADD OR REMOVE SCOPES**
- Filter and select: `https://www.googleapis.com/auth/drive`
- Click **UPDATE** and **SAVE AND CONTINUE**
<p align="center">
<img src="assets/addscope.png" width="80%" alt="Add Scopes">
</p>
4. Add test users:
- Click **ADD USERS**
- Add your Google account email
- Click **SAVE AND CONTINUE**
<p align="center">
<img src="assets/adduser.png" width="80%" alt="Add Test Users">
</p>
### 1.4 Create OAuth2.0 Credentials
1. Go to [Credentials](https://console.cloud.google.com/apis/credentials) page
2. Click **CREATE CREDENTIALS** → **OAuth client ID**
3. Select **Desktop app** as Application type
4. Name it (e.g., "OSWorld Desktop Client")
5. Click **CREATE**
<p align="center">
<img src="assets/createcredential.png" width="80%" alt="Create Credentials">
</p>
6. Download the JSON file and rename it to `credentials.json`
7. Place it in `evaluation_examples/settings/google/`
<p align="center">
<img src="assets/downloadjson.png" width="80%" alt="Download JSON">
</p>
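The file downloaded for a **Desktop app** client stores its fields under a top-level `installed` key. A Web application client uses `web` instead. The hypothetical helper below is not part of OSWorld. It is a quick sketch for catching a wrong client type or an incomplete download, and its list of required keys is an assumption chosen for illustration.

```python
import json
from pathlib import Path

# Keys this sketch expects inside the "installed" section of a Desktop-app client file.
REQUIRED_KEYS = ("client_id", "client_secret", "auth_uri", "token_uri")


def check_oauth_client_file(path="evaluation_examples/settings/google/credentials.json"):
    """Return a list of problems with the downloaded OAuth client file (empty list = looks OK)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if "installed" not in data:
        # Clients of type "Web application" use a top-level "web" key instead.
        if "web" in data:
            return ["client type is 'Web application'; create a 'Desktop app' client instead"]
        return ["no 'installed' section found; is this an OAuth client ID file?"]
    section = data["installed"]
    return [f"missing key: installed.{key}" for key in REQUIRED_KEYS if not section.get(key)]
```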
### 1.5 Potential Issues
#### Issue 1: Access Blocked During OAuth Flow
**Symptom**: "Access blocked: OSWorld's request is invalid" error
**Solution**: Ensure you've added your Google account as a test user in the OAuth consent screen configuration.
#### Issue 2: Scope Not Granted
**Symptom**: The application lacks the necessary permissions
**Solution**: Verify that the `https://www.googleapis.com/auth/drive` scope is added in the OAuth consent screen.
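You can run the OAuth flow on its own to tell whether a failure comes from the Google Cloud configuration rather than from OSWorld. The sketch below assumes the `google-auth-oauthlib` package is installed (`pip install google-auth-oauthlib`). It is an independent check, not the code path OSWorld uses internally, and `run_oauth_check` is a hypothetical name.

```python
# Scope configured on the consent screen in section 1.3.
DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive"]


def run_oauth_check(client_file="evaluation_examples/settings/google/credentials.json"):
    """Run a local OAuth flow with the Desktop-app client and return the credentials.

    Opens a browser window for sign-in. The import lives inside the function so
    this module can be loaded even when google-auth-oauthlib is not installed.
    """
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_secrets_file(client_file, scopes=DRIVE_SCOPES)
    # port=0 lets the OS choose a free port for the local redirect server.
    return flow.run_local_server(port=0)
```

Calling `run_oauth_check()` opens a sign-in page. If that page shows the "Access blocked" error from Issue 1, re-check the test-user list. If sign-in succeeds, the returned credentials' `valid` attribute should be `True`.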
---
## 2. Proxy Configuration
If you run OSWorld behind a firewall or otherwise need a proxy to reach the internet, follow these steps.
### 2.1 Configure Proxy on Host Machine
By default, proxy software usually listens only on localhost (`127.0.0.1`), which the virtual machine cannot reach. Configure your proxy software to listen on the VMware network card's IP address or on `0.0.0.0` (all interfaces).
#### Find VM and Host IP Addresses
After launching the VM, run the following on the host to get the VM's IP address:
```bash
# Run this command on host
# Change ws to fusion if you use VMware Fusion
vmrun -T ws getGuestIPAddress /path/to/vmx/file
```
**On Linux (Ubuntu)**:
```bash
ip a # Check IP addresses of each network card
```
**On Windows**:
```cmd
ipconfig # Check IP addresses of each network card
```
Look for the VMware network card (usually named `vmnetX`, e.g. `vmnet8`) and note its IP address. That address must be on the same subnet as the VM's IP address reported by `vmrun`.
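If you are unsure which host address matches the VM, Python's standard `ipaddress` module can compare the two. The `/24` prefix is an assumption, since it is common for VMware NAT networks. Use the netmask your host actually reports. The addresses below are examples.

```python
import ipaddress


def same_subnet(host_ip, vm_ip, prefix_len=24):
    """Return True if vm_ip lies in the network formed by host_ip and prefix_len.

    prefix_len=24 (netmask 255.255.255.0) is an assumption; use the netmask
    that `ip a` or `ipconfig` reports for the vmnet card.
    """
    network = ipaddress.ip_network(f"{host_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(vm_ip) in network


print(same_subnet("192.168.108.1", "192.168.108.130"))  # True
print(same_subnet("192.168.1.10", "192.168.108.130"))   # False
```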
#### Configure Proxy Software
Configure your proxy software to listen on the VMware network card IP:
<p align="center">
<img src="assets/proxysetup.png" width="80%" alt="Proxy Setup">
</p>
#### Alternative: Port Forwarding
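If you cannot change the listening address, forward a port on the VMware network card IP to the proxy on localhost. The examples below assume the host's VMware network card IP is `192.168.108.1` and the proxy listens on port `1080`.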
**On Linux (Ubuntu)**:
```bash
# Forward 192.168.108.1:1080 to 127.0.0.1:1080
socat TCP-LISTEN:1080,bind=192.168.108.1,fork TCP:127.0.0.1:1080
```
**On Windows** (with admin privileges):
```cmd
netsh interface portproxy add v4tov4 listenport=1080 listenaddress=192.168.108.1 connectport=1080 connectaddress=127.0.0.1
```
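Both commands accept connections on the VMware-facing address and relay bytes in each direction to the proxy on `127.0.0.1`. The sketch below illustrates that idea with Python's `asyncio`. It is a minimal illustration with no logging, limits, or access control, not a replacement for `socat` or `netsh`. The helper names are hypothetical, and the addresses are the same example values used above.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # one side dropped the connection; stop relaying
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, target_host, target_port):
    """Connect to the target (the proxy) and relay traffic in both directions."""
    try:
        upstream_reader, upstream_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # proxy not reachable; drop the incoming connection
        return
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Listen on listen_host:listen_port and forward each connection to target_host:target_port."""
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, target_host, target_port),
        listen_host,
        listen_port,
    )


async def main():
    # Example values matching the socat/netsh commands above.
    server = await start_forwarder("192.168.108.1", 1080, "127.0.0.1", 1080)
    async with server:
        await server.serve_forever()
```

Run it on the host with `asyncio.run(main())`. As with `socat`, the listen address must be an IP that actually exists on the host.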
### 2.2 Configure Proxy in Virtual Machine
#### For VMware/VirtualBox
1. Start the VM and log in
2. Open a terminal and edit the system-wide environment variables:
```bash
# Edit environment variables
sudo nano /etc/environment
```
3. Add the following lines (replace with your host IP and port):
```bash
http_proxy="http://192.168.108.1:1080"
https_proxy="http://192.168.108.1:1080"
no_proxy="localhost,127.0.0.1"
```
4. For the APT package manager, create a proxy configuration file:
```bash
sudo nano /etc/apt/apt.conf.d/proxy.conf
```
Add:
```
Acquire::http::Proxy "http://192.168.108.1:1080";
Acquire::https::Proxy "http://192.168.108.1:1080";
```
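5. Reboot the VM (or log out and back in) so that all programs pick up the new variables. To apply them to the current shell only, export them while sourcing the file, because `/etc/environment` contains plain assignments without `export`:
```bash
set -a  # automatically export every variable assigned below
source /etc/environment
set +a
```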
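To confirm from inside the VM that the host proxy is reachable, you can open a TCP connection to it. The sketch below uses only the Python standard library and assumes Python 3 is available in the VM. `proxy_reachable` is a hypothetical helper. It checks only that the port is open, not that the proxy forwards traffic correctly.

```python
import os
import socket
from urllib.parse import urlsplit


def proxy_reachable(proxy_url=None, timeout=3.0):
    """Try a plain TCP connection to the proxy's host:port; return True on success.

    Defaults to the http_proxy variable configured in /etc/environment above.
    """
    proxy_url = proxy_url or os.environ.get("http_proxy") or os.environ.get("HTTP_PROXY")
    if not proxy_url:
        raise ValueError("no proxy URL given and http_proxy is not set")
    parts = urlsplit(proxy_url)
    try:
        # Port 80 is used if the URL omits one (the default for http://).
        with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `proxy_reachable("http://192.168.108.1:1080")` should return `True` once the host side is configured.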
#### For Docker
When using the Docker provider, create the environment as usual:
```python
from desktop_env.desktop_env import DesktopEnv

env = DesktopEnv(
    provider_name="docker",
    # ... other parameters
)
```
Then, in the shell that launches your script, set the proxy environment variables before running it:
```bash
export HTTP_PROXY=http://your-proxy:port
export HTTPS_PROXY=http://your-proxy:port
```
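The Python standard library honors these variables, and so do libraries such as `requests` by default. `urllib.request.getproxies_environment()` shows the mapping the standard library derives from them, which is a quick way to confirm that the exports took effect in the current shell. This sketch does not show how any particular OSWorld code path consumes them.

```python
import urllib.request


def effective_proxies():
    """Return the proxy mapping urllib derives from *_proxy environment variables.

    Keys are schemes such as "http" and "https". If both HTTP_PROXY and
    http_proxy are set, the lowercase variable wins.
    """
    return urllib.request.getproxies_environment()


if __name__ == "__main__":
    print(effective_proxies())
```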
### 2.3 Proxy for Specific Tasks (Recommended)
OSWorld provides built-in proxy support using DataImpulse or similar services:
1. Register at [DataImpulse](https://dataimpulse.com/)
2. Purchase a US residential IP package (approximately $1 per GB)
3. Configure credentials in `evaluation_examples/settings/proxy/dataimpulse.json`:
```json
[
{
"host": "gw.dataimpulse.com",
"port": 823,
"username": "your_username",
"password": "your_password",
"protocol": "http",
"provider": "dataimpulse",
"type": "residential",
"country": "US",
"note": "Dataimpulse Residential Proxy"
}
]
```
When `enable_proxy=True` is passed to `DesktopEnv`, OSWorld automatically routes the tasks that need a proxy through it.
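OSWorld reads this file itself, so you do not need to process it. Still, it can help to see how the fields combine into a standard `protocol://user:password@host:port` proxy URL, for example to test the credentials with another tool. The helper below is hypothetical and not part of OSWorld. It percent-encodes the credentials in case they contain characters such as `@` or `:`.

```python
import json
from pathlib import Path
from urllib.parse import quote


def load_proxy_urls(path="evaluation_examples/settings/proxy/dataimpulse.json"):
    """Turn each entry of the proxy config into a proxy URL string (illustration only)."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    urls = []
    for entry in entries:
        # safe="" also encodes "/", ":" and "@" so they cannot break the URL structure.
        user = quote(entry["username"], safe="")
        password = quote(entry["password"], safe="")
        urls.append(f'{entry["protocol"]}://{user}:{password}@{entry["host"]}:{entry["port"]}')
    return urls
```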
---
## 3. Public Evaluation Platform
We provide an AWS-based platform for large-scale parallel evaluation of OSWorld tasks.
### 3.1 Architecture Overview
- **Host Instance**: Central controller that stores code, configurations, and manages task execution
- **Client Instances**: Worker nodes automatically launched to perform tasks in parallel
### 3.2 Platform Deployment
#### Step 1: Launch the Host Instance
1. Create an EC2 instance in the AWS console
2. **Instance type recommendations**:
- `t3.medium`: For < 5 parallel environments
- `t3.large`: For < 15 parallel environments
- `c4.8xlarge`: For 15+ parallel environments
3. **AMI**: Ubuntu Server 24.04 LTS (HVM), SSD Volume Type
4. **Storage**: At least 50 GB
5. **Security group**: Open port 8080 for the monitor service
6. **VPC**: Use the default VPC (note its VPC ID for later)
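The thresholds above fit in a small helper. You could use it, for example, to pick an instance type from the `--num_envs` value you plan to pass later. The function name is hypothetical, and the mapping restates the recommendations, which are starting points rather than hard limits.

```python
def recommended_host_instance(num_envs):
    """Map the planned number of parallel environments to the recommended host instance type."""
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    if num_envs < 5:
        return "t3.medium"
    if num_envs < 15:
        return "t3.large"
    return "c4.8xlarge"
```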
#### Step 2: Connect to Host Instance
1. Download the `.pem` key file when creating the instance
2. Set permissions:
```bash
chmod 400 <your_key_file_path>
```
3. Connect via SSH:
```bash
ssh -i <your_key_file_path> ubuntu@<your_public_dns>
```
#### Step 3: Set Up Host Machine
```bash
# Clone OSWorld repository
git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
# Optional: Create Conda environment
# conda create -n osworld python=3.10
# conda activate osworld
# Install dependencies
pip install -r requirements.txt
```
#### Step 4: Configure AWS Client Machines
##### Security Group Configuration
Create a security group with the following rules:
**Inbound Rules** (8 rules required):
| Type | Protocol | Port Range | Source | Description |
|------------|----------|------------|----------------|----------------------------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC visualization port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |
**Outbound Rules** (1 rule required):
| Type | Protocol | Port Range | Destination | Description |
|-------------|----------|------------|-------------|----------------------------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |
Record the `AWS_SECURITY_GROUP_ID`.
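If you prefer scripting this over the console, the sketch below creates an equivalent group with boto3's EC2 client (`create_security_group` and `authorize_security_group_ingress`). It assumes boto3 is installed and AWS credentials are configured. The group name and helper names are illustrative, not part of OSWorld. The `172.31.0.0/16` sources match the default VPC's CIDR block, so adjust them if your VPC uses a different range. New VPC security groups already include an allow-all outbound rule, which matches the outbound table.

```python
# Inbound rules from the table above: (port, source CIDR, description).
INBOUND_RULES = [
    (22, "0.0.0.0/0", "SSH access"),
    (80, "172.31.0.0/16", "HTTP traffic"),
    (5000, "172.31.0.0/16", "OSWorld backend service"),
    (5910, "0.0.0.0/0", "NoVNC visualization port"),
    (8006, "172.31.0.0/16", "VNC service port"),
    (8080, "172.31.0.0/16", "VLC service port"),
    (8081, "172.31.0.0/16", "Additional service port"),
    (9222, "172.31.0.0/16", "Chrome control port"),
]


def build_ip_permissions(rules=INBOUND_RULES):
    """Convert (port, cidr, description) tuples into EC2 IpPermissions entries."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }
        for port, cidr, description in rules
    ]


def create_client_security_group(vpc_id, name="osworld-client-sg", region="us-east-1"):
    """Create the group in vpc_id, add the inbound rules, and return its GroupId."""
    import boto3  # imported here so build_ip_permissions works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    group_id = ec2.create_security_group(
        GroupName=name,
        Description="OSWorld client instances",
        VpcId=vpc_id,
    )["GroupId"]
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=build_ip_permissions())
    return group_id
```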
##### VPC and Subnet Configuration
1. Note the **VPC ID** and **Subnet ID** from your host instance
2. Record the **Subnet ID** as `AWS_SUBNET_ID`
##### AWS Access Keys
1. Go to AWS Console → Security Credentials
2. Create access key
3. Record `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
### 3.3 Environment Setup
#### Google Drive Integration (Optional)
Follow [Section 1: Google Account Setup](#1-google-account-setup) above.
**Note**: OSWorld includes 8 Google Drive tasks out of 369 total tasks. You can:
- Complete setup for all 369 tasks, or
- Skip Google Drive tasks and evaluate 361 tasks (officially supported)
#### Set Environment Variables
```bash
# API Keys (if using)
# export OPENAI_API_KEY="your_openai_api_key"
# export ANTHROPIC_API_KEY="your_anthropic_api_key"
# AWS Configuration
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
export AWS_REGION="us-east-1" # or your preferred region
export AWS_SECURITY_GROUP_ID="sg-xxxx"
export AWS_SUBNET_ID="subnet-xxxx"
```
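A failed launch is often caused by a missing variable, so a quick check before running can save time. The sketch below uses only the standard library, and its helper names are illustrative. The `sg-` and `subnet-` prefixes are the standard AWS ID formats. The API keys are left out because they are needed only for the models you run.

```python
import os

# Variables the AWS setup above expects.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_SECURITY_GROUP_ID",
    "AWS_SUBNET_ID",
)


def missing_env_vars(names=REQUIRED_AWS_VARS, environ=None):
    """Return the names from `names` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


def check_id_prefixes(environ=None):
    """Flag IDs that do not start with the usual sg-/subnet- prefixes."""
    environ = os.environ if environ is None else environ
    problems = []
    for name, prefix in (("AWS_SECURITY_GROUP_ID", "sg-"), ("AWS_SUBNET_ID", "subnet-")):
        value = environ.get(name, "")
        if value and not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix!r}, got {value!r}")
    return problems


if __name__ == "__main__":
    print("missing:", missing_env_vars())
    print("id problems:", check_id_prefixes())
```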
### 3.4 Running Evaluations
```bash
# Example: Run OpenAI CUA
python scripts/python/run_multienv_openaicua.py \
--headless \
--observation_type screenshot \
--model computer-use-preview \
--result_dir ./results_operator \
--test_all_meta_path evaluation_examples/test_all.json \
--region us-east-1 \
--max_steps 50 \
--num_envs 5 \
--client_password osworld-public-evaluation
# Example: Run Claude (via AWS Bedrock)
python scripts/python/run_multienv_claude.py \
--headless \
--observation_type screenshot \
--action_space claude_computer_use \
--model claude-4-sonnet-20250514 \
--result_dir ./results_claude \
--test_all_meta_path evaluation_examples/test_all.json \
--max_steps 50 \
--num_envs 5 \
--provider_name aws \
--client_password osworld-public-evaluation
```
**Key Parameters**:
- `--num_envs`: Number of parallel environments
- `--max_steps`: Maximum steps per task
- `--result_dir`: Output directory for results
- `--test_all_meta_path`: Path to test set metadata
- `--region`: AWS region
### 3.5 Monitoring and Results
#### Web Monitoring Tool
```bash
cd monitor
pip install -r requirements.txt
python main.py
```
Access at: `http://<host-public-ip>:8080`
#### VNC Remote Desktop Access
Access VMs via VNC at: `http://<client-public-ip>:5910/vnc.html`
Default password: `osworld-public-evaluation`
### 3.6 Submitting Results
For leaderboard submission, contact:
- tianbaoxiexxx@gmail.com
- yuanmengqi732@gmail.com
**Options**:
1. **Self-reported**: Submit results with monitor data and trajectories
2. **Verified**: Schedule a meeting to run your agent code on our infrastructure
---
## Additional Resources
- [Main README](README.md) - Project overview and quick start
- [Installation Guide](README.md#-installation) - Detailed installation instructions
- [FAQ](README.md#-faq) - Frequently asked questions
- [Scripts Documentation](scripts/README.md) - Information about run scripts
## Support
If you encounter issues or have questions:
- Open an issue on [GitHub](https://github.com/xlang-ai/OSWorld/issues)
- Join our [Discord](https://discord.gg/4Gnw7eTEZR)
- Email the maintainers (see contact information above)