
OSWorld Setup and Evaluation Guide

This guide covers setting up and running OSWorld evaluations: Google account configuration, proxy setup, and deployment of the public evaluation platform.

Table of Contents

  1. Google Account Setup
  2. Proxy Configuration
  3. Public Evaluation Platform

1. Google Account Setup

For tasks that involve Google services or Google Drive, you need a real Google account with OAuth2.0 credentials configured.

Attention: To prevent environment reset and result evaluation conflicts caused by multiple people using the same Google account simultaneously, please register a private Google account rather than using a shared one.

1.1 Register A Blank Google Account

  1. Go to the Google website and register a new blank account
    • You do not need to provide any recovery email or phone for testing purposes
    • IGNORE any security recommendations
    • Turn OFF the 2-Step Verification to avoid failure in environment setup

Attention: We strongly recommend registering a new blank account instead of using an existing one to avoid messing up your personal workspace.

  2. Copy settings.json.template to settings.json under evaluation_examples/settings/google/, then replace the two fields:
{
    "email": "your_google_account@gmail.com",
    "password": "your_google_account_password"
}
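
A quick sanity check before running Google tasks can catch a settings.json that was copied but never filled in. The sketch below is only an illustration: the function name `check_google_settings` is hypothetical and not part of OSWorld, and it assumes the file contains exactly the two fields shown in the template above.

```python
import json
from pathlib import Path

# Placeholder values taken from the template shown above.
PLACEHOLDERS = {"your_google_account@gmail.com", "your_google_account_password"}


def check_google_settings(path):
    """Return a list of problems found in settings.json (empty list means OK).

    Hypothetical helper for illustration; not part of the OSWorld code base.
    """
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} must contain a JSON object"]

    problems = []
    for field in ("email", "password"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' is missing or empty")
        elif value in PLACEHOLDERS:
            problems.append(f"'{field}' still holds the template placeholder")
    return problems
```

Calling `check_google_settings("evaluation_examples/settings/google/settings.json")` from the repository root should return an empty list once both fields are filled in.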

1.2 Create A Google Cloud Project

  1. Navigate to Google Cloud Project Creation and create a new GCP project (see Create a Google Cloud Project for detailed steps)

  2. Go to the Google Drive API console and enable the Google Drive API for the created project (see Enable and disable APIs)

1.3 Configure OAuth Consent Screen

Go to OAuth consent screen:

  1. Select External as the User Type and click CREATE

  2. Fill in the required fields:
    • App name: Any name you prefer
    • User support email: Your Google account email
    • Developer contact information: Your Google account email
    • Click SAVE AND CONTINUE

  3. Add scopes:
    • Click ADD OR REMOVE SCOPES
    • Filter and select: https://www.googleapis.com/auth/drive
    • Click UPDATE and SAVE AND CONTINUE

  4. Add test users:
    • Click ADD USERS
    • Add your Google account email
    • Click SAVE AND CONTINUE

1.4 Create OAuth2.0 Credentials

  1. Go to Credentials page
  2. Click CREATE CREDENTIALS → OAuth client ID
  3. Select Desktop app as Application type
  4. Name it (e.g., "OSWorld Desktop Client")
  5. Click CREATE

  6. Download the JSON file and rename it to credentials.json
  7. Place it in evaluation_examples/settings/google/
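
A common slip at this step is downloading credentials for the wrong application type. Client files for Desktop apps keep their fields under a top-level "installed" key, while web clients use "web". The sketch below checks for that; the function name `check_oauth_client_file` is a hypothetical helper for illustration, not something OSWorld ships.

```python
import json
from pathlib import Path


def check_oauth_client_file(path):
    """Validate a downloaded OAuth client file and return its client_id.

    Hypothetical helper for illustration. Raises ValueError when the file
    does not look like a Desktop-app ("installed") client.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict) or "installed" not in data:
        found = ", ".join(sorted(data)) if isinstance(data, dict) and data else "none"
        raise ValueError(
            f"expected a top-level 'installed' key (Desktop app); found: {found}"
        )
    client = data["installed"]
    missing = [key for key in ("client_id", "client_secret") if not client.get(key)]
    if missing:
        raise ValueError(f"credentials file is missing: {', '.join(missing)}")
    return client["client_id"]
```

If this raises because the file contains a "web" key, recreate the credentials with Desktop app as the application type.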

1.5 Potential Issues

Issue 1: Access Blocked During OAuth Flow

Symptom: "Access blocked: OSWorld's request is invalid" error

Solution: Ensure you've added your Google account as a test user in the OAuth consent screen configuration.

Issue 2: Scope Not Granted

Symptom: Application doesn't have necessary permissions

Solution: Verify that https://www.googleapis.com/auth/drive scope is added in the OAuth consent screen.


2. Proxy Configuration

If you're using OSWorld behind a firewall or need proxy configuration, follow these steps.

2.1 Configure Proxy on Host Machine

By default, proxy software usually listens only on localhost (127.0.0.1), which the virtual machine cannot reach. Configure your proxy software to listen on the VMware network card IP or on 0.0.0.0.

Find VM and Host IP Addresses

After launching the VM:

# Run this command on host
# Change ws to fusion if you use VMware Fusion
vmrun -T ws getGuestIPAddress /path/to/vmx/file

On Linux (Ubuntu):

ip a  # Check IP addresses of each network card

On Windows:

ipconfig  # Check IP addresses of each network card

Look for the VMware network card (usually named vmnetX like vmnet8). Make sure to use an IP address within the same network segment as the VM.
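
To decide which host address sits in the same network segment as the VM, you can compare the guest IP against each host interface. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the interface addresses in the usage note are made up for illustration, and you would substitute the values reported by `ip a` or `ipconfig`.

```python
import ipaddress


def host_ips_reachable_from_vm(vm_ip, host_interfaces):
    """Return {interface_name: host_ip} for interfaces whose subnet contains vm_ip.

    host_interfaces maps an interface name to its address in CIDR form,
    for example {"vmnet8": "192.168.108.1/24"}.
    """
    vm = ipaddress.ip_address(vm_ip)
    matches = {}
    for name, cidr in host_interfaces.items():
        iface = ipaddress.ip_interface(cidr)
        # Membership test: is the guest address inside this interface's network?
        if vm in iface.network:
            matches[name] = str(iface.ip)
    return matches
```

For instance, with a guest at 192.168.108.130 and host interfaces `{"vmnet8": "192.168.108.1/24", "eth0": "10.0.0.5/24"}`, only the vmnet8 address 192.168.108.1 is returned, so that is the address the proxy should listen on.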

Configure Proxy Software

Configure your proxy software to listen on the VMware network card IP.

Alternative: Port Forwarding

If you cannot change the listening address, set up port forwarding.

On Linux (Ubuntu):

# Forward 192.168.108.1:1080 to 127.0.0.1:1080
socat TCP-LISTEN:1080,bind=192.168.108.1,fork TCP:127.0.0.1:1080

On Windows (with admin privileges):

netsh interface portproxy add v4tov4 listenport=1080 listenaddress=192.168.108.1 connectport=1080 connectaddress=127.0.0.1
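
After changing the listening address or adding a forwarding rule, it helps to confirm that the port actually accepts connections. The sketch below is a plain TCP connect test with the standard `socket` module; the function name is hypothetical. It only confirms that something is listening and does not verify that the listener is a working proxy.

```python
import socket


def proxy_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it inside the VM against the host address, for example `proxy_port_open("192.168.108.1", 1080)`, using the IP and port from the forwarding commands above.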

2.2 Configure Proxy in Virtual Machine

For VMware/VirtualBox

  1. Start the VM and log in
  2. Open a terminal and edit the system-wide environment file:
# Edit environment variables
sudo nano /etc/environment
  3. Add the following lines (replace with your host IP and port):
http_proxy="http://192.168.108.1:1080"
https_proxy="http://192.168.108.1:1080"
no_proxy="localhost,127.0.0.1"
  4. For the APT package manager, create a proxy configuration file:
sudo nano /etc/apt/apt.conf.d/proxy.conf

Add:

Acquire::http::Proxy "http://192.168.108.1:1080";
Acquire::https::Proxy "http://192.168.108.1:1080";
  5. Reboot the VM, or load the variables into the current shell:
# set -a exports every variable assigned while the file is sourced
set -a; source /etc/environment; set +a
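
If you configure several VMs, generating these lines from the host IP and port avoids copy-paste mistakes. The sketch below only renders the text shown in the steps above; the function name `render_proxy_files` is hypothetical, and writing the result to disk (appending to /etc/environment rather than overwriting it) is left to you.

```python
def render_proxy_files(host_ip, port):
    """Return {target_path: text} with the proxy lines for the VM.

    The /etc/environment text is meant to be appended to the existing file,
    not to replace it.
    """
    proxy_url = f"http://{host_ip}:{port}"
    environment = "\n".join([
        f'http_proxy="{proxy_url}"',
        f'https_proxy="{proxy_url}"',
        'no_proxy="localhost,127.0.0.1"',
    ]) + "\n"
    apt_conf = "\n".join([
        f'Acquire::http::Proxy "{proxy_url}";',
        f'Acquire::https::Proxy "{proxy_url}";',
    ]) + "\n"
    return {
        "/etc/environment": environment,
        "/etc/apt/apt.conf.d/proxy.conf": apt_conf,
    }
```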

For Docker

When using the Docker provider, you can set proxy environment variables:

env = DesktopEnv(
    provider_name="docker",
    # ... other parameters
)

Set environment variables before running:

export HTTP_PROXY=http://your-proxy:port
export HTTPS_PROXY=http://your-proxy:port

2.3 Proxy for Specific Tasks (Recommended)

OSWorld provides built-in proxy support using DataImpulse or similar services:

  1. Register at DataImpulse
  2. Purchase a US residential IP package (approximately $1 per 1GB)
  3. Configure credentials in evaluation_examples/settings/proxy/dataimpulse.json:
[
    {
        "host": "gw.dataimpulse.com",
        "port": 823,
        "username": "your_username",
        "password": "your_password",
        "protocol": "http",
        "provider": "dataimpulse",
        "type": "residential",
        "country": "US",
        "note": "Dataimpulse Residential Proxy"
    }
]

OSWorld automatically uses the proxy for tasks that need it when enable_proxy=True is passed to DesktopEnv.
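
To test the credentials outside OSWorld, you can turn each entry of the configuration file into a standard proxy URL. The sketch below shows one way to do that from the fields listed above. It is not OSWorld's own loading code, and the function name `proxy_urls_from_config` is hypothetical.

```python
import json
from urllib.parse import quote


def proxy_urls_from_config(text):
    """Build protocol://user:password@host:port URLs from the JSON config text."""
    entries = json.loads(text)
    urls = []
    for entry in entries:
        # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
        user = quote(entry["username"], safe="")
        password = quote(entry["password"], safe="")
        urls.append(
            f'{entry["protocol"]}://{user}:{password}@{entry["host"]}:{entry["port"]}'
        )
    return urls
```

The resulting URL can be handed to any HTTP client that accepts a proxy URL, which is a quick way to confirm the username and password before starting an evaluation run.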


3. Public Evaluation Platform

We provide an AWS-based platform for large-scale parallel evaluation of OSWorld tasks.

3.1 Architecture Overview

  • Host Instance: Central controller that stores code, configurations, and manages task execution
  • Client Instances: Worker nodes automatically launched to perform tasks in parallel

3.2 Platform Deployment

Step 1: Launch the Host Instance

  1. Create an EC2 instance in AWS console
  2. Instance type recommendations:
    • t3.medium: For < 5 parallel environments
    • t3.large: For < 15 parallel environments
    • c4.8xlarge: For 15+ parallel environments
  3. AMI: Ubuntu Server 24.04 LTS (HVM), SSD Volume Type
  4. Storage: At least 50GB
  5. Security group: Open port 8080 for monitor service
  6. VPC: Use default (note the VPC ID for later)
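
The instance type recommendations in item 2 can be written as a small lookup, which is handy if you script the deployment. This is a direct transcription of those thresholds; the function name is hypothetical.

```python
def recommend_host_instance_type(num_envs):
    """Map the number of parallel environments to the recommended host instance type."""
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    if num_envs < 5:
        return "t3.medium"
    if num_envs < 15:
        return "t3.large"
    return "c4.8xlarge"
```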

Step 2: Connect to Host Instance

  1. Download the .pem key file when creating the instance
  2. Set permissions:
    chmod 400 <your_key_file_path>
    
  3. Connect via SSH:
    ssh -i <your_key_path> ubuntu@<your_public_dns>
    

Step 3: Set Up Host Machine

# Clone OSWorld repository
git clone https://github.com/xlang-ai/OSWorld
cd OSWorld

# Optional: Create Conda environment
# conda create -n osworld python=3.10
# conda activate osworld

# Install dependencies
pip install -r requirements.txt

Step 4: Configure AWS Client Machines

Security Group Configuration

Create a security group with the following rules:

Inbound Rules (8 rules required):

| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| SSH | TCP | 22 | 0.0.0.0/0 | SSH access |
| HTTP | TCP | 80 | 172.31.0.0/16 | HTTP traffic |
| Custom TCP | TCP | 5000 | 172.31.0.0/16 | OSWorld backend service |
| Custom TCP | TCP | 5910 | 0.0.0.0/0 | NoVNC visualization port |
| Custom TCP | TCP | 8006 | 172.31.0.0/16 | VNC service port |
| Custom TCP | TCP | 8080 | 172.31.0.0/16 | VLC service port |
| Custom TCP | TCP | 8081 | 172.31.0.0/16 | Additional service port |
| Custom TCP | TCP | 9222 | 172.31.0.0/16 | Chrome control port |

Outbound Rules (1 rule required):

| Type | Protocol | Port Range | Destination | Description |
|------|----------|------------|-------------|-------------|
| All traffic | All | All | 0.0.0.0/0 | Allow all outbound traffic |

Record the AWS_SECURITY_GROUP_ID.
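
If you prefer to create the inbound rules from a script instead of the console, they can be expressed in the `IpPermissions` structure that the EC2 API accepts. The sketch below only builds that structure from the rules listed above and does not call AWS. The names `INBOUND_RULES` and `build_ip_permissions` are hypothetical.

```python
# (port, source CIDR, description) for each inbound rule listed above.
INBOUND_RULES = [
    (22, "0.0.0.0/0", "SSH access"),
    (80, "172.31.0.0/16", "HTTP traffic"),
    (5000, "172.31.0.0/16", "OSWorld backend service"),
    (5910, "0.0.0.0/0", "NoVNC visualization port"),
    (8006, "172.31.0.0/16", "VNC service port"),
    (8080, "172.31.0.0/16", "VLC service port"),
    (8081, "172.31.0.0/16", "Additional service port"),
    (9222, "172.31.0.0/16", "Chrome control port"),
]


def build_ip_permissions(rules=INBOUND_RULES):
    """Return the rules as a list of EC2 IpPermissions dictionaries (TCP only)."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }
        for port, cidr, description in rules
    ]
```

With boto3 installed and credentials configured, the result could be passed as `IpPermissions=build_ip_permissions()` to the EC2 client's `authorize_security_group_ingress` call together with the `GroupId` of your security group. The 172.31.0.0/16 source assumes the default VPC CIDR, so adjust it if your VPC uses a different range.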

VPC and Subnet Configuration

  1. Note the VPC ID and Subnet ID from your host instance
  2. Record the Subnet ID as AWS_SUBNET_ID

AWS Access Keys

  1. Go to AWS Console → Security Credentials
  2. Create access key
  3. Record AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

3.3 Environment Setup

Google Drive Integration (Optional)

Follow Section 1: Google Account Setup above.

Note: OSWorld includes 8 Google Drive tasks out of 369 total tasks. You can:

  • Complete setup for all 369 tasks, or
  • Skip Google Drive tasks and evaluate 361 tasks (officially supported)

Set Environment Variables

# API Keys (if using)
# export OPENAI_API_KEY="your_openai_api_key"
# export ANTHROPIC_API_KEY="your_anthropic_api_key"

# AWS Configuration
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
export AWS_REGION="us-east-1"  # or your preferred region
export AWS_SECURITY_GROUP_ID="sg-xxxx"
export AWS_SUBNET_ID="subnet-xxxx"
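
A missing variable tends to surface only after instances start launching, so a pre-flight check can save time. The sketch below lists which of the AWS variables exported above are unset or blank. The names `REQUIRED_VARS` and `missing_aws_settings` are hypothetical.

```python
import os

# The AWS variables exported in the block above.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_SECURITY_GROUP_ID",
    "AWS_SUBNET_ID",
)


def missing_aws_settings(environ=None):
    """Return the names of required variables that are unset or blank."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]
```

Calling `missing_aws_settings()` with no arguments inspects the current process environment. An empty list means all five variables are present.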

3.4 Running Evaluations

# Example: Run OpenAI CUA
python scripts/python/run_multienv_openaicua.py \
    --headless \
    --observation_type screenshot \
    --model computer-use-preview \
    --result_dir ./results_operator \
    --test_all_meta_path evaluation_examples/test_all.json \
    --region us-east-1 \
    --max_steps 50 \
    --num_envs 5 \
    --client_password osworld-public-evaluation

# Example: Run Claude (via AWS Bedrock)
python scripts/python/run_multienv_claude.py \
    --headless \
    --observation_type screenshot \
    --action_space claude_computer_use \
    --model claude-4-sonnet-20250514 \
    --result_dir ./results_claude \
    --test_all_meta_path evaluation_examples/test_all.json \
    --max_steps 50 \
    --num_envs 5 \
    --provider_name aws \
    --client_password osworld-public-evaluation

Key Parameters:

  • --num_envs: Number of parallel environments
  • --max_steps: Maximum steps per task
  • --result_dir: Output directory for results
  • --test_all_meta_path: Path to test set metadata
  • --region: AWS region
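
When sweeping over several values of these parameters, assembling the command as an argument list is less error-prone than editing a shell line by hand. The sketch below uses only flags that appear in the examples above. The function name `build_eval_command` and its defaults are assumptions for illustration, and individual run scripts may accept or require other flags.

```python
def build_eval_command(script, model, result_dir, num_envs, max_steps=50,
                       region=None, client_password=None):
    """Return the evaluation command as an argument list for subprocess.run."""
    cmd = [
        "python", script,
        "--headless",
        "--observation_type", "screenshot",
        "--model", model,
        "--result_dir", result_dir,
        "--test_all_meta_path", "evaluation_examples/test_all.json",
        "--max_steps", str(max_steps),
        "--num_envs", str(num_envs),
    ]
    # Optional flags are appended only when a value is supplied.
    if region is not None:
        cmd += ["--region", region]
    if client_password is not None:
        cmd += ["--client_password", client_password]
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.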

3.5 Monitoring and Results

Web Monitoring Tool

cd monitor
pip install -r requirements.txt
python main.py

Access at: http://<host-public-ip>:8080

VNC Remote Desktop Access

Access VMs via VNC at: http://<client-public-ip>:5910/vnc.html

Default password: osworld-public-evaluation

3.6 Submitting Results

For leaderboard submission, contact:

Options:

  1. Self-reported: Submit results with monitor data and trajectories
  2. Verified: Schedule a meeting to run your agent code on our infrastructure

Additional Resources

Support

If you encounter issues or have questions:

  • Open an issue on GitHub
  • Join our Discord
  • Email the maintainers (see contact information above)