# Scripts Directory
This directory contains all the run scripts for OSWorld, organized by type.
## Structure
```
scripts/
├── python/                 # Python run scripts for various models
│   ├── run_*.py            # Individual model run scripts
│   └── run_multienv_*.py   # Multi-environment run scripts
└── bash/                   # Bash scripts
    └── run_*.sh            # Shell scripts for running models
```
## Python Scripts
The `python/` directory contains Python scripts for running different models and agents:
- **Single model scripts**: `run_autoglm.py`, `run_coact.py`, `run_maestro.py`
- **Multi-environment scripts**: `run_multienv_*.py` - Scripts for running models in multiple environments
- **Manual examination**: `manual_examine.py` - Tool for manually verifying and examining specific benchmark tasks
## Bash Scripts
The `bash/` directory contains shell scripts for running specific models:
- `run_dart_gui.sh` - Run DART GUI model
- `run_os_symphony.sh` - Run OS Symphony model
- `run_manual_examine.sh` - Example script for manual task examination with sample task IDs
> **Note**: Due to an earlier oversight, many bash scripts were not preserved during the reorganization. We will add more bash scripts in future updates. Community contributions are welcome: if you have bash scripts for running specific models or workflows, please submit a pull request.
## Usage
**Important**: All scripts should be run from the **project root directory** (not from within the `scripts/` directory).
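A script can check this requirement before doing any work. The following is a minimal sketch and not part of OSWorld: `looks_like_project_root` and `require_project_root` are hypothetical helpers, and the marker directories (`scripts`, `evaluation_examples`) are assumptions taken from the paths mentioned in this README.

```python
from pathlib import Path

# Assumed markers: directories this README refers to relative to the project root.
ROOT_MARKERS = ("scripts", "evaluation_examples")


def looks_like_project_root(path, markers=ROOT_MARKERS):
    """Return True if every marker directory exists directly under `path`."""
    base = Path(path)
    return all((base / name).is_dir() for name in markers)


def require_project_root(path="."):
    """Exit with a readable message when `path` is not the project root."""
    if not looks_like_project_root(path):
        raise SystemExit(
            "Please run this script from the project root directory, "
            "not from within scripts/."
        )
```

Calling `require_project_root()` at the top of a run script would turn a confusing "file not found" error into an explicit message.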
### Running Python Scripts
```bash
# From the OSWorld root directory
python scripts/python/run_multienv.py [args]

# Example: Run with OpenAI GPT-4o
python scripts/python/run_multienv.py \
    --provider_name docker \
    --headless \
    --observation_type screenshot \
    --model gpt-4o \
    --max_steps 15 \
    --num_envs 10 \
    --client_password password
```
### Running Bash Scripts
```bash
# From the OSWorld root directory
bash scripts/bash/run_dart_gui.sh [args]
```
### Manual Task Examination
For manual verification and examination of specific benchmark tasks:
```bash
# From the OSWorld root directory
python scripts/python/manual_examine.py \
    --headless \
    --observation_type screenshot \
    --result_dir ./results_human_examine \
    --test_all_meta_path evaluation_examples/test_all.json \
    --domain libreoffice_impress \
    --example_id a669ef01-ded5-4099-9ea9-25e99b569840 \
    --max_steps 3
```
This tool allows you to:
- Manually execute tasks in the environment
- Verify task correctness and evaluation metrics
- Record the execution process with screenshots and videos
- Examine specific problematic tasks
See `scripts/bash/run_manual_examine.sh` for example task IDs across different domains.
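To examine several tasks in a row, the command above can be generated from a list of (domain, task ID) pairs. This is only a sketch: `build_examine_command` is a hypothetical helper, it uses only the flags shown in the example above, and the single task listed is the one from that example.

```python
import shlex
import sys


def build_examine_command(domain, example_id, max_steps=3,
                          result_dir="./results_human_examine"):
    """Build the argument list for one manual_examine.py run."""
    return [
        sys.executable, "scripts/python/manual_examine.py",
        "--headless",
        "--observation_type", "screenshot",
        "--result_dir", result_dir,
        "--test_all_meta_path", "evaluation_examples/test_all.json",
        "--domain", domain,
        "--example_id", example_id,
        "--max_steps", str(max_steps),
    ]


# (domain, example_id) pairs to examine; extend this list with your own task IDs.
tasks = [("libreoffice_impress", "a669ef01-ded5-4099-9ea9-25e99b569840")]

for domain, example_id in tasks:
    cmd = build_examine_command(domain, example_id)
    # Print the command for inspection; to execute it from the project root,
    # pass `cmd` to subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if a path or ID ever contains spaces.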
## Technical Details
All Python scripts in this directory add the project root to `sys.path` at startup so that they can import project modules. In practice:
1. **You must run scripts from the project root directory**, so that relative paths such as `evaluation_examples/test_all.json` resolve
2. Scripts automatically add the project root to `sys.path`
3. Project imports (such as `lib_run_single`, `desktop_env`, `mm_agents`) resolve without extra configuration
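The path resolution can be illustrated as follows. This is a sketch under assumptions, not the code used by the scripts: `project_root_for` and `add_to_sys_path` are hypothetical helpers, and the example path is made up purely to show that a file in `scripts/python/` sits two levels below the project root.

```python
import os
import sys


def project_root_for(script_path, levels_up=2):
    """Return the absolute directory `levels_up` levels above the script's folder."""
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    return root


def add_to_sys_path(root):
    """Put `root` at the front of sys.path unless it is already listed."""
    if root not in sys.path:
        sys.path.insert(0, root)


# Illustrative path only; a real script would pass __file__ instead.
example = os.path.join(os.sep, "work", "OSWorld", "scripts", "python", "run_example.py")
root = project_root_for(example)
add_to_sys_path(root)
print(root)
```

Once the root is on `sys.path`, top-level packages in the project root become importable regardless of where the script file itself lives.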
## Adding New Scripts
If you create a new run script, make sure to include the following path setup at the beginning (after standard library imports but before project imports):
```python
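# Add project root to path for imports
import os
import sys

# Resolve to an absolute path so the entry stays valid if the working directory changes
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")))

# Now you can import project modules
import lib_run_single
from desktop_env.desktop_env import DesktopEnv
from mm_agents.your_agent import YourAgent  # placeholder: replace with your agent module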
```