| # Evaluation Experiments | |
| This section provides instructions for setting up the evaluation environments and running the evaluation scripts. | |
| ## Expected Directory Structure | |
| To ensure the scripts can locate the environments, please organize your files as follows: | |
| ```text | |
| SkillNet/ | |
| ├── experiments/ | |
| │   ├── alfworld/              # git clone here | |
| │   ├── ScienceWorld/          # git clone here | |
| │   ├── WebShop/               # git clone here | |
| │   ├── src/ | |
| │   ├── requirements.txt | |
| │   ├── alfworld_run.py | |
| │   ├── scienceworld_run.py | |
| │   └── webshop_run.py | |
| ``` | |
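
Before going further, it can help to confirm that the layout matches the tree above. The following is a minimal sketch and not part of the repository: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the list simply mirrors the entries shown in the tree.

```python
from pathlib import Path

# Entries expected under experiments/, copied from the tree above.
# This list is an illustration, not something the repository defines.
EXPECTED_ENTRIES = [
    "alfworld",
    "ScienceWorld",
    "WebShop",
    "src",
    "requirements.txt",
    "alfworld_run.py",
    "scienceworld_run.py",
    "webshop_run.py",
]


def missing_entries(experiments_dir):
    """Return the expected entries that do not exist under experiments_dir."""
    root = Path(experiments_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("experiments")
    if missing:
        print("Missing under experiments/:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the `SkillNet/` root; any names it prints are the clones or files still to be added.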
| ## Quick Start | |
| We recommend creating a separate conda environment for each of the three benchmarks (ALFWorld, ScienceWorld, and WebShop) to avoid dependency conflicts. | |
| ### ALFWorld | |
| 1. **Clone & Setup:** | |
| ```bash | |
| cd experiments | |
| git clone https://github.com/alfworld/alfworld.git | |
| cd alfworld | |
| # Follow the official installation steps from the repo (https://github.com/alfworld/alfworld) | |
| ``` | |
| 2. **Environment Variable:** | |
| - Set `ALFWORLD_DATA` to the dataset root or edit `src/alfworld/base_config.yaml` to point to your local paths: | |
| ```bash | |
| export ALFWORLD_DATA=/path/to/alfworld_data | |
| ``` | |
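
A misconfigured `ALFWORLD_DATA` is easier to diagnose up front than in the middle of a run. The sketch below shows one way to validate the variable; `resolve_alfworld_data` is a hypothetical helper, not a function provided by ALFWorld or by this repository.

```python
import os
from pathlib import Path


def resolve_alfworld_data(env=None):
    """Return the ALFWORLD_DATA directory as a Path, or raise if it is unusable."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    value = env.get("ALFWORLD_DATA")
    if not value:
        raise RuntimeError("ALFWORLD_DATA is not set; export it before running.")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"ALFWORLD_DATA points to a missing directory: {path}")
    return path
```

Calling `resolve_alfworld_data()` with no arguments reads the current shell environment, so it reflects whatever was exported above.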
| ### ScienceWorld | |
| 1. **Clone & Setup:** | |
| ```bash | |
| cd experiments | |
| git clone https://github.com/allenai/ScienceWorld.git | |
| cd ScienceWorld | |
| # Refer to the ScienceWorld repository for environment setup (https://github.com/allenai/ScienceWorld) | |
| ``` | |
| ### WebShop | |
| 1. **Clone & Setup:** | |
| ```bash | |
| cd experiments | |
| git clone https://github.com/princeton-nlp/WebShop.git | |
| cd WebShop | |
| # Refer to the WebShop repository for environment setup (https://github.com/princeton-nlp/WebShop) | |
| ``` | |
| --- | |
| In each conda environment, install the common dependencies: | |
| ```bash | |
| cd experiments | |
| pip install -r requirements.txt | |
| ``` | |
| ### Running | |
| #### Step 1: Initialize Environment Variables | |
| Before running the scripts, configure your API credentials: | |
| ```bash | |
| export API_KEY=YOUR_API_KEY | |
| export BASE_URL=YOUR_API_BASE_URL | |
| ``` | |
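
Since both variables must be set before the scripts can reach the API, a small pre-flight check can save a failed run. This is a sketch under the assumption that the scripts read `API_KEY` and `BASE_URL` from the environment, as the exports above suggest; `missing_credentials` is a hypothetical helper.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("API_KEY", "BASE_URL")


def missing_credentials(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Please export:", ", ".join(missing))
    else:
        print("API credentials are set.")
```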
| #### Step 2: Execution | |
| Run the corresponding evaluation script from the `experiments/` directory. | |
| ```bash | |
| cd experiments | |
| # ALFWorld | |
| python alfworld_run.py --model o4-mini --split dev --max_workers 10 --exp_name alf_test --use_skill | |
| # ScienceWorld | |
| python scienceworld_run.py --model o4-mini --split test --max_workers 5 --exp_name sci_test --use_skill | |
| # WebShop | |
| python webshop_run.py --model o4-mini --max_workers 3 --exp_name web_test --use_skill | |
| ``` | |
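
To launch the runs from Python rather than the shell, the example commands above can be assembled programmatically. The sketch below is illustrative only: `RUNS` and `build_command` are hypothetical names, the values are copied from the commands above, and it assumes `--use_skill` is a plain on/off flag.

```python
import sys

# Per-benchmark settings copied from the example commands above.
# The WebShop example passes no --split, so none is added for it here.
RUNS = {
    "alfworld": {"script": "alfworld_run.py", "split": "dev",
                 "max_workers": 10, "exp_name": "alf_test"},
    "scienceworld": {"script": "scienceworld_run.py", "split": "test",
                     "max_workers": 5, "exp_name": "sci_test"},
    "webshop": {"script": "webshop_run.py", "split": None,
                "max_workers": 3, "exp_name": "web_test"},
}


def build_command(benchmark, model="o4-mini", use_skill=True):
    """Build the argument list for one benchmark's evaluation script."""
    cfg = RUNS[benchmark]
    cmd = [sys.executable, cfg["script"], "--model", model]
    if cfg["split"] is not None:
        cmd += ["--split", cfg["split"]]
    cmd += ["--max_workers", str(cfg["max_workers"]),
            "--exp_name", cfg["exp_name"]]
    if use_skill:
        cmd.append("--use_skill")
    return cmd
```

A command built this way can be passed to `subprocess.run(build_command("alfworld"), cwd="experiments", check=True)`, which matches running the script from the `experiments/` directory.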
| #### Argument Descriptions | |
| - `--model`: The name of the LLM to evaluate. | |
| - `--split`: Data split to use (`dev` or `test`). | |
| - `--max_workers`: Number of parallel workers for evaluation. | |
| - `--exp_name`: Name under which results are saved. | |
| - `--use_skill`: Enable the skill-augmented module. |
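
For readers who want to see how these flags fit together, here is a minimal `argparse` sketch. It is an assumption about how such a command-line interface might be declared, not the parser the scripts actually use; the defaults, the `required` setting, and the `build_parser` name are all hypothetical.

```python
import argparse


def build_parser():
    """Declare the flags listed above (illustrative defaults, not the real ones)."""
    parser = argparse.ArgumentParser(description="Evaluation arguments (sketch).")
    parser.add_argument("--model", required=True,
                        help="Name of the LLM to evaluate.")
    parser.add_argument("--split", choices=["dev", "test"],
                        help="Data split to use.")
    parser.add_argument("--max_workers", type=int, default=1,
                        help="Number of parallel workers for evaluation.")
    parser.add_argument("--exp_name",
                        help="Name under which results are saved.")
    # Assumed to be a boolean switch: present means enabled.
    parser.add_argument("--use_skill", action="store_true",
                        help="Enable the skill-augmented module.")
    return parser
```

Parsing the ALFWorld example, `build_parser().parse_args(["--model", "o4-mini", "--split", "dev", "--max_workers", "10", "--exp_name", "alf_test", "--use_skill"])`, yields a namespace with `max_workers` as an integer and `use_skill` set to `True`.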