Evaluation Experiments
This section explains how to set up the evaluation environments and run the evaluation scripts.
📂 Expected Directory Structure
To ensure the scripts can locate the environments, please organize your files as follows:
```text
SkillNet/
├── experiments/
│   ├── alfworld/            # git clone here
│   ├── ScienceWorld/        # git clone here
│   ├── WebShop/             # git clone here
│   ├── src/
│   ├── requirements.txt
│   ├── alfworld_run.py
│   ├── scienceworld_run.py
│   └── webshop_run.py
```
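Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the expected entries are copied from the tree above.

```python
from pathlib import Path

# Entries expected under SkillNet/experiments/, copied from the tree above.
EXPECTED_DIRS = ["alfworld", "ScienceWorld", "WebShop", "src"]
EXPECTED_FILES = ["requirements.txt", "alfworld_run.py",
                  "scienceworld_run.py", "webshop_run.py"]


def missing_entries(experiments_dir="experiments"):
    """Return the expected entries that are absent (directories end with '/')."""
    root = Path(experiments_dir)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

Calling `missing_entries("experiments")` from the `SkillNet/` root returns an empty list when every clone and script is where the scripts expect it.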
🚀 Quick Start
We recommend creating a separate conda environment for each of the three benchmarks to avoid dependency conflicts.
ALFWorld
- Clone & Setup:
  ```bash
  cd experiments
  git clone https://github.com/alfworld/alfworld.git
  cd alfworld
  # Follow the official installation steps from the repo (https://github.com/alfworld/alfworld)
  ```
- Environment Variable:
  Set `ALFWORLD_DATA` to the dataset root, or edit `src/alfworld/base_config.yaml` to point to your local paths:
  ```bash
  export ALFWORLD_DATA=/path/to/alfworld_data
  ```
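If you take the `ALFWORLD_DATA` route, config paths are usually written in terms of the variable and expanded at load time. The sketch below assumes, hypothetically, that `base_config.yaml` contains strings such as `$ALFWORLD_DATA/json_2.1.1/train` (check your copy). It walks an already-parsed config, for example the dict returned by PyYAML's `yaml.safe_load`, and expands variable references with `os.path.expandvars`. The function name `expand_env_paths` is illustrative only.

```python
import os


def expand_env_paths(node):
    """Recursively expand $VAR / ${VAR} references in every string of a
    parsed config (nested dicts and lists); other values pass through."""
    if isinstance(node, dict):
        return {key: expand_env_paths(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_env_paths(item) for item in node]
    if isinstance(node, str):
        # os.path.expandvars leaves references to unset variables unchanged.
        return os.path.expandvars(node)
    return node
```

Because unset variables are left as the literal `$ALFWORLD_DATA/...`, a "file not found" error mentioning that literal string usually means the variable was not exported in the current shell.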
ScienceWorld
- Clone & Setup:
  ```bash
  cd experiments
  git clone https://github.com/allenai/ScienceWorld.git
  cd ScienceWorld
  # Refer to the ScienceWorld repository for environment setup (https://github.com/allenai/ScienceWorld)
  ```
WebShop
- Clone & Setup:
  ```bash
  cd experiments
  git clone https://github.com/princeton-nlp/WebShop.git
  cd WebShop
  # Refer to the WebShop repository for environment setup (https://github.com/princeton-nlp/WebShop)
  ```
In each of the three environments, also install the shared dependencies:
```bash
cd experiments
pip install -r requirements.txt
```
Running
Step 1: Initialize Environment Variables
Before running the scripts, configure your API credentials:
```bash
export API_KEY=YOUR_API_KEY
export BASE_URL=YOUR_API_BASE_URL
```
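A minimal sketch of how a script might read these two variables and fail fast when one is missing. `ApiSettings` and `load_api_settings` are hypothetical names, not taken from `src/`, and how the real scripts consume the values is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSettings:
    api_key: str
    base_url: str


def load_api_settings(env=None):
    """Read API_KEY and BASE_URL from env (default: os.environ).

    Raises RuntimeError naming every missing or empty variable, so a
    misconfigured shell fails before any evaluation work starts.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("API_KEY", "BASE_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return ApiSettings(api_key=env["API_KEY"], base_url=env["BASE_URL"])
```

If the scripts talk to an OpenAI-compatible endpoint (again, an assumption), these values would map onto the `openai` package's client as `OpenAI(api_key=settings.api_key, base_url=settings.base_url)`.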
Step 2: Execution
Run the corresponding evaluation script from the experiments/ directory.
```bash
cd experiments

# ALFWorld
python alfworld_run.py --model o4-mini --split dev --max_workers 10 --exp_name alf_test --use_skill

# ScienceWorld
python scienceworld_run.py --model o4-mini --split test --max_workers 5 --exp_name sci_test --use_skill

# WebShop
python webshop_run.py --model o4-mini --max_workers 3 --exp_name web_test --use_skill
```
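The three invocations differ only in the script name and a few options. A small wrapper can rebuild them, which makes it easy to launch a matching baseline with `--use_skill` turned off. This is a hypothetical helper, not shipped with the repository; the `RUNS` table simply copies the example commands, and each benchmark should still be launched from its own conda environment.

```python
import shlex
import subprocess

# Script and per-benchmark options copied from the example commands above.
RUNS = {
    "alfworld": ("alfworld_run.py",
                 {"split": "dev", "max_workers": 10, "exp_name": "alf_test"}),
    "scienceworld": ("scienceworld_run.py",
                     {"split": "test", "max_workers": 5, "exp_name": "sci_test"}),
    "webshop": ("webshop_run.py",  # the WebShop example passes no --split
                {"max_workers": 3, "exp_name": "web_test"}),
}


def build_command(benchmark, model="o4-mini", use_skill=True, exp_name=None):
    """Return the argv list for one evaluation run.

    exp_name overrides the default name, so a baseline run without
    --use_skill does not save under the skill-augmented run's name.
    """
    script, defaults = RUNS[benchmark]
    options = dict(defaults)  # copy so RUNS itself is never modified
    if exp_name is not None:
        options["exp_name"] = exp_name
    cmd = ["python", script, "--model", model]
    for key, value in options.items():
        cmd += ["--" + key, str(value)]
    if use_skill:
        cmd.append("--use_skill")
    return cmd


def run(benchmark, **kwargs):
    """Run one evaluation from experiments/ (activate the matching conda env first)."""
    cmd = build_command(benchmark, **kwargs)
    print("$", shlex.join(cmd))
    subprocess.run(cmd, cwd="experiments", check=True)
```

For example, `run("scienceworld", use_skill=False, exp_name="sci_noskill")` would start a ScienceWorld baseline alongside the `sci_test` skill-augmented run.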
🛠️ Argument Descriptions
- `--model`: Name of the LLM to evaluate.
- `--split`: Data split to use (`dev` or `test`); the WebShop example does not set it.
- `--max_workers`: Number of parallel workers used during evaluation.
- `--exp_name`: Name under which the results are saved.
- `--use_skill`: Enable the skill-augmented module.
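For readers extending the scripts, the documented flags map naturally onto an `argparse` parser like the one below. This is a sketch, not the scripts' actual parser: the defaults, the `required` settings, and restricting `--split` to `dev`/`test` are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="SkillNet evaluation (sketch)")
    parser.add_argument("--model", required=True,
                        help="Name of the LLM to evaluate, e.g. o4-mini")
    parser.add_argument("--split", choices=["dev", "test"], default=None,
                        help="Data split; not set in the WebShop example")
    parser.add_argument("--max_workers", type=int, default=1,
                        help="Number of parallel evaluation workers")
    parser.add_argument("--exp_name", required=True,
                        help="Name under which results are saved")
    parser.add_argument("--use_skill", action="store_true",
                        help="Enable the skill-augmented module")
    return parser
```

`--use_skill` is modeled as a `store_true` switch, matching how the example commands pass it without a value.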