File size: 3,968 Bytes
0c51b93
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
# 📊 Model Evaluation

This section describes how to deploy and evaluate trained models (e.g., Behavior Cloning vs. GRPO) using vLLM, Django, and the sotopia evaluation framework.

### Sotopia Evaluation Framework

We use `sotopia==0.1.0rc5` for evaluation. First, create the environment and install the correct version:

```bash
conda create -n sotopia python=3.10
conda activate sotopia
pip install sotopia==0.1.0rc5
```

### Environment Setup

Make sure to set the required environment variables **in all terminal windows** (one for each model server).

```bash
conda activate sotopia-rl

# Set paths
export REPO_FOLDER_NAME="<your_sotopia-rl_repo_path>"
export MODEL_PATH="<your_base_model_path>"
export CHAT_TEMPLATE="${REPO_FOLDER_NAME}/evals/qwen2.5-7b.jinja"

# Set GPUs and ports
export SFT_GPU=0
export GRPO_GPU=1
export SFT_PORT=7010
export GRPO_PORT=7020

# Model folders and checkpoints
export SFT_MODEL_FOLDER_NAME="<your_sft_model_folder_name>"
export GRPO_MODEL_FOLDER_NAME="<your_grpo_model_folder_name>"
export SFT_MODEL_CKPT_STEP=<your_best_sft_checkpoint>
export GRPO_MODEL_CKPT_STEP=<your_best_grpo_checkpoint>

# Full checkpoint paths
export SFT_MODEL_PATH="${REPO_FOLDER_NAME}/${SFT_MODEL_FOLDER_NAME}/checkpoint-${SFT_MODEL_CKPT_STEP}/"
export GRPO_MODEL_PATH="${REPO_FOLDER_NAME}/${GRPO_MODEL_FOLDER_NAME}/checkpoint-${GRPO_MODEL_CKPT_STEP}/"

# Names for served models
export SFT_MODEL_NAME="${SFT_MODEL_FOLDER_NAME}-gpu${SFT_GPU}"
export GRPO_MODEL_NAME="${GRPO_MODEL_FOLDER_NAME}-gpu${GRPO_GPU}"

# Final evaluation tags
export ENV_MODEL="gpt-4o"
export TAG="${GRPO_MODEL_FOLDER_NAME}_step_${GRPO_MODEL_CKPT_STEP}_vs_${SFT_MODEL_FOLDER_NAME}_step_${SFT_MODEL_CKPT_STEP}"

# Endpoint URLs
export MODEL_A="custom/${GRPO_MODEL_NAME}@http://localhost:${GRPO_PORT}/v1"
export MODEL_B="custom/${SFT_MODEL_NAME}@http://localhost:${SFT_PORT}/v1"
```

### Launch Model Servers (LoRA-enabled)

**Terminal 1: Serve SFT Model**

```bash
CUDA_VISIBLE_DEVICES=$SFT_GPU python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --port "$SFT_PORT" \
    --max-lora-rank 64 \
    --chat-template $CHAT_TEMPLATE \
    --served-model-name qwen25-7b-instruct \
    --enable-lora \
    --lora-modules "$SFT_MODEL_NAME=$SFT_MODEL_PATH"
```

**Terminal 2: Serve GRPO Model**

```bash
CUDA_VISIBLE_DEVICES=$GRPO_GPU python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --port "$GRPO_PORT" \
    --max-lora-rank 64 \
    --chat-template $CHAT_TEMPLATE \
    --served-model-name qwen25-7b-instruct \
    --enable-lora \
    --lora-modules "$GRPO_MODEL_NAME=$GRPO_MODEL_PATH"
```

### Run Evaluation with Sotopia

##### Terminal 3: Run Evaluation

```bash
git clone https://github.com/sotopia-lab/sotopia.git
cd sotopia
conda activate sotopia
git checkout tags/v0.1.0-rc.5
cd sotopia
```

Ensure **all environment variables** listed above are exported before running the evaluation.

```bash
python examples/experiment_eval.py \
  --gin_file sotopia_conf/generation_utils_conf/generate.gin \
  --gin_file sotopia_conf/server_conf/server.gin \
  --gin_file sotopia_conf/run_async_server_in_batch.gin \
  --gin.BATCH_SIZE=20 \
  --gin.PUSH_TO_DB=True \
  '--gin.ENV_IDS=[your_env_ids]' \
  "--gin.ENV_MODEL='${ENV_MODEL}'" \
  "--gin.AGENT1_MODEL='${MODEL_A}'" \
  "--gin.AGENT2_MODEL='${MODEL_B}'" \
  "--gin.TAG='${TAG}'"

# Reverse agents
python examples/experiment_eval.py \
  --gin_file sotopia_conf/generation_utils_conf/generate.gin \
  --gin_file sotopia_conf/server_conf/server.gin \
  --gin_file sotopia_conf/run_async_server_in_batch.gin \
  --gin.BATCH_SIZE=20 \
  --gin.PUSH_TO_DB=True \
  '--gin.ENV_IDS=[your_env_ids]' \
  "--gin.ENV_MODEL='${ENV_MODEL}'" \
  "--gin.AGENT1_MODEL='${MODEL_B}'" \
  "--gin.AGENT2_MODEL='${MODEL_A}'" \
  "--gin.TAG='${TAG}'"
```

For ENV_IDS in ` sotopia-hard`  and ` sotopia-all` , please see [this file](https://github.com/sotopia-lab/sotopia-rl/tree/main/data/env_ids.txt).