# GPU and Slurm Configuration

## ALFWorld Best Checkpoints

The best ALFWorld checkpoints were trained with:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00
```
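For reference, these directives can be assembled into a complete submission script. The script body below is a placeholder; substitute the repository's actual training launch command:

```bash
#!/bin/bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00

# Placeholder launch step; replace with the project's real
# training entrypoint and overrides.
srun bash train_alfworld.sh
```

Submitted with `sbatch <script>.slurm` as usual.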

Important training overrides:

```bash
trainer.n_gpus_per_node=4
trainer.nnodes=1
trainer.total_training_steps=180
trainer.save_freq=10
trainer.test_freq=10
env.env_name=alfworld/AlfredTWEnv
env.rollout.n=4
data.train_batch_size=8
data.val_batch_size=16
actor_rollout_ref.rollout.gpu_memory_utilization=0.4
actor_rollout_ref.rollout.max_model_len=3072
```
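These overrides are passed on the command line in the usual Hydra `key=value` style. A sketch of the full launch line, where the entrypoint module is a placeholder for the repository's actual trainer:

```bash
# Placeholder entrypoint; substitute the project's real trainer module.
python -m trainer.main \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.total_training_steps=180 \
    trainer.save_freq=10 \
    trainer.test_freq=10 \
    env.env_name=alfworld/AlfredTWEnv \
    env.rollout.n=4 \
    data.train_batch_size=8 \
    data.val_batch_size=16 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.max_model_len=3072
```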

## Search Run

The Search run used one node with 4 A100 GPUs allocated:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=220G
#SBATCH -t 2-00:00:00
```

GPU assignment:

```bash
CUDA_VISIBLE_DEVICES=3    # local retriever service
CUDA_VISIBLE_DEVICES=0,1,2  # training
```
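In practice this means launching the two processes with disjoint `CUDA_VISIBLE_DEVICES` values. The script names below are placeholders for the project's actual launch scripts:

```bash
# Pin the retriever and the trainer to disjoint GPUs on the same node.
# Both scripts are placeholders; use the project's real launchers.
CUDA_VISIBLE_DEVICES=3 bash launch_retriever.sh &   # retriever on GPU 3
CUDA_VISIBLE_DEVICES=0,1,2 bash train_search.sh     # training on GPUs 0-2
```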

Important Search fix:

```bash
data.max_prompt_length=6144
actor_rollout_ref.rollout.max_model_len=6144
```

This avoids the observed Qwen2-VL RoPE shape mismatch, which occurred when the accumulated prompt state grew past 4096 tokens and exceeded the previous limits.
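The underlying invariant is simply that the tokenized prompt must fit within the configured context length. A minimal sketch of that guard, where the whitespace split is a stand-in for the real Qwen2-VL tokenizer (an assumption here, not the project's code):

```python
# Minimal sketch of the length guard implied by the overrides above.
# prompt.split() is a stub tokenizer, NOT the real Qwen2-VL tokenizer;
# the 6144 limit mirrors data.max_prompt_length.
MAX_PROMPT_LEN = 6144

def fits_context(prompt: str, max_len: int = MAX_PROMPT_LEN) -> bool:
    """Return True if the stub-tokenized prompt fits within max_len."""
    return len(prompt.split()) <= max_len

def truncate_prompt(prompt: str, max_len: int = MAX_PROMPT_LEN) -> str:
    """Drop the oldest tokens so the prompt fits the context window."""
    tokens = prompt.split()
    return " ".join(tokens[-max_len:])
```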

## Docker Runtime

Suggested runtime command:

```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /path/to/SkillZero:/workspace/SkillZero \
  -v /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  -it skillzero:export
```

For Slurm clusters, prefer running through the provided Slurm scripts rather than plain Docker unless the cluster explicitly supports Docker or Enroot/Singularity.