# GPU and Slurm Configuration

## ALFWorld Best Checkpoints

The best ALFWorld checkpoints were trained with the following Slurm allocation:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00
```

Key training overrides:

```bash
trainer.n_gpus_per_node=4
trainer.nnodes=1
trainer.total_training_steps=180
trainer.save_freq=10
trainer.test_freq=10
env.env_name=alfworld/AlfredTWEnv
env.rollout.n=4
data.train_batch_size=8
data.val_batch_size=16
actor_rollout_ref.rollout.gpu_memory_utilization=0.4
actor_rollout_ref.rollout.max_model_len=3072
```
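Putting the allocation and the overrides together, a submit script might look like the sketch below. The `skillzero.train` entry point is a hypothetical placeholder; substitute the repo's actual training launcher.

```bash
#!/bin/bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00

# Hypothetical entry point -- replace with the repo's real training script.
python3 -m skillzero.train \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.total_training_steps=180 \
    trainer.save_freq=10 \
    trainer.test_freq=10 \
    env.env_name=alfworld/AlfredTWEnv \
    env.rollout.n=4 \
    data.train_batch_size=8 \
    data.val_batch_size=16 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.max_model_len=3072
```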
## Search Run

The Search run used a single node with four A100 GPUs:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=220G
#SBATCH -t 2-00:00:00
```

GPU assignment:

```bash
CUDA_VISIBLE_DEVICES=3      # local retriever service
CUDA_VISIBLE_DEVICES=0,1,2  # training
```
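Within a single job script, this split can be expressed by setting `CUDA_VISIBLE_DEVICES` per process, roughly as sketched below. Both `retriever_server.py` and the `skillzero.train` entry point are hypothetical names; use the repo's actual scripts.

```bash
# Pin the retriever to GPU 3 and run it in the background.
# retriever_server.py is a hypothetical name for the local retriever service.
CUDA_VISIBLE_DEVICES=3 python3 retriever_server.py &
RETRIEVER_PID=$!

# Train on the remaining three GPUs (entry point is a placeholder).
CUDA_VISIBLE_DEVICES=0,1,2 python3 -m skillzero.train "$@"

# Shut the retriever down once training finishes.
kill "$RETRIEVER_PID"
```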
Important Search fix:

```bash
data.max_prompt_length=6144
actor_rollout_ref.rollout.max_model_len=6144
```

This avoids the observed Qwen2-VL RoPE shape mismatch, which occurred when the generated prompt state exceeded 4096 tokens.
## Docker Runtime

Suggested runtime command:

```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /path/to/SkillZero:/workspace/SkillZero \
  -v /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  -it skillzero:export
```

For Slurm clusters, prefer the provided Slurm scripts over plain Docker unless the cluster explicitly supports Docker or Enroot/Singularity.
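On clusters that provide Apptainer/Singularity rather than Docker, a rough equivalent is sketched below, assuming the image can be converted from a local Docker daemon.

```bash
# One-time conversion of the Docker image to a SIF (needs local Docker access).
apptainer build skillzero.sif docker-daemon://skillzero:export

# --nv exposes the host NVIDIA driver stack inside the container.
apptainer exec --nv \
  --bind /path/to/SkillZero:/workspace/SkillZero \
  --bind /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  skillzero.sif bash
```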