# GPU and Slurm Configuration

## ALFWorld Best Checkpoints

The best ALFWorld checkpoints were trained with the following Slurm allocation:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00
```

Key training overrides:

```bash
trainer.n_gpus_per_node=4
trainer.nnodes=1
trainer.total_training_steps=180
trainer.save_freq=10
trainer.test_freq=10
env.env_name=alfworld/AlfredTWEnv
env.rollout.n=4
data.train_batch_size=8
data.val_batch_size=16
actor_rollout_ref.rollout.gpu_memory_utilization=0.4
actor_rollout_ref.rollout.max_model_len=3072
```
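Putting the allocation and the overrides together, a submit script might look like the sketch below. The `skillzero.train` entry point is a hypothetical placeholder; substitute the repo's actual training launcher.

```bash
#!/bin/bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00

# Hypothetical entry point -- replace with the repo's real training script.
python3 -m skillzero.train \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.total_training_steps=180 \
    trainer.save_freq=10 \
    trainer.test_freq=10 \
    env.env_name=alfworld/AlfredTWEnv \
    env.rollout.n=4 \
    data.train_batch_size=8 \
    data.val_batch_size=16 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.max_model_len=3072
```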
## Search Run

The Search run used a single node with four A100 GPUs:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=220G
#SBATCH -t 2-00:00:00
```

GPU assignment:

```bash
CUDA_VISIBLE_DEVICES=3      # local retriever service
CUDA_VISIBLE_DEVICES=0,1,2  # training
```
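Within a single job script, this split can be expressed by setting `CUDA_VISIBLE_DEVICES` per process, roughly as sketched below. Both `retriever_server.py` and the `skillzero.train` entry point are hypothetical names; use the repo's actual scripts.

```bash
# Pin the retriever to GPU 3 and run it in the background.
# retriever_server.py is a hypothetical name for the local retriever service.
CUDA_VISIBLE_DEVICES=3 python3 retriever_server.py &
RETRIEVER_PID=$!

# Train on the remaining three GPUs (entry point is a placeholder).
CUDA_VISIBLE_DEVICES=0,1,2 python3 -m skillzero.train "$@"

# Shut the retriever down once training finishes.
kill "$RETRIEVER_PID"
```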
Important Search fix:

```bash
data.max_prompt_length=6144
actor_rollout_ref.rollout.max_model_len=6144
```

This avoids the observed Qwen2-VL RoPE shape mismatch, which occurred when the generated prompt state exceeded 4096 tokens.
## Docker Runtime

Suggested runtime command:

```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /path/to/SkillZero:/workspace/SkillZero \
  -v /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  -it skillzero:export
```

For Slurm clusters, prefer the provided Slurm scripts over plain Docker unless the cluster explicitly supports Docker or Enroot/Singularity.
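On clusters that provide Apptainer/Singularity rather than Docker, a rough equivalent is sketched below, assuming the image can be converted from a local Docker daemon.

```bash
# One-time conversion of the Docker image to a SIF (needs local Docker access).
apptainer build skillzero.sif docker-daemon://skillzero:export

# --nv exposes the host NVIDIA driver stack inside the container.
apptainer exec --nv \
  --bind /path/to/SkillZero:/workspace/SkillZero \
  --bind /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  skillzero.sif bash
```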