# GPU and Slurm Configuration

## ALFWorld Best Checkpoints

The best ALFWorld checkpoints were trained with:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH -t 2-00:00:00
```

Important training overrides:

```bash
trainer.n_gpus_per_node=4
trainer.nnodes=1
trainer.total_training_steps=180
trainer.save_freq=10
trainer.test_freq=10
env.env_name=alfworld/AlfredTWEnv
env.rollout.n=4
data.train_batch_size=8
data.val_batch_size=16
actor_rollout_ref.rollout.gpu_memory_utilization=0.4
actor_rollout_ref.rollout.max_model_len=3072
```
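As a quick sanity check on the overrides above: `data.train_batch_size=8` with `env.rollout.n=4` implies 8 × 4 = 32 rollouts per training step, and `save_freq=10` over 180 total steps yields 18 checkpoints. A minimal shell sketch of that arithmetic (values copied from this section, not from the repo's scripts):

```shell
# Sanity-check arithmetic for the training overrides above
TRAIN_BATCH_SIZE=8   # data.train_batch_size
ROLLOUT_N=4          # env.rollout.n
TOTAL_STEPS=180      # trainer.total_training_steps
SAVE_FREQ=10         # trainer.save_freq

echo "rollouts per step: $((TRAIN_BATCH_SIZE * ROLLOUT_N))"   # 32
echo "checkpoints saved: $((TOTAL_STEPS / SAVE_FREQ))"        # 18
```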
## Search Run

The Search run used one node with 4 A100 GPUs allocated:

```bash
#SBATCH -p a100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=220G
#SBATCH -t 2-00:00:00
```

GPU assignment:

```bash
CUDA_VISIBLE_DEVICES=3      # local retriever service
CUDA_VISIBLE_DEVICES=0,1,2  # training
```
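Since the retriever service and the trainer share one node, the two `CUDA_VISIBLE_DEVICES` sets must be disjoint. A small bash sketch that verifies this split before launching (variable names are illustrative, not from the repo):

```shell
# Verify the retriever and training GPU sets do not overlap (requires bash
# for process substitution)
RETRIEVER_GPUS="3"
TRAIN_GPUS="0,1,2"

# comm -12 prints only lines common to both sorted lists
overlap=$(comm -12 \
  <(tr ',' '\n' <<< "$TRAIN_GPUS" | sort) \
  <(tr ',' '\n' <<< "$RETRIEVER_GPUS" | sort))

if [ -z "$overlap" ]; then
  echo "GPU sets are disjoint"
else
  echo "conflict on GPU(s): $overlap" >&2
  exit 1
fi
```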
Important Search fix:

```bash
data.max_prompt_length=6144
actor_rollout_ref.rollout.max_model_len=6144
```

This avoids the Qwen2-VL RoPE shape mismatch observed when the accumulated prompt state grew beyond 4096 tokens.
## Docker Runtime

Suggested runtime command:

```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /path/to/SkillZero:/workspace/SkillZero \
  -v /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  -it skillzero:export
```

For Slurm clusters, prefer running through the provided Slurm scripts rather than plain Docker unless the cluster explicitly supports Docker or Enroot/Singularity.
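On clusters that provide Apptainer/Singularity instead of Docker, a roughly equivalent invocation might look like the following. This is a sketch only: the `skillzero_export.sif` image name is an assumption, and the command is echoed as a dry run since the container runtime is only available on the cluster.

```shell
# Dry run: echo the command instead of executing it. The .sif image name is
# hypothetical; --nv exposes the host GPUs, --bind mirrors the Docker -v mounts.
echo singularity exec --nv \
  --bind /path/to/SkillZero:/workspace/SkillZero \
  --bind /path/to/checkpoints:/workspace/SkillZero/checkpoints \
  skillzero_export.sif bash
```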