# SkillZero Best Checkpoints Export

This package contains the two best checkpoints, selected by validation success rate, from the completed SkillZero runs.

## Included Checkpoints

1. ALFWorld `global_step_160`
   - Validation metric: `val/success_rate = 0.594`
   - Archive path:
     `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_160`
2. ALFWorld `global_step_150`
   - Validation metric: `val/success_rate = 0.477`
   - Archive path:
     `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_150`

## Related Search Checkpoint

The best Search checkpoint by validation success rate is not included in the "top two overall" package, but is useful for reproducing the Search run:

- Search `global_step_180`
  - Validation metric: `val/success_rate = 0.356`
  - Test metrics:
    - `test/full_skill/success_rate = 0.282`
    - `test/no_skill/success_rate = 0.310`
  - Checkpoint path, if packaged separately:
    `checkpoints/SkillZero_search/skillzero_search_vl_3b_local_retriever/global_step_180`

## Hardware Used

Training was submitted through Slurm on the `a100` partition.

- ALFWorld:
  - GPUs: 4 x A100
  - CPUs per task: 32
  - Memory: 200GB
  - Time limit: 2 days
- Search local retriever:
  - GPUs: 4 x A100 allocated
    - Training used GPUs 0,1,2
    - Local retriever used GPU 3
  - CPUs per task: 32
  - Memory: 220GB
  - Time limit: 2 days

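As a rough illustration of the Search allocation above, the resource requests could be expressed as a Slurm batch header along the following lines. This is a sketch under assumptions, not the original submission script: the `--gres=gpu:4` syntax depends on how the cluster exposes GPUs, `220G` is only an approximate mapping of "220GB", and splitting devices with `CUDA_VISIBLE_DEVICES` is one common way to keep GPU 3 for the retriever. The script only prints the intended split and launches nothing.

```bash
#!/bin/bash
#SBATCH --partition=a100
#SBATCH --gres=gpu:4            # assumption: GPUs requested via gres on this cluster
#SBATCH --cpus-per-task=32
#SBATCH --mem=220G              # Search run; the ALFWorld run used 200GB
#SBATCH --time=2-00:00:00       # 2 days

# Device split described above: training on GPUs 0,1,2 and the retriever on GPU 3.
TRAIN_GPUS="0,1,2"
RETRIEVER_GPU="3"

# Print the split only; the actual launch commands are not part of this sketch.
echo "retriever: CUDA_VISIBLE_DEVICES=${RETRIEVER_GPU}"
echo "training:  CUDA_VISIBLE_DEVICES=${TRAIN_GPUS}"
```

For the ALFWorld run, the same header would apply with `--mem=200G` and all four GPUs given to training.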
## Runtime Notes

- Python environment name used on the cluster: `skillzero`
- Retriever environment name: `retriever`
- Main model: `Qwen/Qwen2.5-VL-3B-Instruct`
- Training entry point: `python3 -m verl.trainer.main_ppo`
- Original training logs are not required to use the checkpoints.

## Restore

After extracting the archive, place the checkpoint directories under:

```text
checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/
```

Then use `trainer.resume_mode=resume_path` and set `trainer.resume_from_path` to the target `global_step_*` directory.
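
The restore step can be sketched as a small shell snippet that assembles the two resume overrides for the `global_step_160` checkpoint. It is a minimal sketch: the variable names are illustrative, the command is printed rather than executed, and the remaining overrides from the original training configuration (not part of this package) would still need to be appended.

```bash
set -eu

# Location of the restored checkpoints, relative to the training working directory.
CKPT_ROOT="checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe"
STEP_DIR="${CKPT_ROOT}/global_step_160"

# Warn early if the archive has not been extracted into place yet.
if [ ! -d "$STEP_DIR" ]; then
  echo "warning: ${STEP_DIR} not found; extract the archive first" >&2
fi

# The two overrides named above; all other training overrides are omitted here.
RESUME_OVERRIDES="trainer.resume_mode=resume_path trainer.resume_from_path=${STEP_DIR}"

# Print the command for inspection instead of launching training.
echo "python3 -m verl.trainer.main_ppo ${RESUME_OVERRIDES}"
```

To resume from the other packaged checkpoint, change `STEP_DIR` to end in `global_step_150`.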