SkillZero Best Checkpoints Export
This package contains the two best checkpoints selected by validation success rate from the completed SkillZero runs.
Included Checkpoints
- ALFWorld global_step_160
  - Validation metric: val/success_rate = 0.594
  - Archive path: checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_160
- ALFWorld global_step_150
  - Validation metric: val/success_rate = 0.477
  - Archive path: checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_150
Related Search Checkpoint
The best Search checkpoint by validation success rate is not included in the "top two overall" package, but is useful for reproducing the Search run:
- Search global_step_180
  - Validation metric: val/success_rate = 0.356
  - Test metrics: test/full_skill/success_rate = 0.282, test/no_skill/success_rate = 0.310
  - Checkpoint path, if packaged separately: checkpoints/SkillZero_search/skillzero_search_vl_3b_local_retriever/global_step_180
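The "top two overall" selection can be reproduced from the reported numbers alone. The following is a minimal sketch, not the script used to build this package. The `reported` list and the `top_k_by_val` helper are hypothetical names introduced for illustration, and only the three val/success_rate values listed in this document are used.

```python
# Validation success rates as reported in this README:
# (run, global step, val/success_rate).
reported = [
    ("ALFWorld", 160, 0.594),
    ("ALFWorld", 150, 0.477),
    ("Search", 180, 0.356),
]


def top_k_by_val(entries, k=2):
    """Return the k entries with the highest validation success rate."""
    return sorted(entries, key=lambda e: e[2], reverse=True)[:k]


for run, step, rate in top_k_by_val(reported):
    print(f"{run} global_step_{step}: val/success_rate = {rate}")
```

With these values both selected checkpoints come from the ALFWorld run, which is why the Search checkpoint is listed separately.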
Hardware Used
Training was submitted through Slurm on the a100 partition.
ALFWorld:
- GPUs: 4 x A100
- CPUs per task: 32
- Memory: 200GB
- Time limit: 2 days
Search local retriever:
- GPUs: 4 x A100 allocated
- Training used GPUs 0,1,2
- Local retriever used GPU 3
- CPUs per task: 32
- Memory: 220GB
- Time limit: 2 days
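One common way to get the 3 + 1 GPU split listed for the Search run is to give each process its own `CUDA_VISIBLE_DEVICES` value. The sketch below only builds the two environments and does not launch anything. Whether the original job scripts used this mechanism is an assumption, and `gpu_env` is a hypothetical helper.

```python
import os


def gpu_env(gpu_ids, base_env=None):
    """Copy an environment and restrict the visible CUDA devices to gpu_ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Assumed split, following the hardware notes above.
trainer_env = gpu_env([0, 1, 2])  # training on GPUs 0,1,2
retriever_env = gpu_env([3])      # local retriever on GPU 3

print(trainer_env["CUDA_VISIBLE_DEVICES"])
print(retriever_env["CUDA_VISIBLE_DEVICES"])
```

Each dictionary can be passed as the `env` argument of `subprocess.Popen` when starting the corresponding process.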
Runtime Notes
- Python environment name used on the cluster: skillzero
- Retriever environment name: retriever
- Main model: Qwen/Qwen2.5-VL-3B-Instruct
- Training entry point: python3 -m verl.trainer.main_ppo
- Original training logs are not required to use the checkpoints.
Restore
After extracting the archive, place checkpoint directories under:
checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/
Then use trainer.resume_mode=resume_path and set trainer.resume_from_path to the target global_step_* directory.
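As a sketch of the restore step, the snippet below assembles the two resume overrides for the training entry point named in Runtime Notes. It only prints the command and does not start training. The `resume_command` helper is hypothetical, passing the overrides as key=value command-line arguments is an assumption, and a real run would also need the rest of the original run's configuration, which this document does not list.

```python
import shlex
from pathlib import PurePosixPath

# Directory from the Restore section; checkpoints live below it.
CKPT_ROOT = PurePosixPath(
    "checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe"
)


def resume_command(step, root=CKPT_ROOT):
    """Build the argument list that resumes from root/global_step_<step>."""
    ckpt_dir = root / f"global_step_{step}"
    return [
        "python3", "-m", "verl.trainer.main_ppo",
        "trainer.resume_mode=resume_path",
        f"trainer.resume_from_path={ckpt_dir}",
    ]


# Print a shell-quoted command line for the best ALFWorld checkpoint.
print(shlex.join(resume_command(160)))
```

Keeping the command as a list makes it easy to append further overrides before handing it to a job script.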