SkillZero Best Checkpoints Export

This package contains the two checkpoints with the highest validation success rate (val/success_rate) across the completed SkillZero runs. Both come from the same ALFWorld run.

Included Checkpoints

  1. ALFWorld global_step_160

    • Validation metric: val/success_rate = 0.594
    • Archive path: checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_160
  2. ALFWorld global_step_150

    • Validation metric: val/success_rate = 0.477
    • Archive path: checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_150

Related Search Checkpoint

The best Search checkpoint by validation success rate did not make the top two overall and is not included in this package, but it is useful for reproducing the Search run:

  • Search global_step_180
  • Validation metric: val/success_rate = 0.356
  • Test metrics:
    • test/full_skill/success_rate = 0.282
    • test/no_skill/success_rate = 0.310
  • Checkpoint path, if packaged separately: checkpoints/SkillZero_search/skillzero_search_vl_3b_local_retriever/global_step_180
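The top-two selection can be illustrated with a short sort over the validation metrics reported above. This is a minimal sketch, not the script used to build the package: the `checkpoints` list and the `top_k_by_validation` helper are hypothetical, and only the three values listed in this README are included.

```python
# Validation success rates as reported in this README.
# Each entry: (task, checkpoint name, val/success_rate).
checkpoints = [
    ("ALFWorld", "global_step_160", 0.594),
    ("ALFWorld", "global_step_150", 0.477),
    ("Search", "global_step_180", 0.356),
]


def top_k_by_validation(entries, k=2):
    """Return the k entries with the highest val/success_rate (hypothetical helper)."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)[:k]


for task, step, rate in top_k_by_validation(checkpoints):
    print(f"{task} {step}: val/success_rate = {rate:.3f}")
```

With these values, both selected entries are ALFWorld checkpoints, which is why the Search checkpoint is listed separately.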

Hardware Used

Training was submitted through Slurm on the a100 partition.

  • ALFWorld:

    • GPUs: 4 x A100
    • CPUs per task: 32
    • Memory: 200GB
    • Time limit: 2 days
  • Search local retriever:

    • GPUs: 4 x A100 allocated
    • Training used GPUs 0,1,2
    • Local retriever used GPU 3
    • CPUs per task: 32
    • Memory: 220GB
    • Time limit: 2 days
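The README does not say how the Search run kept training and the local retriever on separate GPUs. One common way to get this kind of split is the standard `CUDA_VISIBLE_DEVICES` environment variable; the sketch below assumes that approach, and the `env_for_gpus` helper is hypothetical.

```python
import os


def env_for_gpus(gpu_ids, base_env=None):
    """Copy an environment and restrict CUDA to the given device indices.

    Hypothetical helper: assumes the split is enforced via CUDA_VISIBLE_DEVICES.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Split described above: training on GPUs 0, 1, 2 and the retriever on GPU 3.
trainer_env = env_for_gpus([0, 1, 2])
retriever_env = env_for_gpus([3])

print("trainer:", trainer_env["CUDA_VISIBLE_DEVICES"])
print("retriever:", retriever_env["CUDA_VISIBLE_DEVICES"])
```

Each environment could then be passed as `env=` to `subprocess.Popen` when launching the corresponding process.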

Runtime Notes

  • Python environment name used on the cluster: skillzero
  • Retriever environment name: retriever
  • Main model: Qwen/Qwen2.5-VL-3B-Instruct
  • Training entry point: python3 -m verl.trainer.main_ppo
  • Original training logs are not required to use the checkpoints.

Restore

After extracting the archive, place the checkpoint directories under:

checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/

Then set trainer.resume_mode=resume_path and point trainer.resume_from_path at the target global_step_* directory.
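As a rough sketch, the two resume settings can be assembled into a command line for the training entry point named in the Runtime Notes. The `resume_overrides` and `resume_command` helpers are hypothetical, and the command is incomplete on its own: the data, model, and other overrides from the original training configuration are not listed in this README and would need to be passed through `extra_overrides`.

```python
import shlex
from pathlib import PurePosixPath

# Restore location given in this README.
CHECKPOINT_ROOT = PurePosixPath(
    "checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe"
)


def resume_overrides(step, root=CHECKPOINT_ROOT):
    """Build the two resume overrides for a given global step (hypothetical helper)."""
    resume_path = root / f"global_step_{step}"
    return [
        "trainer.resume_mode=resume_path",
        f"trainer.resume_from_path={resume_path}",
    ]


def resume_command(step, extra_overrides=()):
    """Assemble the trainer invocation; extra_overrides carries the rest of the config."""
    return [
        "python3", "-m", "verl.trainer.main_ppo",
        *extra_overrides,
        *resume_overrides(step),
    ]


cmd = resume_command(160)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` from the repository root once the remaining overrides are filled in.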