# SkillZero Best Checkpoints Export
This package contains the two best checkpoints selected by validation success rate from the completed SkillZero runs.
## Included Checkpoints
1. ALFWorld `global_step_160`
   - Validation metric: `val/success_rate = 0.594`
   - Archive path: `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_160`
2. ALFWorld `global_step_150`
   - Validation metric: `val/success_rate = 0.477`
   - Archive path: `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_150`
## Related Search Checkpoint
The best Search checkpoint by validation success rate scores below both ALFWorld checkpoints, so it is not part of this "top two overall" package. It is listed here because it is useful for reproducing the Search run:
- Search `global_step_180`
  - Validation metric: `val/success_rate = 0.356`
  - Test metrics:
    - `test/full_skill/success_rate = 0.282`
    - `test/no_skill/success_rate = 0.310`
  - Checkpoint path, if packaged separately: `checkpoints/SkillZero_search/skillzero_search_vl_3b_local_retriever/global_step_180`
## Hardware Used
Training was submitted through Slurm on the `a100` partition.
- ALFWorld:
  - GPUs: 4 x A100
  - CPUs per task: 32
  - Memory: 200 GB
  - Time limit: 2 days
- Search local retriever:
  - GPUs: 4 x A100 allocated
    - Training used GPUs 0,1,2
    - Local retriever used GPU 3
  - CPUs per task: 32
  - Memory: 220 GB
  - Time limit: 2 days
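
The ALFWorld allocation listed above could be requested with a Slurm header along the following lines. This is a sketch rather than the original submission script: the job name is hypothetical, and the `--gres` form is an assumption, since some clusters expect `gpu:a100:4` or `--gpus=4` instead.

```bash
#!/bin/bash
# Hypothetical job name; not taken from the original run.
#SBATCH --job-name=skillzero_alfworld
#SBATCH --partition=a100
# Assumption: generic GPU request syntax; adjust to the cluster's convention.
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
# Two-day limit, written as days-hours:minutes:seconds.
#SBATCH --time=2-00:00:00
```

For the Search job the memory request would be `220G`. The split between training (GPUs 0,1,2) and the local retriever (GPU 3) could be expressed by launching the two processes with `CUDA_VISIBLE_DEVICES=0,1,2` and `CUDA_VISIBLE_DEVICES=3` respectively, assuming standard CUDA device masking was used.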
## Runtime Notes
- Python environment name used on the cluster: `skillzero`
- Retriever environment name: `retriever`
- Main model: `Qwen/Qwen2.5-VL-3B-Instruct`
- Training entry point: `python3 -m verl.trainer.main_ppo`
- Original training logs are not required to use the checkpoints.
## Restore
After extracting the archive, place checkpoint directories under:
```bash
checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/
```
Then use `trainer.resume_mode=resume_path` and set `trainer.resume_from_path` to the target `global_step_*` directory.
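
As a minimal sketch of the step above, the following script assembles the two resume overrides for one of the packaged checkpoints and prints the resulting command instead of launching it. The variable names are illustrative, and the printed command is incomplete on its own: the original run's other overrides (data, model, rollout settings) are not listed in this README and have to be appended by the reader.

```bash
set -eu

# Directory from the Restore step; STEP defaults to the best packaged checkpoint.
CKPT_ROOT="checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe"
STEP="${STEP:-160}"
RESUME_DIR="${CKPT_ROOT}/global_step_${STEP}"

# The two overrides named in this README.
RESUME_ARGS="trainer.resume_mode=resume_path trainer.resume_from_path=${RESUME_DIR}"

# Print the command for inspection; the remaining training overrides
# are not documented here and must be added before running it.
echo "python3 -m verl.trainer.main_ppo ${RESUME_ARGS}"
```

Setting `STEP=150` in the environment before running the script selects the second checkpoint instead.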