# SkillZero Best Checkpoints Export

This package contains the two checkpoints with the highest validation success rate across the completed SkillZero runs. Both come from the same ALFWorld run.

## Included Checkpoints

1. ALFWorld `global_step_160`
   - Validation metric: `val/success_rate = 0.594`
   - Archive path:
     `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_160`

2. ALFWorld `global_step_150`
   - Validation metric: `val/success_rate = 0.477`
   - Archive path:
     `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_150`

## Related Search Checkpoint

The best Search checkpoint by validation success rate ranks below both ALFWorld checkpoints, so it is not part of this "top two overall" package. It is still useful for reproducing the Search run:

- Search `global_step_180`
- Validation metric: `val/success_rate = 0.356`
- Test metrics:
  - `test/full_skill/success_rate = 0.282`
  - `test/no_skill/success_rate = 0.310`
- Checkpoint path, if packaged separately:
  `checkpoints/SkillZero_search/skillzero_search_vl_3b_local_retriever/global_step_180`

## Hardware Used

Training was submitted through Slurm on the `a100` partition.

- ALFWorld:
  - GPUs: 4 x A100
  - CPUs per task: 32
  - Memory: 200GB
  - Time limit: 2 days

- Search (local retriever):
  - GPUs: 4 x A100 allocated
    - Training: GPUs 0, 1, 2
    - Local retriever: GPU 3
  - CPUs per task: 32
  - Memory: 220GB
  - Time limit: 2 days
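
The resource requests above map roughly onto the Slurm batch header below. This is a sketch under stated assumptions, not the original submission script: the job name is hypothetical, and the exact `--gres` syntax for requesting A100s depends on the cluster. The Search GPU split is shown as plain variables, assuming each process was pinned with `CUDA_VISIBLE_DEVICES`; the package does not record how the pinning was actually done.

```bash
#!/bin/bash
# Hypothetical job name; the original name is not recorded.
#SBATCH --job-name=skillzero_alfworld
#SBATCH --partition=a100
# Assumption: some clusters require a typed request such as gpu:a100:4.
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
# ALFWorld used 200G; the Search run used 220G.
#SBATCH --mem=200G
# Time limit of 2 days (days-hours:minutes:seconds).
#SBATCH --time=2-00:00:00

# GPU split reported for the Search run. Assumed to be applied through
# CUDA_VISIBLE_DEVICES when launching each process.
TRAIN_GPUS="0,1,2"
RETRIEVER_GPU="3"

echo "training GPUs: ${TRAIN_GPUS}"
echo "retriever GPU: ${RETRIEVER_GPU}"
```

With a split like this, the training process would be started with `CUDA_VISIBLE_DEVICES="${TRAIN_GPUS}"` and the retriever with `CUDA_VISIBLE_DEVICES="${RETRIEVER_GPU}"`, so the two never compete for the same device.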

## Runtime Notes

- Python environment name used on the cluster: `skillzero`
- Retriever environment name: `retriever`
- Main model: `Qwen/Qwen2.5-VL-3B-Instruct`
- Training entry point: `python3 -m verl.trainer.main_ppo`
- Original training logs are not required to use the checkpoints.

## Restore

After extracting the archive, place checkpoint directories under:

```bash
checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/
```

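To resume from a restored checkpoint, launch training with `trainer.resume_mode=resume_path` and set `trainer.resume_from_path` to the target `global_step_*` directory (for example, the `global_step_160` directory under the path above).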
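
A resume invocation for the best ALFWorld checkpoint might look like the sketch below. It only assembles and prints the command rather than running it. The two `trainer.*` overrides are the ones named above; every other override the original run used (data paths, model, rollout settings) is not recorded in this package and has to come from your own training config.

```bash
#!/bin/bash
# Sketch: build the resume command for the best ALFWorld checkpoint.
CKPT_ROOT="checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe"
STEP_DIR="${CKPT_ROOT}/global_step_160"

# Warn early if the checkpoint has not been restored to the expected location.
if [ ! -d "${STEP_DIR}" ]; then
    echo "warning: ${STEP_DIR} not found; extract the archive first" >&2
fi

# Only the two resume overrides are filled in here. Append the remaining
# overrides from the original training config before running the command.
CMD="python3 -m verl.trainer.main_ppo trainer.resume_mode=resume_path trainer.resume_from_path=${STEP_DIR}"

echo "${CMD}"
```

Printing the command first makes it easy to confirm that the path points at a `global_step_*` directory before committing GPU time to the job.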