# SkillZero Best Checkpoints Export

This package contains the two best checkpoints selected by validation success rate from the completed SkillZero runs.

## Included Checkpoints

1. ALFWorld `global_step_160`
   - Validation metric: `val/success_rate = 0.594`
   - Archive path:
     `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_160`


2. ALFWorld `global_step_150`
   - Validation metric: `val/success_rate = 0.477`
   - Archive path:
     `checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/global_step_150`


## Related Search Checkpoint

The best Search checkpoint by validation success rate did not rank in the top two overall, so it is not included in this package. It is still useful for reproducing the Search run:

- Search `global_step_180`
- Validation metric: `val/success_rate = 0.356`
- Test metrics:
  - `test/full_skill/success_rate = 0.282`
  - `test/no_skill/success_rate = 0.310`
- Checkpoint path, if packaged separately:
  `checkpoints/SkillZero_search/skillzero_search_vl_3b_local_retriever/global_step_180`
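The "top two by validation success rate" selection can be sketched in a few lines. This is only an illustration, not the script used to build the package: the dictionary keys are hypothetical labels, and the scores are the three `val/success_rate` values quoted in this README.

```python
# Validation success rates copied from this README.
# The keys are illustrative labels, not real paths.
VAL_SUCCESS_RATE = {
    "alfworld/global_step_160": 0.594,
    "alfworld/global_step_150": 0.477,
    "search/global_step_180": 0.356,
}


def top_k_checkpoints(scores, k=2):
    """Return the k (name, score) pairs with the highest score, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


best_two = top_k_checkpoints(VAL_SUCCESS_RATE)
for name, score in best_two:
    print(f"{name}: val/success_rate = {score:.3f}")
```

With these numbers both selected checkpoints come from the ALFWorld run, which is why the Search checkpoint is listed separately.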

## Hardware Used

Training was submitted through Slurm on the `a100` partition.

- ALFWorld:
  - GPUs: 4 x A100
  - CPUs per task: 32
  - Memory: 200GB
  - Time limit: 2 days

- Search local retriever:
  - GPUs: 4 x A100 allocated
  - Training used GPUs 0,1,2
  - Local retriever used GPU 3
  - CPUs per task: 32
  - Memory: 220GB
  - Time limit: 2 days
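For anyone recreating a comparable allocation, the resource figures above map onto standard `sbatch` directives roughly as follows. The original submission scripts are not part of this package, so this is a sketch under assumptions: the helper name `sbatch_header` is hypothetical, and requesting GPUs via `--gres=gpu:N` is a guess at how the cluster exposes them.

```python
def sbatch_header(gpus, cpus_per_task, mem_gb, days, partition="a100"):
    """Render an sbatch header from the resource figures in this README."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        # Assumption: GPUs are requested through generic resources (gres).
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --mem={mem_gb}G",
        # Slurm time format: days-hours:minutes:seconds.
        f"#SBATCH --time={days}-00:00:00",
    ])


alfworld_header = sbatch_header(gpus=4, cpus_per_task=32, mem_gb=200, days=2)
search_header = sbatch_header(gpus=4, cpus_per_task=32, mem_gb=220, days=2)
print(alfworld_header)
```

The two runs differ only in the memory request (200GB versus 220GB). How the Search job kept training on GPUs 0,1,2 and the retriever on GPU 3 is not covered by this sketch.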

## Runtime Notes

- Python environment name used on the cluster: `skillzero`
- Retriever environment name: `retriever`
- Main model: `Qwen/Qwen2.5-VL-3B-Instruct`
- Training entry point: `python3 -m verl.trainer.main_ppo`
- Original training logs are not required to use the checkpoints.

## Restore

After extracting the archive, place checkpoint directories under:

```text
checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe/
```

Then set `trainer.resume_mode=resume_path` and point `trainer.resume_from_path` at the target `global_step_*` directory.
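As a minimal sketch of those two overrides, the snippet below assembles the resume portion of the launch command for the ALFWorld checkpoints. The helper `resume_command` is hypothetical, and the command is deliberately incomplete: the data, model, and other overrides from the original run are not listed in this README and must be supplied separately.

```python
from pathlib import Path

# Restore location given in this README.
CKPT_ROOT = Path("checkpoints/SkillZero_alfworld/skillzero_alfworld_vl_3b_safe")


def resume_command(step, root=CKPT_ROOT):
    """Build the entry point plus the two resume overrides for one checkpoint."""
    ckpt_dir = root / f"global_step_{step}"
    return [
        "python3", "-m", "verl.trainer.main_ppo",
        "trainer.resume_mode=resume_path",
        f"trainer.resume_from_path={ckpt_dir.as_posix()}",
        # The remaining overrides of the original run would follow here.
    ]


cmd = resume_command(160)
print(" ".join(cmd))
```

Passing `150` instead of `160` targets the second checkpoint in the package.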