---
license: apache-2.0
tags:
- serl
- reinforcement-learning
- qwen2.5
- checkpoints
---

# SeRL Training Checkpoints

Compressed checkpoints from SeRL (Self-Evolving Reinforcement Learning) experiments.
Weight files use **ZipNN lossless compression** (about 33% smaller) and load transparently once the `zipnn_hf()` hook is enabled.

## Quick Start

```bash
pip install zipnn huggingface_hub transformers
```

```python
# Enable ZipNN transparent loading
from zipnn import zipnn_hf
zipnn_hf()

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download specific checkpoint
path = snapshot_download(
    "AshwinKM2005/serl-checkpoints",
    allow_patterns="serl_arce_qwen25_1_5b/huggingface/*"
)

# Load model (auto-decompresses .znn files)
model = AutoModelForCausalLM.from_pretrained(
    f"{path}/serl_arce_qwen25_1_5b/huggingface/global_step200_hf"
)
```

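To load a different experiment or step, the repository-relative folder can be built from the experiment name and the step number. The following is a minimal sketch, assuming every checkpoint follows the `<experiment>/huggingface/global_step<N>_hf` layout used in the Quick Start; `checkpoint_subdir` and `load_checkpoint` are illustrative helper names, not part of any library.

```python
import os

REPO_ID = "AshwinKM2005/serl-checkpoints"


def checkpoint_subdir(experiment: str, step: int) -> str:
    """Repo-relative folder of one Hugging Face checkpoint (assumed layout)."""
    return f"{experiment}/huggingface/global_step{step}_hf"


def load_checkpoint(experiment: str, step: int):
    """Download a single checkpoint and load it (hypothetical helper)."""
    # Imported lazily so the path helper above works without these packages.
    from zipnn import zipnn_hf
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForCausalLM

    zipnn_hf()  # enable transparent loading of .znn files
    subdir = checkpoint_subdir(experiment, step)
    # Restrict the download to the requested checkpoint folder.
    root = snapshot_download(REPO_ID, allow_patterns=f"{subdir}/*")
    return AutoModelForCausalLM.from_pretrained(os.path.join(root, subdir))


print(checkpoint_subdir("serl_arce_qwen25_1_5b", 200))
# prints: serl_arce_qwen25_1_5b/huggingface/global_step200_hf
```

Calling `load_checkpoint("serl_arce_qwen25_1_5b", 200)` would then fetch only that folder, provided the step exists in the repository.
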
## Experiments

| Experiment | Base Model | Dataset |
|------------|-----------|---------|
| serl_arcc_qwen25_0_5b | Qwen2.5-0.5B | ARC-Challenge |
| serl_ARC-c_qwen25_1_5b | Qwen2.5-1.5B | ARC-Challenge |
| serl_arce_qwen25_0_5b | Qwen2.5-0.5B | ARC-Easy |
| serl_arce_qwen25_1_5b | Qwen2.5-1.5B | ARC-Easy |

## Structure

```
serl-checkpoints/
├── serl_arcc_qwen25_0_5b/
│   ├── huggingface/global_step100_hf/
│   └── deepspeed/global_step100/
├── serl_ARC-c_qwen25_1_5b/
│   └── ...
└── ...
```

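Since the tree above is abbreviated, the steps actually available can be discovered from the repository's file listing. This is a rough sketch, assuming the `<experiment>/huggingface/global_step<N>_hf/` layout shown above; `hf_steps_by_experiment` and `list_remote_steps` are hypothetical helper names, and the file names in the example list are made up for illustration.

```python
import re
from collections import defaultdict

# Matches "<experiment>/huggingface/global_step<N>_hf/..." (layout assumed from the tree above).
_HF_STEP = re.compile(r"^([^/]+)/huggingface/global_step(\d+)_hf/")


def hf_steps_by_experiment(paths):
    """Group the available Hugging Face checkpoint steps by experiment."""
    steps = defaultdict(set)
    for p in paths:
        m = _HF_STEP.match(p)
        if m:
            steps[m.group(1)].add(int(m.group(2)))
    return {exp: sorted(found) for exp, found in steps.items()}


def list_remote_steps(repo_id="AshwinKM2005/serl-checkpoints"):
    """Query the Hub for the file listing (needs network access)."""
    from huggingface_hub import list_repo_files  # lazy import

    return hf_steps_by_experiment(list_repo_files(repo_id))


# Only the folder names come from the tree above; the file names are placeholders.
example = [
    "serl_arcc_qwen25_0_5b/huggingface/global_step100_hf/config.json",
    "serl_arcc_qwen25_0_5b/deepspeed/global_step100/placeholder.bin",
]
print(hf_steps_by_experiment(example))
# prints: {'serl_arcc_qwen25_0_5b': [100]}
```

DeepSpeed folders are skipped on purpose, since only the `huggingface/` checkpoints are loaded with `from_pretrained` in the Quick Start.
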
## Compression

Files ending in `.safetensors.znn` are ZipNN-compressed.
Call `zipnn_hf()` before loading a model to enable transparent decompression.
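
To check which files in a downloaded checkpoint are compressed, the `.safetensors.znn` suffix can be searched for directly. The sketch below uses only the standard library; `find_znn_files` and `estimated_original_bytes` are illustrative names, and the size estimate simply applies the roughly 33% reduction quoted above, so treat it as a ballpark figure.

```python
from pathlib import Path


def find_znn_files(checkpoint_dir):
    """Return ZipNN-compressed safetensors files under a directory, sorted by path."""
    return sorted(Path(checkpoint_dir).rglob("*.safetensors.znn"))


def estimated_original_bytes(compressed_bytes, reduction=0.33):
    """Rough uncompressed size, assuming the ~33% reduction (an approximation)."""
    return round(compressed_bytes / (1.0 - reduction))
```

For example, `sum(f.stat().st_size for f in find_znn_files(path))` gives the compressed footprint of a snapshot, which `estimated_original_bytes` can scale to an approximate uncompressed size.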