# Scripts reference

**Directory:** `scripts/`

---

## `scripts/train_trl.py`: main training

Full GRPO training, covering both the single-step curriculum and the episodic curriculum. See [train_trl.md](train_trl.md) for the complete reference.

```bash
LIFESTACK_NO_UNSLOTH=1 python scripts/train_trl.py            # full 5-stage curriculum
python scripts/train_trl.py --dry-run                         # pipeline check, CPU OK
python scripts/train_trl.py --resume                          # resume from checkpoint
python scripts/train_trl.py --episode-train                   # episodic mode
python scripts/train_trl.py --push-to-hub --hub-model-id ...  # upload after training
```

---

## `scripts/eval.py`: random baseline

Runs a uniform random policy for N episodes and reports mean/std reward. No trained model, no GPU, no API key needed. Use this to establish a reward floor before GRPO runs or to verify env correctness after code changes.

```bash
python scripts/eval.py                              # 10 episodes, all domains
python scripts/eval.py --episodes 20 --domain flight_crisis
python scripts/eval.py --episodes 5 --verbose       # per-step output
```
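The idea behind a random baseline can be sketched in a few lines. This is a minimal illustration, not the code in `scripts/eval.py`: `ToyEnv`, its `n_actions` attribute, and the `(obs, reward, done)` return shape of `step()` are hypothetical stand-ins, and "reward" is assumed here to mean the total reward per episode.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical stand-in environment; NOT the LifeStack env."""

    n_actions = 3
    max_steps = 5

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 0 else 0.0  # only action 0 pays off
        done = self.t >= self.max_steps
        return self.t, reward, done


def random_baseline(env, episodes=10, seed=0):
    """Return (mean, std) of total episode reward under a uniform random policy."""
    rng = random.Random(seed)
    returns = []
    for _ in range(episodes):
        env.reset()
        total, done = 0.0, False
        while not done:
            action = rng.randrange(env.n_actions)  # uniform over actions
            _, reward, done = env.step(action)
            total += reward
        returns.append(total)
    # Population std; defined even for a single episode (it is 0.0 then).
    return statistics.mean(returns), statistics.pstdev(returns)


mean, std = random_baseline(ToyEnv(), episodes=20, seed=1)
print(f"random baseline: {mean:.2f} +/- {std:.2f}")
```

A trained policy that does not clearly beat this number is probably not learning, and a baseline that shifts after a refactor suggests the environment's reward logic changed.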

---

## `scripts/plot_training.py`: plot generation

Parses training logs (`train_run_v*.log`) and generates matplotlib plots: reward curve, loss curve, per-component reward breakdown, and 4-panel summary. Supports multiple log formats.

```bash
python scripts/plot_training.py --log train_run_v4.log --output-dir plots/
```
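As a rough illustration of the parsing step, the sketch below pulls reward values out of log lines and smooths them before plotting. The two line formats it accepts (`'reward': 0.5` and `reward=0.5`) are assumptions made for this example; the actual formats handled by `plot_training.py` are not documented here, and the function names are hypothetical.

```python
import re

# Matches the standalone key "reward" (not "reward_std" or "rewards/...")
# followed by ':' or '=' and a number. The formats are assumed.
_REWARD_RE = re.compile(
    r"\breward\b['\"]?\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def extract_rewards(lines):
    """Return the first reward value found on each line that has one."""
    rewards = []
    for line in lines:
        match = _REWARD_RE.search(line)
        if match:
            rewards.append(float(match.group(1)))
    return rewards


def moving_average(values, window=10):
    """Trailing moving average; early points use a shorter window."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


sample = [
    "{'loss': 0.1, 'reward': 0.5, 'reward_std': 0.2}",
    "step 2 reward=1.5 loss=0.05",
    "no metrics on this line",
]
print(moving_average(extract_rewards(sample), window=2))
```

The smoothed list can then be passed to any plotting library; keeping parsing separate from plotting makes it easier to add another log format later.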

---

## `scripts/smoke_test.py`

Fast pipeline check: import validation + one `reset()` + one `step()`. No GPU, no downloads.

```bash
python scripts/smoke_test.py
```

Also called by `setup.sh` at the end of the install process.
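The shape of such a check can be sketched as follows. This is an illustrative outline only: `smoke_test`, `DummyEnv`, and the module names passed in are hypothetical, and the real script's imports and environment class are not shown in this document.

```python
import importlib


def smoke_test(module_names, make_env, action):
    """Import each module, then run one reset() and one step() on a fresh env."""
    for name in module_names:
        importlib.import_module(name)  # raises ImportError on a broken install
    env = make_env()
    env.reset()
    return env.step(action)


class DummyEnv:
    """Hypothetical stand-in; the real check would construct the project env."""

    def reset(self):
        return "obs"

    def step(self, action):
        return "obs", 0.0, True


# Standard-library modules are used here only so the sketch runs anywhere.
print(smoke_test(["json", "math"], DummyEnv, action=0))
```

Because any failure surfaces as an exception, a non-zero exit code is enough for an install script to detect a broken setup.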

---

## `scripts/upload_hf_model_cards.py`

Uploads model cards and training artifacts to HuggingFace model repositories:

- `docs/HF_MODEL_CARD_V4.md` → `jdsb06/lifestack-grpo-v4/README.md`
- `docs/HF_MODEL_CARD_V1.md` → `jdsb06/lifestack-grpo/README.md`
- `train_run_v1.log` → `jdsb06/lifestack-grpo/`
- Relevant plots → `jdsb06/lifestack-grpo-v4/plots/`

```bash
python scripts/upload_hf_model_cards.py   # requires HF_TOKEN or huggingface-cli login
```
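For orientation, the mapping above could be driven by a small table and `HfApi.upload_file` from `huggingface_hub`. This is a sketch under assumptions, not the script's actual code: the `UPLOADS` table and `upload_all` helper are hypothetical, the log file is assumed to keep its name in the target repo, and the plot uploads are omitted because the document does not say which plots are included.

```python
from pathlib import Path

# (local path, repo id, path inside the repo); taken from the list above.
# Keeping the log's file name in the repo is an assumption.
UPLOADS = [
    ("docs/HF_MODEL_CARD_V4.md", "jdsb06/lifestack-grpo-v4", "README.md"),
    ("docs/HF_MODEL_CARD_V1.md", "jdsb06/lifestack-grpo", "README.md"),
    ("train_run_v1.log", "jdsb06/lifestack-grpo", "train_run_v1.log"),
]


def upload_all(api, uploads=UPLOADS, root="."):
    """Upload every file that exists locally; return (repo_id, path) pairs sent."""
    sent = []
    for local, repo_id, path_in_repo in uploads:
        src = Path(root) / local
        if not src.is_file():
            continue  # skip missing artifacts instead of failing the whole run
        api.upload_file(
            path_or_fileobj=str(src),
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="model",
        )
        sent.append((repo_id, path_in_repo))
    return sent


# Real usage (needs network access and credentials):
#   from huggingface_hub import HfApi
#   upload_all(HfApi())
```

Passing the API object in as a parameter keeps the helper testable with a fake client, so the mapping can be checked without touching the Hub.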

---

## `scripts/run_episode.py`

Runs one full episode with the `LifeStackAgent` (GRPO model or Groq API fallback). Requires credentials to be configured.

```bash
python scripts/run_episode.py
python scripts/run_episode.py --difficulty 3 --verbose
```

---

## Related files

- `docs/train_trl.md`: full `train_trl.py` reference
- `docs/eval.md`: `eval.py` reference
- `docs/training_guide.md`: end-to-end training guide