# True LLM Learning Evaluation (Pre-RL vs Post-RL)

This folder is for checkpoint-vs-checkpoint evidence:

- pre-RL base model
- post-RL trained checkpoint

Both are evaluated with an identical protocol.

## Required environment variables

- `BASELINE_MODEL_NAME`
- `TRAINED_MODEL_PATH` (local directory with `adapter_config.json`)
- `ENV_BASE_URL` (CommitmentOS HTTP API)

Optional:

- `HF_TOKEN` (needed for gated Hub models; also helps with rate limits)

Optional protocol overrides:

- `EVAL_SEED` (default: `42`)
- `EVAL_MAX_STEPS` (default: `12`)
- `EVAL_TEMPERATURE` (default: `0.0`)
- `EVAL_TOP_P` (default: `1.0`)
- `EVAL_MAX_NEW_TOKENS` (default: `256`)
- `EVAL_SUCCESS_THRESHOLD` (default: `0.6`)
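
One way to keep the protocol identical across both checkpoints is to read these variables once into a single immutable object and pass it to both runs. The sketch below is a minimal illustration, not the code in `evaluate_llm_checkpoints.py`: the `EvalProtocol` dataclass and the `load_protocol` helper are hypothetical names, and only the variable names and defaults are taken from the lists above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names come from this README; everything else is illustrative.
REQUIRED_VARS = ("BASELINE_MODEL_NAME", "TRAINED_MODEL_PATH", "ENV_BASE_URL")


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical container for the settings shared by both runs."""

    baseline_model_name: str
    trained_model_path: str
    env_base_url: str
    seed: int = 42
    max_steps: int = 12
    temperature: float = 0.0
    top_p: float = 1.0
    max_new_tokens: int = 256
    success_threshold: float = 0.6


def load_protocol(env: Optional[Mapping[str, str]] = None) -> EvalProtocol:
    """Build the protocol from environment variables, failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing required variables: " + ", ".join(missing))
    return EvalProtocol(
        baseline_model_name=env["BASELINE_MODEL_NAME"],
        trained_model_path=env["TRAINED_MODEL_PATH"],
        env_base_url=env["ENV_BASE_URL"],
        seed=int(env.get("EVAL_SEED", "42")),
        max_steps=int(env.get("EVAL_MAX_STEPS", "12")),
        temperature=float(env.get("EVAL_TEMPERATURE", "0.0")),
        top_p=float(env.get("EVAL_TOP_P", "1.0")),
        max_new_tokens=int(env.get("EVAL_MAX_NEW_TOKENS", "256")),
        success_threshold=float(env.get("EVAL_SUCCESS_THRESHOLD", "0.6")),
    )
```

Because the dataclass is frozen, neither run can change a setting after loading, which makes accidental drift between the baseline and trained evaluations harder.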

## Run

```bash
cd commitment_os
pip install -e ".[llm-eval]"
python3 evaluation/evaluate_llm_checkpoints.py
python3 evaluation/plot_llm_checkpoints.py
```

The evaluator prints one line per task (`[eval …] task i/n`) so long Colab runs do not look frozen.

## After Colab

Zip the weights and evaluation artifacts for download (paths assume the repo lives at `/content/commitment_os`):

```bash
cd /content/commitment_os && zip -r /content/commitment_os_bundle.zip training_output artifacts/evals_llm
```

Or copy `training_output/` and `artifacts/evals_llm/` to Google Drive if the zip is too large for the browser.

These bundles are **not** checked into git, which keeps clones fast and history small. A **~330MB** zip (weights plus this folder) is a normal size: publish it as a **GitHub Release** asset, on the **HF Hub**, or on **Google Drive**.

**Drive (weights + this folder):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing). After download you should have `artifacts/evals_llm/` (this layout) next to `training_output/`. See the root **README** for `gdown` / `TRAINED_MODEL_PATH` notes.

## Expected outputs

- `llm_eval_protocol.json`
- `baseline_llm_eval.json`
- `trained_llm_eval.json`
- `llm_comparison.csv`
- `llm_summary.json`
- `llm_case_study_hard_015.md`
- `llm_reward_by_task.svg`
- `llm_violations_before_after.svg`
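
After a run or a download, a quick check can confirm that every file in the list above is present. This is a minimal sketch: the `missing_outputs` helper is hypothetical, and the assumption that the files sit directly in `artifacts/evals_llm/` is inferred from the layout described earlier rather than confirmed by the scripts.

```python
from pathlib import Path
from typing import List

# File names copied from the "Expected outputs" list.
EXPECTED_OUTPUTS = (
    "llm_eval_protocol.json",
    "baseline_llm_eval.json",
    "trained_llm_eval.json",
    "llm_comparison.csv",
    "llm_summary.json",
    "llm_case_study_hard_015.md",
    "llm_reward_by_task.svg",
    "llm_violations_before_after.svg",
)


def missing_outputs(directory: str) -> List[str]:
    """Return the expected file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust if the scripts write elsewhere.
    for name in missing_outputs("artifacts/evals_llm"):
        print("missing:", name)
```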