# Training Evidence

This folder holds small, replayable `evidence.json` files for the four training
methods compared in the demo:

- `sft/`
- `dpo/`
- `sft_dpo/`
- `grpo_rlvr/`

These files store recorded CoS action routes, rewards, fallback usage, and
terminal scores. They do not require adapter weights, so the GRPO+RLVR evidence
can be used even when the exported run has no `adapter_config.json`.
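
Because the evidence is plain JSON, it can be inspected without loading any model. The sketch below is one hedged way to collect the four files; it assumes each one lives at `<method>/evidence.json` directly under the chosen root, and `load_evidence` is a hypothetical helper name, not part of the repository. It makes no assumption about the keys inside each file.

```python
import json
from pathlib import Path

# Method folders listed in this README.
METHODS = ("sft", "dpo", "sft_dpo", "grpo_rlvr")


def load_evidence(root="."):
    """Return {method: parsed evidence.json} for every method folder that has one.

    Assumption: each file lives at <root>/<method>/evidence.json.
    Folders without an evidence.json are skipped silently.
    """
    evidence = {}
    for method in METHODS:
        path = Path(root) / method / "evidence.json"
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                evidence[method] = json.load(fh)
    return evidence


if __name__ == "__main__":
    for method, payload in load_evidence().items():
        # Report only the top-level container type; the schema is not assumed.
        print(f"{method}: {type(payload).__name__}")
```

Run from this folder, the script prints one line per method folder that contains an `evidence.json`.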

The `plots/` folder contains small committed PNGs for judges:

- terminal scores by method
- policy rewards by method
- expert-brief reward curve
- RL loss by method
- RL best-reward tracking by method
- RL chosen-action correctness by method

The `rl_training_metrics/` folder contains real `train_metrics.json` exports
from GRPO, PPO, and GRPO+RLVR runs. These are the source for the RL loss plots.
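
To look at the raw metrics behind those plots, a minimal sketch like the following gathers every `train_metrics.json` under `rl_training_metrics/`. The per-run subfolder layout and the JSON schema are not documented here, so the code searches recursively and returns the parsed content unchanged; `load_train_metrics` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_train_metrics(root="rl_training_metrics"):
    """Return {relative path: parsed JSON} for each train_metrics.json under root.

    Assumption: runs may sit at any depth below root, so the search is recursive.
    A missing root yields an empty dict.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    metrics = {}
    for path in sorted(base.rglob("train_metrics.json")):
        with path.open(encoding="utf-8") as fh:
            metrics[path.relative_to(base).as_posix()] = json.load(fh)
    return metrics
```

Keying the result by relative path keeps runs apart even when several folders export a file with the same name.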

Generate full textual context reports with:

```bash
python3 training/scripts/kaggle_context_results_from_evidence.py --roots .
```