# Training Evidence

This folder contains small, replayable `evidence.json` files for the four method comparisons used in the demo:

- `sft/`
- `dpo/`
- `sft_dpo/`
- `grpo_rlvr/`
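To check which method folders actually contain evidence before replaying anything, a lookup like the one below can help. This is a minimal sketch: the default root `training/evidence` and the assumption that each method folder holds exactly one file named `evidence.json` come from reading this layout, and `find_evidence` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

# Method folders named in this README. The per-folder file name
# "evidence.json" is an assumption of this sketch.
METHODS = ("sft", "dpo", "sft_dpo", "grpo_rlvr")


def find_evidence(root: str | Path = "training/evidence") -> dict[str, Path]:
    """Return {method: path} for every method folder that has an evidence.json."""
    base = Path(root)
    found: dict[str, Path] = {}
    for method in METHODS:
        candidate = base / method / "evidence.json"
        if candidate.is_file():
            found[method] = candidate
    return found


if __name__ == "__main__":
    # Print whichever evidence files exist under the default root.
    for method, path in find_evidence().items():
        print(f"{method}: {path}")
```

Missing folders are skipped rather than raising, so a partial checkout still yields the methods that are present.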

Each file stores the recorded CoS action routes, rewards, fallback usage, and terminal scores. Because the evidence is replayed from these recordings, it does not require adapter weights; the GRPO+RLVR evidence therefore remains usable even when the exported run has no `adapter_config.json`.
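Since the files hold routes, rewards, fallback usage, and terminal scores, one method's evidence can be reduced to a few headline numbers without loading any model. The sketch below assumes a hypothetical schema, `{"episodes": [{"actions": [...], "rewards": [...], "used_fallback": bool, "terminal_score": float}]}`; the real `evidence.json` field names are not documented here and may differ, so adjust the keys accordingly.

```python
from __future__ import annotations

import json
from collections import Counter
from statistics import mean


def summarize_evidence(data: dict) -> dict:
    """Aggregate one method's evidence into a few headline numbers.

    Hypothetical schema (an assumption, not the documented format):
    {"episodes": [{"actions": [...], "rewards": [...],
                   "used_fallback": bool, "terminal_score": float}, ...]}
    """
    episodes = data.get("episodes", [])
    if not episodes:
        return {"episodes": 0}
    # Treat each ordered list of actions as one route and count repeats.
    routes = Counter(tuple(ep["actions"]) for ep in episodes)
    return {
        "episodes": len(episodes),
        "mean_terminal_score": mean(ep["terminal_score"] for ep in episodes),
        "mean_total_reward": mean(sum(ep["rewards"]) for ep in episodes),
        "fallback_rate": sum(bool(ep["used_fallback"]) for ep in episodes) / len(episodes),
        "most_common_route": list(routes.most_common(1)[0][0]),
    }


def load_and_summarize(path: str) -> dict:
    """Read an evidence.json file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_evidence(json.load(handle))
```

For example, passing three episodes with terminal scores 0.75, 0.25, and 0.5 gives a `mean_terminal_score` of 0.5. The action names used in such a test (say, `"profile"` and `"clean"`) are purely illustrative.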

The `plots/` folder contains small, committed PNGs for the judges:

- terminal scores by method
- policy rewards by method
- expert-brief reward curve
- RL loss by method
- RL best-reward tracking by method
- RL chosen-action correctness by method
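The plot names do not spell out how "best-reward tracking" and "chosen-action correctness" are computed. One common reading, assumed here rather than taken from the plotting code, is a running maximum of the per-step reward and the fraction of steps whose chosen action was marked correct. Both helpers below are hypothetical.

```python
from __future__ import annotations

from itertools import accumulate


def best_reward_curve(rewards: list[float]) -> list[float]:
    """Running maximum: entry i is the best reward seen in steps 0..i."""
    return list(accumulate(rewards, max))


def correctness_rate(flags: list[bool]) -> float:
    """Fraction of steps whose chosen action was marked correct (0.0 if empty)."""
    return sum(flags) / len(flags) if flags else 0.0
```

A running maximum never decreases, which makes it easy to see when a method last improved; for instance, rewards `[0.1, 0.5, 0.3, 0.7]` produce the curve `[0.1, 0.5, 0.5, 0.7]`.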

The `rl_training_metrics/` folder contains the actual `train_metrics.json` exports from the GRPO, PPO, and GRPO+RLVR runs; these are the source data for the RL loss plots.
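To rebuild a loss curve from one of these exports, the metrics first have to be read back into step and loss series. The sketch below assumes that each `train_metrics.json` is a JSON list of per-step records such as `{"step": 10, "loss": 0.42}` and that each run sits in its own subfolder. Neither the schema nor the layout is documented here, and `load_loss_series` and `collect_runs` are hypothetical helpers.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_loss_series(path: str | Path) -> tuple[list[int], list[float]]:
    """Return (steps, losses) from a train_metrics.json export.

    Assumes a JSON list of per-step records like {"step": 10, "loss": 0.42};
    records without a "loss" key are skipped.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    steps: list[int] = []
    losses: list[float] = []
    for index, record in enumerate(records):
        if "loss" not in record:
            continue  # e.g. evaluation-only rows
        steps.append(int(record.get("step", index)))
        losses.append(float(record["loss"]))
    return steps, losses


def collect_runs(folder: str | Path = "rl_training_metrics") -> dict:
    """Map each run folder name (assumed layout: <run>/train_metrics.json) to its loss series."""
    return {
        metrics.parent.name: load_loss_series(metrics)
        for metrics in sorted(Path(folder).rglob("train_metrics.json"))
    }
```

Skipping rows without a loss keeps evaluation-only entries from breaking the curve. If the real exports store parallel lists instead of per-step records, only `load_loss_series` needs to change.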

To generate full textual context reports, run:

```bash
python3 training/scripts/kaggle_context_results_from_evidence.py --roots .
```