| """ | |
| Training infrastructure for the two-phase RL system. | |
| Modules: | |
| policies - pluggable policy interface (random / LLM API) | |
| pool_b_baseline - null-context P2 baseline runner (Stage-3 prereq) | |
| behavioral_metrics - stopping/ordering/calibration/breadth metrics | |
| trajectory_dataset - load/save trajectories, HF-Dataset adapter | |
| belief_aux_loss - calibration regression + consistency loss | |
| segment_grpo - framework-agnostic segment-level GRPO loss | |
| curriculum - Stage-2 → Stage-3 → Stage-4 driver | |
| variance_gate - Pool-B variance check + r_cross weight warmup | |
| ablations - four ablation runners (claims 1-4) | |
| report - paper tables + behavioral plots | |
| """ | |