"""
Training infrastructure for the two-phase RL system.
Modules:
    policies            - pluggable policy interface (random / LLM API)
    pool_b_baseline     - null-context P2 baseline runner (Stage-3 prereq)
    behavioral_metrics  - stopping/ordering/calibration/breadth metrics
    trajectory_dataset  - load/save trajectories, HF-Dataset adapter
    belief_aux_loss     - calibration regression + consistency loss
    segment_grpo        - framework-agnostic segment-level GRPO loss
    curriculum          - Stage-2 -> Stage-3 -> Stage-4 driver
    variance_gate       - Pool-B variance check + r_cross weight warmup
    ablations           - four ablation runners (claims 1-4)
    report              - paper tables + behavioral plots
"""