updated-policy / training /__init__.py
srinjoyd's picture
init
19f7f7b
"""
Training infrastructure for the two-phase RL system.
Modules:
policies β€” pluggable policy interface (random / LLM API)
pool_b_baseline β€” null-context P2 baseline runner (Stage-3 prereq)
behavioral_metrics β€” stopping/ordering/calibration/breadth metrics
trajectory_dataset β€” load/save trajectories, HF-Dataset adapter
belief_aux_loss β€” calibration regression + consistency loss
segment_grpo β€” framework-agnostic segment-level GRPO loss
curriculum β€” Stage-2 β†’ Stage-3 β†’ Stage-4 driver
variance_gate β€” Pool-B variance check + r_cross weight warmup
ablations β€” four ablation runners (claims 1-4)
report β€” paper tables + behavioral plots
"""