File size: 812 Bytes
19f7f7b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
"""
Training infrastructure for the two-phase RL system.

Modules:
    policies                β€” pluggable policy interface (random / LLM API)
    pool_b_baseline         β€” null-context P2 baseline runner (Stage-3 prereq)
    behavioral_metrics      β€” stopping/ordering/calibration/breadth metrics
    trajectory_dataset      β€” load/save trajectories, HF-Dataset adapter
    belief_aux_loss         β€” calibration regression + consistency loss
    segment_grpo            β€” framework-agnostic segment-level GRPO loss
    curriculum              β€” Stage-2 β†’ Stage-3 β†’ Stage-4 driver
    variance_gate           β€” Pool-B variance check + r_cross weight warmup
    ablations               β€” four ablation runners (claims 1-4)
    report                  β€” paper tables + behavioral plots
"""