Spaces:

Meta-HF-hackathon
/

updated-policy

Sleeping

init

19f7f7b 25 days ago

812 Bytes

	"""
	Training infrastructure for the two-phase RL system.

	Modules:
	policies — pluggable policy interface (random / LLM API)
	pool_b_baseline — null-context P2 baseline runner (Stage-3 prereq)
	behavioral_metrics — stopping/ordering/calibration/breadth metrics
	trajectory_dataset — load/save trajectories, HF-Dataset adapter
	belief_aux_loss — calibration regression + consistency loss
	segment_grpo — framework-agnostic segment-level GRPO loss
	curriculum — Stage-2 → Stage-3 → Stage-4 driver
	variance_gate — Pool-B variance check + r_cross weight warmup
	ablations — four ablation runners (claims 1-4)
	report — paper tables + behavioral plots
	"""