Streaming 120-step telemetry directly from the GRPO Trainer.
Multi-Objective Log-Barrier Reward Surface
Rolling Recall (target ≥ 95.0%)
Rolling FPR (target < 5.0%)
Current Step (max 120)
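
The rolling recall and FPR readouts above could be computed along the following lines. This is a minimal sketch, not the trainer's actual code: the class name `RollingRates`, the window size, and the assumption that each step reports confusion-matrix counts are all hypothetical.

```python
from collections import deque


class RollingRates:
    """Rolling recall and false-positive rate over the last `window` steps.

    Hypothetical helper: assumes each training step reports the counts
    (tp, fn, fp, tn) for that step's batch.
    """

    def __init__(self, window=20):
        # deque(maxlen=...) drops the oldest step once the window is full
        self.steps = deque(maxlen=window)

    def update(self, tp, fn, fp, tn):
        self.steps.append((tp, fn, fp, tn))

    def recall(self):
        tp = sum(s[0] for s in self.steps)
        fn = sum(s[1] for s in self.steps)
        # None while no positives have been seen (the "--" state)
        return tp / (tp + fn) if (tp + fn) else None

    def fpr(self):
        fp = sum(s[2] for s in self.steps)
        tn = sum(s[3] for s in self.steps)
        # None while no negatives have been seen
        return fp / (fp + tn) if (fp + tn) else None
```

Pooling counts across the window, rather than averaging per-step rates, keeps small batches from dominating the readout.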
[Chart: Reward and Log-Barrier Metric]
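
The dashboard does not state the reward formula, so the following is only one plausible reading of a multi-objective log-barrier term built from the two displayed targets. The function name `log_barrier_reward`, the weight `mu`, and the clamp `eps` are assumptions made for illustration.

```python
import math

RECALL_TARGET = 0.95  # from the dashboard: recall target >= 95.0%
FPR_TARGET = 0.05     # from the dashboard: FPR target < 5.0%


def log_barrier_reward(recall, fpr, mu=0.1, eps=1e-6):
    """Hypothetical log-barrier term over the two constraints.

    Each constraint contributes mu * log(slack), where slack is the margin
    by which the constraint is satisfied. Slack is clamped at eps so that a
    violated constraint yields a large finite penalty instead of a math
    domain error.
    """
    recall_slack = max(recall - RECALL_TARGET, eps)
    fpr_slack = max(FPR_TARGET - fpr, eps)
    return mu * (math.log(recall_slack) + math.log(fpr_slack))
```

Because both slacks are below 1, this term is always negative; it moves toward zero as the policy gains margin on both targets and drops sharply near either boundary. With the clamp, all violating points receive the same penalty, so a real implementation would likely need a different treatment outside the feasible region.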