Salesforce/ContextualJudgeBench
Learning from Language Feedback via Variational Policy Distillation
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation