Salesforce/E1-Code-14B
Text Generation • 15B • Updated • 42 • 6
Learning from Language Feedback via Variational Policy Distillation
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation