Salesforce/LLaMA-3-8B-SFR-SFT-R
Text Generation • 8B • Updated • 64 • 8
Learning from Language Feedback via Variational Policy Distillation
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation