Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models
Paper • 2605.12227
A research group focused on Natural Language Processing, Deep Learning, and Sparse Modeling at Instituto de Telecomunicações and Instituto Superior Técnico in sunny Lisbon ☀️, Portugal 🇵🇹.