Leverage the Average: an Analysis of KL Regularization in RL Paper • 2003.14089 • Published Mar 31, 2020
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Paper • 2305.13185 • Published May 22, 2023
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View Paper • 2401.11237 • Published Jan 20, 2024