Leverage the Average: an Analysis of KL Regularization in RL Paper • 2003.14089 • Published Mar 31, 2020 • 2
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Paper • 2305.13185 • Published May 22, 2023
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 47
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 22
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View Paper • 2401.11237 • Published Jan 20, 2024
MusicRL: Aligning Music Generation to Human Preferences Paper • 2402.04229 • Published Feb 6, 2024 • 17
Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games Paper • 2212.14449 • Published Dec 29, 2022
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion Paper • 2406.19185 • Published Jun 27, 2024
Imitating Language via Scalable Inverse Reinforcement Learning Paper • 2409.01369 • Published Sep 2, 2024
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms Paper • 2410.11677 • Published Oct 15, 2024 • 1