Matthieu Geist

matthieu-geist

authored 16 papers 3 months ago

Leverage the Average: an Analysis of KL Regularization in RL

Paper • 2003.14089 • Published Mar 31, 2020 • 2

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Paper • 2305.13185 • Published May 22, 2023

Nash Learning from Human Feedback

Paper • 2312.00886 • Published Dec 1, 2023 • 18

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 47

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

Paper • 2306.13649 • Published Jun 23, 2023 • 22

Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View

Paper • 2401.11237 • Published Jan 20, 2024

MusicRL: Aligning Music Generation to Human Preferences

Paper • 2402.04229 • Published Feb 6, 2024 • 17

Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3, 2024 • 20

Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

Paper • 2212.14449 • Published Dec 29, 2022

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Paper • 2406.19185 • Published Jun 27, 2024

Imitating Language via Scalable Inverse Reinforcement Learning

Paper • 2409.01369 • Published Sep 2, 2024

Solving robust MDPs as a sequence of static RL problems

Paper • 2410.06212 • Published Oct 8, 2024

Time-Constrained Robust MDPs

Paper • 2406.08395 • Published Jun 12, 2024

RRLS : Robust Reinforcement Learning Suite

Paper • 2406.08406 • Published Jun 12, 2024

Understanding Likelihood Over-optimisation in Direct Alignment Algorithms

Paper • 2410.11677 • Published Oct 15, 2024 • 1

Command A: An Enterprise-Ready Large Language Model

Paper • 2504.00698 • Published Apr 1, 2025 • 27