arxiv:2505.19731
Michal Valko
misovalko
AI & ML interests
large language models, reasoning, fine-tuning, test-time computation, reinforcement learning with human feedback, world models
Recent Activity
new activity 5 days ago
paris-ai-running-club/README: Replace event
upvoted a paper 3 months ago
A General Theoretical Paradigm to Understand Learning from Human Preferences
authored a paper 3 months ago
Optimal Design for Reward Modeling in RLHF