-
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Paper • 2505.20686 • Published • 2 -
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 1 -
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 2 -
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 1
Cornell-AGI
university
AI & ML interests
Reinforcement Learning from Human Feedback
-
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper • 2410.04612 • Published -
Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
8B • Updated -
Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
8B • Updated -
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 18 • 2
-
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Paper • 2505.20686 • Published • 2 -
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 1 -
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 2 -
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 1
-
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper • 2410.04612 • Published -
Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
8B • Updated -
Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
8B • Updated -
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 18 • 2