Cornell-AGI

university

AI & ML interests

Reinforcement Learning from Human Feedback

Cornell-AGI's collections (3)

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Paper • 2505.20686 • Published May 27, 2025 • 2
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval

Viewer • Updated May 29, 2025 • 7.47k • 1
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval

Viewer • Updated May 29, 2025 • 7.47k • 2
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval

Viewer • Updated May 29, 2025 • 7.47k • 1

REBEL: Reinforcement Learning via Regressing Relative Rewards

Paper • 2404.16767 • Published Apr 25, 2024 • 2
Cornell-AGI/REBEL-Llama-3-Armo-iter_1

8B • Updated Sep 2, 2024 • 1 • 1
Cornell-AGI/REBEL-Llama-3-Armo-iter_2

8B • Updated Sep 2, 2024 • 6 • 1
Cornell-AGI/REBEL-Llama-3-Armo-iter_3

8B • Updated Sep 2, 2024 • 2 • 2

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Paper • 2410.04612 • Published Oct 6, 2024
Cornell-AGI/REFUEL-Llama-3-Armo-iter_1

8B • Updated Oct 8, 2024
Cornell-AGI/REFUEL-Llama-3-Armo-iter_2

8B • Updated Oct 8, 2024
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1

Viewer • Updated Oct 8, 2024 • 64.6k • 18 • 2