Aaron Cummings

Aaron-Cu

AI & ML interests

PhD Student CS | Artificial Intelligence Security and Optimization Graduate Research Assistant @ Kennesaw State University | MSCS | BSCS

Recent Activity

upvoted a paper 19 days ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

upvoted a paper 19 days ago

Reinforcement Pre-Training

upvoted a paper 19 days ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

upvoted 5 papers 19 days ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 301

upvoted a paper 4 months ago

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 138

liked a dataset 4 months ago

arxiv-community/arxiv_dataset

Updated Jan 18, 2024 • 1.04k • 132

upvoted a collection 5 months ago

SmolLM2

Collection

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5 • 295

upvoted a paper 6 months ago

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Paper • 2506.14965 • Published Jun 17 • 49

upvoted 2 papers 7 months ago

A General Theoretical Paradigm to Understand Learning from Human Preferences

Paper • 2310.12036 • Published Oct 18, 2023 • 19

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

upvoted an article 7 months ago

Preference Tuning LLMs with Direct Preference Optimization Methods

Article • Jan 18, 2024

liked a dataset 7 months ago

Anthropic/hh-rlhf

Viewer • Updated May 26, 2023 • 169k • 25.2k • 1.52k

upvoted 2 articles 7 months ago

How to train a new language model from scratch using Transformers and Tokenizers

Article • Feb 14, 2020

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 385
