Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 21
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs Paper • 2509.00930 • Published Aug 31 • 4
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth Paper • 2509.03867 • Published Sep 4 • 208
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 73
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published Sep 4 • 57
Delta Activations: A Representation for Finetuned Large Language Models Paper • 2509.04442 • Published Sep 4 • 6
Set Block Decoding is a Language Model Inference Accelerator Paper • 2509.04185 • Published Sep 4 • 52
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs Paper • 2509.04013 • Published Sep 4 • 4
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 56
Reinforcement Learning Foundations for Deep Research Systems: A Survey Paper • 2509.06733 • Published Sep 8 • 31
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers Paper • 2509.06493 • Published Sep 8 • 11
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Paper • 2509.06283 • Published Sep 8 • 17
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet Paper • 2509.06861 • Published Sep 8 • 8
R²AI: Towards Resistant and Resilient AI in an Evolving World Paper • 2509.06786 • Published Sep 8 • 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 98
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10 • 673
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding Paper • 2509.06923 • Published Sep 8 • 21
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning Paper • 2509.03646 • Published Sep 3 • 30
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers Paper • 2509.06938 • Published Sep 8 • 5
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 183
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models Paper • 2509.09675 • Published Sep 11 • 28
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published Sep 8 • 16
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML Paper • 2509.06806 • Published Sep 8 • 63
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 33
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving Paper • 2509.12603 • Published Sep 16 • 9
Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published Sep 16 • 69
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning Paper • 2509.13755 • Published Sep 17 • 19
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning Paper • 2509.13761 • Published Sep 17 • 16
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published Sep 18 • 52
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18 • 33
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Paper • 2509.15591 • Published Sep 19 • 45
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning Paper • 2509.17437 • Published about 1 month ago • 17
DiffusionNFT: Online Diffusion Reinforcement with Forward Process Paper • 2509.16117 • Published Sep 19 • 20
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels Paper • 2509.16596 • Published Sep 20 • 13
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning Paper • 2509.18083 • Published about 1 month ago • 5
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs Paper • 2509.17998 • Published about 1 month ago • 1
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published 29 days ago • 22
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published 28 days ago • 38
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say Paper • 2509.21164 • Published 27 days ago • 8
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published 29 days ago • 117
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines Paper • 2509.21320 • Published 27 days ago • 99
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Paper • 2509.20712 • Published 28 days ago • 17
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning Paper • 2509.21070 • Published 27 days ago • 9
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published 26 days ago • 131
Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published 26 days ago • 117
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published 26 days ago • 67
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping Paper • 2509.21880 • Published 27 days ago • 44
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning Paper • 2509.19894 • Published 29 days ago • 32
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models Paper • 2509.22300 • Published 26 days ago • 3
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published 24 days ago • 114
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR Paper • 2509.23808 • Published 25 days ago • 47
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance Paper • 2509.22193 • Published 27 days ago • 37
SparseD: Sparse Attention for Diffusion Language Models Paper • 2509.24014 • Published 24 days ago • 30
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards Paper • 2509.24981 • Published 23 days ago • 29
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published 23 days ago • 18
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning Paper • 2509.23285 • Published 25 days ago • 13
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training Paper • 2509.24494 • Published 24 days ago • 9
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published 22 days ago • 496
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 23 days ago • 52
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners Paper • 2509.26226 • Published 22 days ago • 31
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training Paper • 2509.25758 • Published 23 days ago • 21
Mem-α: Learning Memory Construction via Reinforcement Learning Paper • 2509.25911 • Published 23 days ago • 14
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Paper • 2509.26628 • Published 22 days ago • 12
InfoAgent: Advancing Autonomous Information-Seeking Agents Paper • 2509.25189 • Published 23 days ago • 11
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective Paper • 2509.22613 • Published 26 days ago • 9
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models Paper • 2509.24510 • Published 24 days ago • 3
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published 23 days ago • 133
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 23 days ago • 46
ACON: Optimizing Context Compression for Long-horizon LLM Agents Paper • 2510.00615 • Published 22 days ago • 30
BroRL: Scaling Reinforcement Learning via Broadened Exploration Paper • 2510.01180 • Published 21 days ago • 17
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs Paper • 2510.01037 • Published 21 days ago • 2
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published 22 days ago • 106
Interactive Training: Feedback-Driven Neural Network Optimization Paper • 2510.02297 • Published 20 days ago • 40
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems Paper • 2510.02263 • Published 20 days ago • 8