LADDER: Self-Improving LLMs Through Recursive Problem Decomposition • Paper • 2503.00735 • Published Mar 2 • 23
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning • Paper • 2503.05592 • Published Mar 7 • 27
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning • Paper • 2503.05379 • Published Mar 7 • 39
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't • Paper • 2503.16219 • Published Mar 20 • 51
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B • Text Generation • 2B • Updated 16 days ago • 11.3k • 182
microsoft/Phi-4-mini-flash-reasoning • Text Generation • 4B • Updated 20 days ago • 17.3k • 216