VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo Paper • 2508.02317 • Published 5 days ago • 12
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published 8 days ago • 103
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published 25 days ago • 84
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 614
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26 • 28
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23 • 56
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 260
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10 • 99
Cartridges: Lightweight and general-purpose long context representations via self-study Paper • 2506.06266 • Published Jun 6 • 5
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 133
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 176
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning Paper • 2505.21067 • Published May 27 • 3
Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions Paper • 2505.19949 • Published May 26 • 16
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26 • 44
QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23 • 44
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58