GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 209
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published 20 days ago • 122
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful Paper • 2507.07101 • Published 30 days ago • 3
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers Paper • 2507.04404 • Published Jul 6 • 21
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Paper • 2507.08771 • Published 28 days ago • 9
MetaStone-S1 Collection The open-source model of MetaStone-S1. • 4 items • Updated 9 days ago • 9
🧠SmolLM3 Collection Smol, multilingual, long-context reasoner • 12 items • Updated 3 days ago • 69
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Paper • 2507.07955 • Published 29 days ago • 22
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Paper • 2507.07996 • Published 29 days ago • 32
Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search Paper • 2507.02652 • Published Jul 3 • 24
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening Paper • 2506.02355 • Published Jun 3 • 1
Bridging Offline and Online Reinforcement Learning for LLMs Paper • 2506.21495 • Published Jun 26 • 2
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1 • 44