Beyond Language Modeling: An Exploration of Multimodal Pretraining • Paper • 2603.03276 • Published 8 days ago • 85
Efficient RLVR Training via Weighted Mutual Information Data Selection • Paper • 2603.01907 • Published 9 days ago • 14
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding • Paper • 2602.23881 • Published 12 days ago • 18
On Data Engineering for Scaling LLM Terminal Capabilities • Paper • 2602.21193 • Published 15 days ago • 93
Does Your Reasoning Model Implicitly Know When to Stop Thinking? • Paper • 2602.08354 • Published about 1 month ago • 261
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 • Paper • 2602.14457 • Published 24 days ago • 28
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers • Paper • 2602.16968 • Published 21 days ago • 12
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning • Paper • 2602.13515 • Published 26 days ago • 43
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models • Paper • 2602.04163 • Published Feb 4 • 10
FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching • Paper • 2602.12829 • Published 26 days ago • 4
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs • Paper • 2602.10388 • Published 29 days ago • 241
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models • Paper • 2602.12036 • Published 27 days ago • 91
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration • Paper • 2602.05400 • Published Feb 5 • 347
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding • Paper • 2602.01785 • Published Feb 2 • 95
Closing the Loop: Universal Repository Representation with RPG-Encoder • Paper • 2602.02084 • Published Feb 2 • 83