- Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning (arXiv:2510.19338)
- WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training (arXiv:2507.17634, published Jul 23)