Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning Paper • 2508.07101 • Published Aug 9 • 14
Accelerating Retrieval-Augmented Language Model Serving with Speculation Paper • 2401.14021 • Published Jan 25, 2024 • 2
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention Paper • 2410.05076 • Published Oct 7, 2024 • 8
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 122