Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Paper • 2510.04212 • Published Oct 5, 2025 • 26
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Paper • 2601.07372 • Published Jan 12 • 42