Artificial Hippocampus Networks for Efficient Long-Context Modeling • Paper • 2510.07318 • Published 10 days ago • 26
Muon Outperforms Adam in Tail-End Associative Memory Learning • Paper • 2509.26030 • Published 18 days ago • 18
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain • Paper • 2509.26507 • Published 18 days ago • 480
Textbooks Are All You Need II: phi-1.5 technical report • Paper • 2309.05463 • Published Sep 11, 2023 • 88
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? • Paper • 2305.07759 • Published May 12, 2023 • 36