Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality Paper • 2602.14080 • Published 16 days ago • 20
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking Paper • 2602.16849 • Published 12 days ago • 6
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy Paper • 2602.17363 • Published 12 days ago • 7
Preliminary sonification of ENSO using traditional Javanese gamelan scales Paper • 2602.14560 • Published 15 days ago • 1
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published 14 days ago • 9
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels Paper • 2602.11715 • Published 19 days ago • 5 • 3
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm Paper • 2602.11543 • Published 19 days ago • 5 • 4
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation Paper • 2602.11451 • Published 19 days ago • 15
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models Paper • 2602.06694 • Published 25 days ago • 15 • 5
SimpleGPT: Improving GPT via A Simple Normalization Strategy Paper • 2602.01212 • Published 30 days ago • 3 • 4
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection Paper • 2601.19375 • Published Jan 27 • 5
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors Paper • 2601.17958 • Published Jan 25 • 3
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Paper • 2601.18778 • Published Jan 26 • 40