EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13 • 8
YOLOv12: Attention-Centric Real-Time Object Detectors Paper • 2502.12524 • Published Feb 18 • 12
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 151
ObjectMover: Generative Object Movement with Video Prior Paper • 2503.08037 • Published Mar 11 • 5
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 73
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published Mar 18 • 152
TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 31 • 21
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 33
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24 • 41
Representing Speech Through Autoregressive Prediction of Cochlear Tokens Paper • 2508.11598 • Published Aug 15 • 17
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2 • 6
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 5 days ago • 34