Discrete Diffusion in Large Language and Multimodal Models: A Survey • Paper • 2506.13759 • Published Jun 16 • 42
VeriThinker: Learning to Verify Makes Reasoning Model Efficient • Paper • 2505.17941 • Published May 23 • 25
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding • Paper • 2505.16990 • Published May 22 • 21
Attention Prompting on Image for Large Vision-Language Models • Paper • 2409.17143 • Published Sep 25, 2024 • 7
Mugs: A Multi-Granular Self-Supervised Learning Framework • Paper • 2203.14415 • Published Mar 27, 2022
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet • Paper • 2101.11986 • Published Jan 28, 2021
ConvBERT: Improving BERT with Span-based Dynamic Convolution • Paper • 2008.02496 • Published Aug 6, 2020
Long-Context Autoregressive Video Modeling with Next-Frame Prediction • Paper • 2503.19325 • Published Mar 25 • 73
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning • Paper • 2503.07906 • Published Mar 10 • 4