T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published about 1 month ago • 113
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 84
WebSailor: Navigating Super-human Reasoning for Web Agent Paper • 2507.02592 • Published Jul 3 • 106
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs Paper • 2506.21656 • Published Jun 26 • 14
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Paper • 2506.21356 • Published Jun 26 • 22
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published Jun 26 • 28
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20 • 62
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published Jun 23 • 88
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11 • 18
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models Paper • 2506.05928 • Published Jun 6 • 4
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval Paper • 2506.08887 • Published Jun 10 • 4
Aligning Text, Images, and 3D Structure Token-by-Token Paper • 2506.08002 • Published Jun 9 • 19
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models Paper • 2506.06751 • Published Jun 7 • 71
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1 • 30
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58