Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Paper • 2504.02821 • Published Apr 3 • 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published Apr 24 • 12
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting Paper • 2504.15921 • Published Apr 22 • 7
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published Apr 23 • 6
VideoDeepResearch: Long Video Understanding With Agentic Tool Using Paper • 2506.10821 • Published Jun 12 • 20
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22