ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints Paper • 2510.14847 • Published 7 days ago • 53
DocReward: A Document Reward Model for Structuring and Stylizing Paper • 2510.11391 • Published 10 days ago • 26
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024 • 24
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Paper • 2411.19108 • Published Nov 28, 2024 • 20
DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution Paper • 2405.16071 • Published May 25, 2024 • 2