One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 24
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3 • 222
OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation Paper • 2508.19209 • Published Aug 26 • 42
Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis Paper • 2403.11487 • Published Mar 18, 2024 • 1
Style Customization of Text-to-Vector Generation with Image Diffusion Priors Paper • 2505.10558 • Published May 15 • 16
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 186
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 5 days ago • 60
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 12
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Paper • 2504.11427 • Published Apr 15 • 18
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11 • 42
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11 • 28