Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13 • 10
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction Paper • 2510.22706 • Published Oct 26 • 40
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 190
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion Paper • 2507.02813 • Published Jul 3 • 60
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations Paper • 2506.13651 • Published Jun 16 • 8
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding Paper • 2506.01853 • Published Jun 2 • 32
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29 • 68
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published Apr 2 • 41
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning Paper • 2503.15265 • Published Mar 19 • 46
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 59
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 77
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published Nov 7, 2024 • 56
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29, 2024 • 32
DreamCinema: Cinematic Transfer with Free Camera and 3D Character Paper • 2408.12601 • Published Aug 22, 2024 • 31
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12, 2024 • 55
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Paper • 2406.04338 • Published Jun 6, 2024 • 39