GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains Paper • 2505.18700 • Published 15 days ago • 4
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published 9 days ago • 12
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework Paper • 2504.12395 • Published Apr 16 • 17
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Paper • 2504.11427 • Published Apr 15 • 19
TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 31 • 21
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published Apr 3 • 45
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models Paper • 2403.09471 • Published Mar 14, 2024
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors Paper • 2504.01016 • Published Apr 1 • 29
Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors Paper • 2309.17261 • Published Sep 29, 2023