SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing Paper • 2405.04007 • Published May 7, 2024 • 1
Chimera: Compositional Image Generation using Part-based Concepting Paper • 2510.18083 • Published Oct 20 • 1
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation Paper • 2509.15357 • Published Sep 18 • 1
Structured Information for Improving Spatial Relationships in Text-to-Image Generation Paper • 2509.15962 • Published Sep 19 • 1
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published 10 days ago • 63
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions Paper • 2511.06876 • Published 12 days ago • 22
Simulating the Visual World with Artificial Intelligence: A Roadmap Paper • 2511.08585 • Published 11 days ago • 28
LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE Paper • 2509.21790 • Published Sep 26 • 1
Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models Paper • 2510.18457 • Published Oct 21 • 3
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies Paper • 2510.17950 • Published Oct 20 • 7
FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min Paper • 2510.02691 • Published Oct 3 • 1
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting Paper • 2511.07321 • Published 12 days ago • 1
PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting Paper • 2510.18714 • Published Oct 21 • 1
MapAnything: Universal Feed-Forward Metric 3D Reconstruction Paper • 2509.13414 • Published Sep 16 • 2
MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts Paper • 2510.27234 • Published 23 days ago • 1
SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views Paper • 2509.17246 • Published Sep 21 • 2