FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published 6 days ago • 66
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 151
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 8 days ago • 154
Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process Paper • 2406.18361 • Published Jun 26, 2024 • 1
Are Pixel-Wise Metrics Reliable for Sparse-View Computed Tomography Reconstruction? Paper • 2506.02093 • Published Jun 2 • 1
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published 28 days ago • 22
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Paper • 2509.16197 • Published Sep 19 • 52