AnyI2V: Animating Any Conditional Image with Motion Control Paper • 2507.02857 • Published Jul 3 • 12
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published Jul 8 • 54
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published Jun 26 • 28
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16 • 42
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 30
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework Paper • 2506.10741 • Published Jun 12 • 27
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks Paper • 2506.05982 • Published Jun 6 • 2
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 52
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5 • 77
Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack Paper • 2506.01011 • Published Jun 1 • 9
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers Paper • 2505.21541 • Published May 24 • 7