MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks Paper • 2506.05982 • Published Jun 6 • 2 • 2
Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack Paper • 2506.01011 • Published Jun 1 • 9 • 2
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers Paper • 2505.21541 • Published May 24 • 7 • 2
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers Paper • 2506.02528 • Published Jun 3 • 15 • 2
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published May 30 • 13 • 2
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains Paper • 2505.18700 • Published May 24 • 4 • 2
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published May 24 • 65 • 2
FocusedAD: Character-centric Movie Audio Description Paper • 2504.12157 • Published Apr 16 • 9 • 3
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published Mar 10 • 29 • 2
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 42 • 6
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Paper • 2502.01105 • Published Feb 3 • 20 • 4
MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation Paper • 2502.01572 • Published Feb 3 • 21 • 2