paper seminar_251001 - a jmkim0309 Collection

updated 9 days ago

Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8 • 39
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

Paper • 2509.06951 • Published Sep 8 • 31
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Paper • 2509.06818 • Published Sep 8 • 29
Interleaving Reasoning for Better Text-to-Image Generation

Paper • 2509.06945 • Published Sep 8 • 13
RewardDance: Reward Scaling in Visual Generation

Paper • 2509.08826 • Published Sep 10 • 71
Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling

Paper • 2509.01624 • Published Sep 1 • 7
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Paper • 2509.06942 • Published Sep 8 • 16
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

Paper • 2509.15185 • Published about 1 month ago • 28
LLM-I: LLMs are Naturally Interleaved Multimodal Creators

Paper • 2509.13642 • Published Sep 17 • 8
Image Tokenizer Needs Post-Training

Paper • 2509.12474 • Published Sep 15 • 7
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Paper • 2509.10441 • Published Sep 12 • 30
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10 • 125
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

Paper • 2509.01977 • Published Sep 2 • 12
GenCompositor: Generative Video Compositing with Diffusion Transformer

Paper • 2509.02460 • Published Sep 2 • 25
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89
Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28 • 34
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published 30 days ago • 52
Lynx: Towards High-Fidelity Personalized Video Generation

Paper • 2509.15496 • Published about 1 month ago • 12
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Paper • 2509.17627 • Published 27 days ago • 64
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation

Paper • 2509.19244 • Published 26 days ago • 11
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Paper • 2509.18824 • Published 26 days ago • 21
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

Paper • 2510.05094 • Published 13 days ago • 34
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs

Paper • 2509.25771 • Published 19 days ago • 10
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Paper • 2510.01284 • Published 19 days ago • 30
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Paper • 2510.02283 • Published 17 days ago • 88