InstantIR: Blind Image Restoration with Instant Generative Reference Paper • 2410.06551 • Published Oct 9, 2024 • 6
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Paper • 2502.05761 • Published Feb 9 • 7
Dynamic Pyramid Network for Efficient Multimodal Large Language Model Paper • 2503.20322 • Published Mar 26 • 1
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14 • 143
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published Oct 16 • 83
Post: Want to iterate on a Hugging Face Space with an LLM? Now you can easily convert any entire HF repo (Model, Dataset or Space) to a text file and feed it to a language model! multimodalart/repo2txt
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework Paper • 2504.12395 • Published Apr 16 • 16
Post: Self-Forcing - a real-time video model distilled from Wan 2.1 by @adobe is out, and they open-sourced it 🐐 I've built a live real-time demo on Spaces 📹💨 multimodalart/self-forcing
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains Paper • 2505.18700 • Published May 24 • 4
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published May 30 • 13