Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models Paper • 2604.08545 • Published 5 days ago • 39
Vero: An Open RL Recipe for General Visual Reasoning Paper • 2604.04917 • Published 8 days ago • 30
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published 15 days ago • 57
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published 26 days ago • 33
BitDance Collection • BitDance: Open-source autoregressive model with binary visual tokens. A research project for building powerful multimodal autoregressive models. • 10 items • Updated Mar 2 • 11
BitDance-14B-64x Space • Open-source autoregressive model with binary visual tokens. • Running on Zero • MCP • Featured • 85
UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model Paper • 2602.14178 • Published Feb 15 • 14
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 53
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published Dec 19, 2025 • 37
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published Dec 15, 2025 • 65
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published Dec 2, 2025 • 34