Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 3 days ago • 68
Solaris: Building a Multiplayer Video World Model in Minecraft Paper • 2602.22208 • Published 9 days ago • 27
Solaris-Models Collection Model weights for Solaris: Building a Multiplayer Video World Model in Minecraft • 1 item • Updated 4 days ago • 3
Solaris-Data Collection Training and evaluation datasets collected for Solaris: Building a Multiplayer Video World Model in Minecraft • 2 items • Updated 11 days ago • 3
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 53
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Paper • 2511.04655 • Published Nov 6, 2025 • 8
SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding Paper • 2511.04668 • Published Nov 6, 2025 • 5