Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published 4 days ago • 90
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published 10 days ago • 66
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 9 days ago • 40
Cache-to-Cache: Direct Semantic Communication Between Large Language Models Paper • 2510.03215 • Published 22 days ago • 92
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published May 12 • 132
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 151
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 12 days ago • 157
ASPO: Asymmetric Importance Sampling Policy Optimization Paper • 2510.06062 • Published 18 days ago • 13
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published 18 days ago • 69
VideoNSA: Native Sparse Attention Scales Video Understanding Paper • 2510.02295 • Published 23 days ago • 9
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 78
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published 23 days ago • 49
The Unreasonable Effectiveness of Scaling Agents for Computer Use Paper • 2510.02250 • Published 23 days ago • 24