KV Caching Explained: Optimizing Transformer Inference Efficiency
By not-lain • Jan 30