Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis Paper • 2411.01156 • Published Nov 2, 2024 • 10
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published 10 days ago • 67
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published 28 days ago • 44
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 13 days ago • 165
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 13 days ago • 157
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published 19 days ago • 91
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 20 days ago • 445
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published 26 days ago • 507
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published 30 days ago • 132
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published 30 days ago • 122
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published 30 days ago • 176
Tree Search for LLM Agent Reinforcement Learning Paper • 2509.21240 • Published about 1 month ago • 87
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 75