LAPS: A Length-Aware-Prefill LLM Serving System
Paper • 2601.11589 • Published
Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving
Paper • 2512.17077 • Published
PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks
Paper • 2501.09367 • Published
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Paper • 2502.13965 • Published