DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models Paper • 2503.02175 • Published Mar 4
CASP: Compression of Large Multimodal Models Based on Attention Sparsity Paper • 2503.05936 • Published Mar 7
EBJR: Energy-Based Joint Reasoning for Adaptive Inference Paper • 2110.10343 • Published Oct 20, 2021
E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models Paper • 2203.00748 • Published Mar 1, 2022
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation Paper • 2403.19754 • Published Mar 28, 2024
Efficiently Serving Large Multimodal Models Using EPD Disaggregation Paper • 2501.05460 • Published Dec 25, 2024
ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models Paper • 2510.02613 • Published 20 days ago
ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale Paper • 2508.17624 • Published Aug 25