Article **Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding** 2 days ago • 34
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 17
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 12 items • Updated about 20 hours ago • 62