AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published 20 days ago • 58
Identifying Sensitive Weights via Post-quantization Integral Paper • 2503.01901 • Published Feb 28 • 8
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training Paper • 2409.09099 • Published Sep 13, 2024
CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models Paper • 2509.25996 • Published Sep 30
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training Paper • 2407.20584 • Published Jul 30, 2024
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs Paper • 2510.04767 • Published Oct 6 • 26
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28 • 115
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16 • 75