Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models • Paper • arXiv:2404.02657 • Published Apr 3, 2024
Shadow-FT • Collection • Trained weights from Shadow-FT (https://arxiv.org/abs/2505.12716) • 9 items • Updated Jun 3