Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why Paper • 2605.10889 • Published 3 days ago • 3
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation Paper • 2501.17433 • Published Jan 29, 2025 • 10