SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs • Paper • 2510.05069 • Published 12 days ago • 11
Large Reasoning Models Learn Better Alignment from Flawed Thinking • Paper • 2510.00938 • Published 17 days ago • 54
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks • Paper • 2510.02286 • Published 16 days ago • 28