The Rogue Scalpel: Activation Steering Compromises LLM Safety • Paper • 2509.22067 • Published 23 days ago • 26
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA • Paper • 2510.04849 • Published 13 days ago • 96
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features • Paper • 2509.22033 • Published 23 days ago • 16
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs • Paper • 2508.11383 • Published Aug 15 • 39