Latent Adversarial Regularization for Offline Preference Optimization Paper • 2601.22083 • Published 4 days ago
Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs Paper • 2505.20254 • Published May 26, 2025
Model Transferability With Responsive Decision Subjects Paper • 2107.05911 • Published Jul 13, 2021
Procedural Fairness Through Decoupling Objectionable Data Generating Components Paper • 2311.14688 • Published Nov 5, 2023