Activation Space Interventions Can Be Transferred Between Large Language Models • Paper • 2503.04429 • Published Mar 6 • 2
Transferring Activation Features for model interventions • Collection • 23 items • Updated 2 days ago • 1
Blog: Activations transfer for model interventions • Collection • Collects backdoor datasets, language models, and transfer mappings between these spaces. • 6 items • Updated May 10 • 3
Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models • Paper • 2310.08164 • Published Oct 12, 2023 • 4