An Embarrassingly Simple Defense Against LLM Abliteration Attacks
Paper • 2505.19056
Large Language Models, Cooperative AI, AI Society, Multi Agent Systems, Deep Learning, Artificial Intelligence, Natural Language Processing, Communicative AI