Magnitude-Preserving Orthogonal Ablation (MPOA) may help with the remaining issues.

Tested models:
- Harbinger 24B Absolute Heresy
- Gemma 3n E4B MPOA
- Cydonia 4.3 Absolute Heresy

I have been experimenting with Heretic models. Like abliteration before it, Heretic appears to narrowly remove overt refusals while preserving deceptive capabilities. This covert noncompliance is especially apparent with Llama base models.

Tested models:
- Cydonia-24B-v4.3-heretic-v2: thoroughly decensored, although the base model was already fairly uncensored
- Llama3.2-30B-A3B-II-Dark-Champion-INSTRUCT-Heretic-Abliterated-Uncensored: subverts or ignores topics deemed unsafe; quality degrades if jailbroken
- LongWriter-llama3.1-8b-Heretic: also self-censors and reports itself as censored, but can be jailbroken