This works very well!
Good model! Granite is a beast!
Thanks!
If there's interest, I can make the other variants public. I played around with trading off "refusals on test set /100" against KL divergence. Even at 71/100 refusals on the test set, it will straight up tell you how to make white phosphorus with "at home" reactions, and (my subjective opinion) there's somewhat of a quality/level-of-detail improvement with a KL divergence of 0.05 compared with this model.
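For anyone unfamiliar with the KL divergence number: it measures how far the modified model's output distribution drifts from the original's, so lower means less collateral change. Here's a minimal sketch of the metric on toy next-token logits. The logits and function names are made up for illustration, and this is not necessarily how the framework computes it internally.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Hypothetical next-token logits on a harmless prompt (illustrative values only)
original_logits = [2.0, 1.0, 0.1]   # original model
modified_logits = [1.8, 1.1, 0.3]   # abliterated model

p = softmax(original_logits)
q = softmax(modified_logits)
kl = kl_divergence(p, q)
print(f"KL(original || modified) = {kl:.4f}")
```

A value of 0 means the two distributions are identical; the tradeoff is that pushing refusals lower usually pushes this number up.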
Yes, please publish the other versions too! I will compare them and give feedback.
What parameters did you use for abliteration? I want to try it on my own.
Done:
- https://huggingface.co/pszemraj/granite-4.0-h-small-heretic_lo
- https://huggingface.co/pszemraj/granite-4.0-h-small-heretic_med
The abliteration params for each model, along with the framework used, are on the model cards! Note that adding support for hybrid models to the repo is still WIP (I have some work left to do), so I would use this branch for now: https://github.com/pszemraj/heretic/tree/hybrid-layer-support