SyntheticSAEBench Model Variations
This repository contains variations on the SynthSAEBench-16k model, organized into subdirectories by the attribute that each variation changes. Unless otherwise specified, all other attributes are identical to the original SynthSAEBench-16k model.
firing-magnitude-stdev
These models change the standard deviation (std) of the firing magnitude, setting it to the same constant value for every feature in the model. The base model uses a random per-feature std with mean 0.5. Available variations:
- std-0
- std-0.1
- std-0.5
- std-2.5
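As a rough illustration of what a constant firing-magnitude std means, the sketch below draws magnitudes for a single feature at each of the listed std values. The Gaussian distribution, the clipping at zero, and the mean magnitude of 1.0 are assumptions made only for this example; the actual sampling procedure is defined by the model config and may differ.

```python
import random
import statistics


def sample_magnitudes(std, mean=1.0, n=10_000, seed=0):
    """Draw n firing magnitudes for one feature.

    Assumption (not from the model config): magnitudes are Gaussian
    around `mean` with the given std, clipped at zero.
    """
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, std)) for _ in range(n)]


# The std values offered by the variations in this directory.
for std in (0.0, 0.1, 0.5, 2.5):
    mags = sample_magnitudes(std)
    print(f"std-{std}: empirical stdev = {statistics.pstdev(mags):.3f}")
```

With std-0 every firing has the same magnitude, so the only variation left is whether a feature fires at all.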
superposition
These models change the hidden dimension of the model, which changes the level of superposition: a larger hidden dim means less superposition. The base model has hidden dim 768. Available variations:
- d-512
- d-1024
- d-1536
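To see why a larger hidden dim reduces superposition, the sketch below measures how much random unit vectors overlap at each hidden dim. It assumes feature directions behave like random Gaussian directions, which may not match how the model actually constructs them, and it uses only 40 vectors to keep the pure-Python loop fast.

```python
import math
import random


def mean_abs_cosine(dim, n_vectors=40, seed=0):
    """Mean absolute cosine similarity between random unit vectors in `dim` dimensions."""
    rng = random.Random(seed)
    vecs = []
    for _ in range(n_vectors):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])

    total, pairs = 0.0, 0
    for i in range(n_vectors):
        for j in range(i + 1, n_vectors):
            # Dot product of unit vectors is their cosine similarity.
            total += abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
            pairs += 1
    return total / pairs


# Hidden dims of the variations, plus the base model's 768.
for dim in (512, 768, 1024, 1536):
    print(f"d-{dim}: mean |cos| = {mean_abs_cosine(dim):.4f}")
```

The average overlap shrinks roughly like one over the square root of the hidden dim, so the same set of features interferes less in a wider model.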
truncate-num-features
These models truncate the number of features in the original model, keeping the first N features. The base model has 16384 features. Available variations:
- n-4096
- n-8192
relative-firing-probability
These models scale all the firing probabilities of the original model by the given multiplier (1.0 would be identical to the base model). This also scales the L0 of the model. Available variations:
- rel-p-0.1
- rel-p-0.25
- rel-p-0.5
- rel-p-0.75
- rel-p-1.25
- rel-p-1.5
misc
These models change several properties at once, typically using different hierarchy structures. The models currently available are designed to keep the L0 of the first 4096 features at around 25, matching the standard model. Available variations:
- hierarchy-128-128-me-1.0-l0-40-4kl0-25
- rand-hierarchy-16-4-32-me-0.75-l0-30-4kl0-24
In these models, me-0.75 means 75% of nodes in the hierarchy have mutually-exclusive children. The number after hierarchy is the number of root nodes. rand-hierarchy means there is a random number of children per parent. E.g. rand-hierarchy-16-4-32 means 16 root nodes, and randomly between 4 and 32 child nodes per parent. For full details of the settings of misc models, it's best to look at the model config directly.
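The naming scheme described above can be unpacked with a small parser. This is a hypothetical helper, not part of sae_lens. Reading `l0-N` as the overall L0 and `4kl0-N` as the L0 of the first 4096 features is an assumption based on the names, and any hierarchy numbers this README does not explain are returned as-is rather than interpreted.

```python
import re

# Pattern for the misc model names listed in this README.
NAME_RE = re.compile(
    r"^(?P<rand>rand-)?hierarchy-(?P<nums>\d+(?:-\d+)*)"
    r"-me-(?P<me>[\d.]+)-l0-(?P<l0>\d+)-4kl0-(?P<l0_4k>\d+)$"
)


def parse_misc_name(name):
    """Split a misc model name into its labeled parts (hypothetical helper)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")

    nums = [int(x) for x in match.group("nums").split("-")]
    info = {
        "random_children": match.group("rand") is not None,
        "root_nodes": nums[0],
        # Numbers after the root count; left uninterpreted unless documented.
        "other_hierarchy_numbers": nums[1:],
        "mutually_exclusive_fraction": float(match.group("me")),
        "l0": int(match.group("l0")),                 # assumed: overall L0
        "l0_first_4k": int(match.group("l0_4k")),     # assumed: L0 of first 4096 features
    }
    if info["random_children"] and len(nums) == 3:
        # Documented case: min and max number of children per parent.
        info["children_range"] = (nums[1], nums[2])
    return info


for name in (
    "hierarchy-128-128-me-1.0-l0-40-4kl0-25",
    "rand-hierarchy-16-4-32-me-0.75-l0-30-4kl0-24",
):
    print(name, parse_misc_name(name))
```

The model config remains the authoritative source for these settings; the parser only reflects what the names appear to encode.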
Usage
from sae_lens.synthetic import SyntheticModel
model = SyntheticModel.from_pretrained("chanind/synth-sae-bench-variations", model_path="model/path")
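A possible way to pick a variation programmatically is sketched below. The exact form of `model_path` is not stated in this README; the sketch assumes it is `<subdirectory>/<variation>`, matching the directory layout described above, so check the repository file listing before relying on it. `candidate_model_path` and `load_variation` are hypothetical helpers; only the `SyntheticModel.from_pretrained` call comes from the usage snippet above.

```python
# Subdirectories and variations as listed in this README.
VARIATIONS = {
    "firing-magnitude-stdev": ["std-0", "std-0.1", "std-0.5", "std-2.5"],
    "superposition": ["d-512", "d-1024", "d-1536"],
    "truncate-num-features": ["n-4096", "n-8192"],
    "relative-firing-probability": [
        "rel-p-0.1", "rel-p-0.25", "rel-p-0.5",
        "rel-p-0.75", "rel-p-1.25", "rel-p-1.5",
    ],
    "misc": [
        "hierarchy-128-128-me-1.0-l0-40-4kl0-25",
        "rand-hierarchy-16-4-32-me-0.75-l0-30-4kl0-24",
    ],
}


def candidate_model_path(subdir, variation):
    """Build a model_path, ASSUMING the layout '<subdir>/<variation>'."""
    if variation not in VARIATIONS.get(subdir, []):
        raise ValueError(f"unknown variation {variation!r} in {subdir!r}")
    return f"{subdir}/{variation}"


def load_variation(subdir, variation):
    """Load one variation; downloads from the Hub, so it needs network access."""
    # Imported here so the path helper works without sae_lens installed.
    from sae_lens.synthetic import SyntheticModel

    return SyntheticModel.from_pretrained(
        "chanind/synth-sae-bench-variations",
        model_path=candidate_model_path(subdir, variation),
    )


print(candidate_model_path("superposition", "d-512"))
```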