🧩 Model Weights for Towards Atoms of Large Language Models
This repository contains the model weights associated with the paper:
📄 Towards Atoms of Large Language Models
Specifically, it provides the weights of threshold-activated sparse autoencoders (SAEs) trained on activations across layers of Gemma2-2B, using the CounterFact dataset.
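For intuition, a threshold-activated SAE lets each feature fire only when its encoder pre-activation exceeds a threshold, zeroing it out otherwise (as in JumpReLU-style SAEs). The module below is a minimal illustrative sketch under that assumption; the class name, parameter names, shapes, and the exact thresholding rule are placeholders and may differ from the paper's implementation.

```python
# Minimal sketch of a threshold-activated SAE (JumpReLU-style).
# All names, shapes, and initialization choices here are illustrative
# assumptions; see the main codebase for the authors' definition.
import torch
import torch.nn as nn

class ThresholdSAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.empty(d_model, d_sae))
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.empty(d_sae, d_model))
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # Per-feature activation threshold (assumed learned).
        self.threshold = nn.Parameter(torch.zeros(d_sae))
        nn.init.kaiming_uniform_(self.W_enc)
        nn.init.kaiming_uniform_(self.W_dec)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encode: project the residual-stream activation into feature space.
        pre = (x - self.b_dec) @ self.W_enc + self.b_enc
        # Threshold activation: a feature fires only if its pre-activation
        # exceeds its threshold; otherwise it is zeroed out.
        acts = pre * (pre > self.threshold)
        # Decode: reconstruct the original activation from active features.
        return acts @ self.W_dec + self.b_dec
```

As a usage note, Gemma2-2B's hidden size is 2304, so instantiating the sketch might look like `ThresholdSAE(d_model=2304, d_sae=16384)`, where the SAE width is a placeholder.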
Note that this repository contains only the model weights. For the complete implementation, including training scripts, data preprocessing, and evaluation pipelines, please refer to the main codebase.
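If the checkpoints are stored as plain PyTorch state dicts, downloading and loading one might look like the sketch below. The repo id, filename, and parameter key names are placeholders, not this repository's actual file layout; check the repository's file listing for the real names.

```python
# Hypothetical loading sketch: repo id, filename, and checkpoint key
# names are placeholders for illustration only.
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="user/atoms-of-llms-saes",    # placeholder repo id
    filename="gemma2-2b_layer12_sae.pt",  # placeholder filename
)
state_dict = torch.load(path, map_location="cpu")

# Infer the SAE width from the (assumed) encoder weight key.
sae = ThresholdSAE(d_model=2304, d_sae=state_dict["W_enc"].shape[1])
sae.load_state_dict(state_dict)
sae.eval()
```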