🧩 Model Weights for Towards Atoms of Large Language Models


This repository contains the model weights associated with the paper:

πŸ‘‰ Towards Atoms of Large Language Models

Specifically, it provides the weights of threshold-activated sparse autoencoders (SAEs) trained on activations across layers of Gemma2-2B, using the CounterFact dataset.
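As a rough illustration of what "threshold-activated" means here, the sketch below shows a minimal SAE forward pass in NumPy where latent units whose pre-activation falls below a threshold are zeroed. The shapes, the threshold value, and the gating rule are illustrative assumptions, not the paper's exact implementation; refer to the main codebase for the real architecture.

```python
import numpy as np

# Minimal sketch of a threshold-activated sparse autoencoder (SAE).
# All dimensions and the threshold are hypothetical placeholders.
rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # activation dim, dictionary size (hypothetical)
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_enc = np.zeros(d_sae)
theta = 1.0             # activation threshold (hypothetical)

def encode(x):
    """Zero out latent units whose pre-activation is below the threshold."""
    pre = x @ W_enc + b_enc
    return np.where(pre > theta, pre, 0.0)

def decode(z):
    """Reconstruct the original activation from the sparse latent code."""
    return z @ W_dec

x = rng.normal(size=d_model)  # stand-in for a layer activation vector
z = encode(x)
x_hat = decode(z)
print("active latents:", int((z > 0).sum()), "of", d_sae)
```

The thresholding makes the latent code sparse: only a small fraction of the dictionary units fire for any given activation, which is what lets individual units be interpreted as candidate "atoms".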

Note that only the model weights are included in this repository.

For the complete implementation, including training scripts, data preprocessing, and evaluation pipelines, please refer to the main codebase:

πŸ‘‰ https://github.com/ChenhuiHu/towards_atoms
