---
license: apache-2.0
datasets:
- alexl83/AlpacaDataCleaned
- sahil2801/CodeAlpaca-20k
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- rlhf
- alignment
- simulation
- computational social science
---

# Model Card for So(cially)-Good LM
**A Fast, Effective, and Stable alternative to RLHF!**

**Instead of training an additional reward model that is likely to be gamed, we directly train the model on social games!** 🕹️ 🎲 🎮

Full details on the simulation and training procedure can be found [here](https://github.com/agi-templar/Stable-Alignment).
# Training Procedure

This model is the very first release of the Stable Alignment project: an enhanced instruction-tuned model based on LLaMA.

We improve:

- Instruction-tuning data quality, by using [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned), which fixes many errors in the original Alpaca dataset.
- Code ability, by additionally training on [codealpaca](https://github.com/sahil280114/codealpaca) (a data-preparation sketch follows this list).
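Both datasets ship in the Alpaca-style `instruction` / `input` / `output` JSON format, so they can simply be concatenated into one training file. The sketch below illustrates this; the file names are placeholders rather than names used by the project.

```python
import json

# Placeholder file names -- download the two datasets from their repositories first.
SOURCE_FILES = ["alpaca_data_cleaned.json", "code_alpaca_20k.json"]

merged = []
for path in SOURCE_FILES:
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    # Keep only the three fields used by Alpaca-style fine-tuning.
    for r in records:
        merged.append({
            "instruction": r["instruction"],
            "input": r.get("input", ""),
            "output": r["output"],
        })

# Write a single JSON list in the same Alpaca format for fine-tuning.
with open("merged_training_data.json", "w", encoding="utf-8") as f:
    json.dump(merged, f, ensure_ascii=False, indent=2)

print(f"Merged {len(merged)} instruction-following examples.")
```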
We use the [Alpaca fine-tuning script](https://github.com/tatsu-lab/stanford_alpaca) to train this model.
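For inference, below is a minimal sketch with Hugging Face `transformers`, assuming the trained weights are available in the Hugging Face format; the model path and the example instruction are placeholders. Because the model is fine-tuned with the Alpaca script, the Alpaca-style prompt template is the natural input format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- point this at the converted checkpoint of this model.
MODEL_PATH = "path/to/socially-good-lm"

# Alpaca-style prompt template (instruction-only variant) used during fine-tuning.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

prompt = PROMPT_TEMPLATE.format(
    instruction="Explain why honesty matters in online communities."
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```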
# Bias, Risks, and Limitations

Although this project aims to better align current LMs with social norms, inappropriate content and inherent biases in the training data will still impair the alignment of the model.

The model should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.