---
title: Simple Transformer
emoji: 🔥
colorFrom: indigo
colorTo: gray
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Transformer trained on Shakespeare play dataset
---
# Transformer Model Training
This project implements a transformer-based language model using PyTorch. The model is designed to learn from a text corpus and can be trained and fine-tuned for various natural language processing tasks.
## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Usage](#usage)
- [Training](#training)
- [Actual Training](#actual-training)
- [Checkpointing](#checkpointing)
- [Model Compression](#model-compression)
- [Working Demo](#working-demo)
- [License](#license)
- [Acknowledgments](#acknowledgments)
## Features
- Transformer architecture with causal self-attention and feedforward layers.
- Efficient data loading and batching.
- Checkpointing to resume training.
- Support for multiple devices (CPU, CUDA, MPS).
- Model compression for reduced file size.
- Streamlit application for text generation using the trained model.
## Requirements
- Python 3.6 or higher
- PyTorch 1.7 or higher
- tqdm
- tiktoken
- streamlit
- transformers
## Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/transformer-model-training.git
cd transformer-model-training
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```
## Usage
1. Prepare your text data in a file named `input.txt`. The model will read this file to load tokens for training.
2. To train the model, run the training script:
```bash
python train.py
```
3. The training script saves a checkpoint after each epoch to `checkpoint.pt` and writes the final, quantized model to `trained_model_quantized.pt`.
4. To generate text using the trained model, run the Streamlit application:
```bash
streamlit run app.py
```
5. Enter your text and specify the length of additional text to generate in the Streamlit interface.
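The following is a minimal sketch of what such a Streamlit app could look like. It is not the actual `app.py`: it assumes the saved file holds a full model object whose forward pass returns logits of shape `(batch, sequence, vocab)`, and it uses greedy decoding for simplicity where the real app may sample.
```python
# Illustrative sketch only; the real app.py may load and decode differently.
import streamlit as st
import tiktoken
import torch

enc = tiktoken.get_encoding("gpt2")  # assumed tokenizer

st.title("Simple Transformer Text Generation")
prompt = st.text_area("Enter your text")
length = st.number_input("Tokens to generate", min_value=1, max_value=500, value=100)

if st.button("Generate") and prompt:
    # assumes the file stores the whole model object, not just a state_dict
    model = torch.load("trained_model_quantized.pt", map_location="cpu")
    model.eval()
    tokens = torch.tensor([enc.encode(prompt)], dtype=torch.long)
    with torch.no_grad():
        for _ in range(int(length)):
            logits = model(tokens)  # (1, T, vocab_size), assumed signature
            next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_token], dim=1)
    st.write(enc.decode(tokens[0].tolist()))
```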
## Training
- The model is trained using a batch size of 4 and a learning rate of 3e-4.
- The training loop includes loss calculation, backpropagation, and optimizer steps.
- The loss is monitored, and checkpoints are saved to allow for resuming training.
- The training process is logged in `training.log`, which contains detailed statistics for each epoch, including loss values and checkpointing information.
- Training stops early once the loss falls below a specified threshold (0.099999), which avoids unnecessary epochs and keeps training time down.
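A condensed sketch of that loop is shown below. `TransformerModel`, `config`, `data_loader`, `device`, `start_epoch`, and `num_epochs` are placeholders for the objects defined in `train.py`, and the assumption that the forward pass returns the loss alongside the logits may not match the actual code.
```python
import torch

model = TransformerModel(config).to(device)  # placeholder for the model class in train.py
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(start_epoch, num_epochs):
    for x, y in data_loader:                 # batches of (input, target) token ids
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits, loss = model(x, y)           # assumed: forward pass also computes the loss
        loss.backward()
        optimizer.step()
    # save a checkpoint after every epoch so training can be resumed
    torch.save({"model": model.state_dict(), "epoch": epoch}, "checkpoint.pt")
    # stop early once the loss target is reached
    if loss.item() < 0.099999:
        break
```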
## Actual Training
The model was trained for a total of **91 epochs**. The training process involved the following steps:
- **Data Preparation**: The model reads and encodes text data from `input.txt`, loading a total of **338,025 tokens** (see the sketch after this list).
- **Batch Processing**: Each epoch consists of **82 batches**, with each batch containing sequences of tokens for training.
- **Loss Monitoring**: The loss is calculated at each step, and the model's performance is tracked throughout the training process.
- **Checkpointing**: The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`) after each epoch, allowing for recovery in case of interruptions.
- **Final Model**: After training, the model is saved with quantization and compression as `trained_model_quantized.pt`, reducing the file size for easier deployment. The final loss achieved at the end of training was approximately 0.089421.
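As a rough sketch of the data preparation and a sanity check on the batch count (the sequence length of 1,024 tokens per example is an assumption, not taken from the log):
```python
import tiktoken
import torch

# encode the corpus with the GPT-2 BPE tokenizer (assumed; train.py may use a different encoding)
enc = tiktoken.get_encoding("gpt2")
with open("input.txt", "r", encoding="utf-8") as f:
    tokens = torch.tensor(enc.encode(f.read()), dtype=torch.long)

B, T = 4, 1024                              # batch size from the Training section; T is assumed
batches_per_epoch = len(tokens) // (B * T)  # 338,025 // 4,096 = 82 full batches per epoch
print(f"{len(tokens)} tokens, {batches_per_epoch} batches per epoch")
```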
The `training.log` file in the project directory records detailed per-epoch statistics, including loss values and checkpointing information.
## Checkpointing
- The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`).
- To resume training from the last checkpoint, simply run the training script again; it will automatically load the latest checkpoint.
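A minimal sketch of how that resume logic typically looks; the checkpoint keys `"model"` and `"epoch"` are assumptions about the file layout, and `model` and `device` come from the training script:
```python
import os
import torch

start_epoch = 0
if os.path.exists("checkpoint.pt"):
    # pick up where the last run left off
    ckpt = torch.load("checkpoint.pt", map_location=device)
    model.load_state_dict(ckpt["model"])
    start_epoch = ckpt["epoch"] + 1
```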
## Model Compression
- The final model is saved with compression to reduce file size. The model file will be saved as `trained_model_quantized.pt`.
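One common way to achieve this in PyTorch is dynamic quantization of the linear layers; the snippet below is a sketch of that general technique, not necessarily the exact method used in `train.py`.
```python
import torch

# quantize the Linear layers to int8 weights to shrink the saved file
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized_model.state_dict(), "trained_model_quantized.pt")
```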
## Working Demo
You can try out the working demo of the model on Hugging Face Spaces:
![Hugging Face Spaces Demo](https://link-to-your-image.com/demo-image.png)
[Play with the Demo Here](https://huggingface.co/spaces/yourusername/your-demo)
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Acknowledgments
- This project is inspired by the original GPT architecture and various resources available in the NLP community.