---
title: Simple Transformer
emoji: 🔥
colorFrom: indigo
colorTo: gray
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Transformer trained on Shakespeare play dataset
---
# Transformer Model Training
This project implements a transformer-based language model using PyTorch. The model is designed to learn from a text corpus and can be trained and fine-tuned for various natural language processing tasks.
## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Usage](#usage)
- [Training](#training)
- [Actual Training](#actual-training)
- [Checkpointing](#checkpointing)
- [Model Compression](#model-compression)
- [Working Demo](#working-demo)
- [License](#license)
- [Acknowledgments](#acknowledgments)
## Features
- Transformer architecture with causal self-attention and feedforward layers.
- Efficient data loading and batching.
- Checkpointing to resume training.
- Support for multiple devices (CPU, CUDA, MPS).
- Model compression for reduced file size.
- Streamlit application for text generation using the trained model.
## Requirements
- Python 3.6 or higher
- PyTorch 1.7 or higher
- tqdm
- tiktoken
- streamlit
- transformers
## Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/transformer-model-training.git
cd transformer-model-training
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```
## Usage
1. Prepare your text data in a file named `input.txt`. The model will read this file to load tokens for training.
2. To train the model, run the training script:
```bash
python train.py
```
3. The training script saves a checkpoint after each epoch to `checkpoint.pt` and writes the final, quantized model to `trained_model_quantized.pt`.
4. To generate text using the trained model, run the Streamlit application:
```bash
streamlit run app.py
```
5. Enter your text and specify the length of additional text to generate in the Streamlit interface.
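The following is a minimal sketch of what such a Streamlit app could look like. It is not the actual `app.py`: it assumes the saved file holds a full model object whose forward pass returns logits of shape `(batch, sequence, vocab)`, and it uses greedy decoding for simplicity where the real app may sample.
```python
# Illustrative sketch only; the real app.py may load and decode differently.
import streamlit as st
import tiktoken
import torch

enc = tiktoken.get_encoding("gpt2")  # assumed tokenizer

st.title("Simple Transformer Text Generation")
prompt = st.text_area("Enter your text")
length = st.number_input("Tokens to generate", min_value=1, max_value=500, value=100)

if st.button("Generate") and prompt:
    # assumes the file stores the whole model object, not just a state_dict
    model = torch.load("trained_model_quantized.pt", map_location="cpu")
    model.eval()
    tokens = torch.tensor([enc.encode(prompt)], dtype=torch.long)
    with torch.no_grad():
        for _ in range(int(length)):
            logits = model(tokens)  # (1, T, vocab_size), assumed signature
            next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_token], dim=1)
    st.write(enc.decode(tokens[0].tolist()))
```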
## Training
- The model is trained using a batch size of 4 and a learning rate of 3e-4.
- The training loop includes loss calculation, backpropagation, and optimizer steps.
- The loss is monitored, and checkpoints are saved to allow for resuming training.
- The training process is logged in `training.log`, which contains detailed statistics for each epoch, including loss values and checkpointing information.
- Training stops early once the loss falls below a specified threshold (0.099999), which avoids unnecessary epochs and keeps training time down.
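A condensed sketch of that loop is shown below. `TransformerModel`, `config`, `data_loader`, `device`, `start_epoch`, and `num_epochs` are placeholders for the objects defined in `train.py`, and the assumption that the forward pass returns the loss alongside the logits may not match the actual code.
```python
import torch

model = TransformerModel(config).to(device)  # placeholder for the model class in train.py
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for epoch in range(start_epoch, num_epochs):
    for x, y in data_loader:                 # batches of (input, target) token ids
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits, loss = model(x, y)           # assumed: forward pass also computes the loss
        loss.backward()
        optimizer.step()
    # save a checkpoint after every epoch so training can be resumed
    torch.save({"model": model.state_dict(), "epoch": epoch}, "checkpoint.pt")
    # stop early once the loss target is reached
    if loss.item() < 0.099999:
        break
```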
## Actual Training
The model was trained for a total of **91 epochs**. The training process involved the following steps:
- **Data Preparation**: The model reads and encodes text data from `input.txt`, loading a total of **338,025 tokens** (see the sketch after this list).
- **Batch Processing**: Each epoch consists of **82 batches**, with each batch containing sequences of tokens for training.
- **Loss Monitoring**: The loss is calculated at each step, and the model's performance is tracked throughout the training process.
- **Checkpointing**: The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`) after each epoch, allowing for recovery in case of interruptions.
- **Final Model**: After training, the model is saved with quantization and compression as `trained_model_quantized.pt`, reducing the file size for easier deployment. The final loss achieved at the end of training was approximately 0.089421.
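As a rough sketch of the data preparation and a sanity check on the batch count (the sequence length of 1,024 tokens per example is an assumption, not taken from the log):
```python
import tiktoken
import torch

# encode the corpus with the GPT-2 BPE tokenizer (assumed; train.py may use a different encoding)
enc = tiktoken.get_encoding("gpt2")
with open("input.txt", "r", encoding="utf-8") as f:
    tokens = torch.tensor(enc.encode(f.read()), dtype=torch.long)

B, T = 4, 1024                              # batch size from the Training section; T is assumed
batches_per_epoch = len(tokens) // (B * T)  # 338,025 // 4,096 = 82 full batches per epoch
print(f"{len(tokens)} tokens, {batches_per_epoch} batches per epoch")
```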
The `training.log` file in the project directory records detailed per-epoch statistics, including loss values and checkpointing information.
## Checkpointing
- The model state and current epoch are saved in a single checkpoint file (`checkpoint.pt`).
- To resume training from the last checkpoint, simply run the training script again; it will automatically load the latest checkpoint.
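A minimal sketch of how that resume logic typically looks; the checkpoint keys `"model"` and `"epoch"` are assumptions about the file layout, and `model` and `device` come from the training script:
```python
import os
import torch

start_epoch = 0
if os.path.exists("checkpoint.pt"):
    # pick up where the last run left off
    ckpt = torch.load("checkpoint.pt", map_location=device)
    model.load_state_dict(ckpt["model"])
    start_epoch = ckpt["epoch"] + 1
```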
## Model Compression
- The final model is saved with compression to reduce file size. The model file will be saved as `trained_model_quantized.pt`.
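One common way to achieve this in PyTorch is dynamic quantization of the linear layers; the snippet below is a sketch of that general technique, not necessarily the exact method used in `train.py`.
```python
import torch

# quantize the Linear layers to int8 weights to shrink the saved file
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized_model.state_dict(), "trained_model_quantized.pt")
```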
## Working Demo
You can try out the working demo of the model on Hugging Face Spaces:
![Hugging Face Spaces Demo](https://link-to-your-image.com/demo-image.png)
[Play with the Demo Here](https://huggingface.co/spaces/yourusername/your-demo)
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Acknowledgments
- This project is inspired by the original GPT architecture and various resources available in the NLP community.