---
license: apache-2.0
tags:
- summarization
- custom-model
- pegasus
- seq2seq
- huggingface
- transformers
library_name: transformers
inference: false
model-index:
- name: Custom Pegasus Summarizer
results: []
---
# πŸ¦… Custom Pegasus Summarizer
This model is a **custom-wrapped version** of [`google/pegasus-xsum`](https://huggingface.co/google/pegasus-xsum) built for **summarization tasks**. It is implemented with Hugging Face's `transformers` library and wrapped in a custom model class for educational and experimental flexibility.
βœ… It supports:
- Easy fine-tuning and extension (e.g., adapters, prompt tuning)
- Drop-in replacement for the original model
- Hugging Face Hub compatibility
- Works with `AutoTokenizer` and `CustomSeq2SeqModel`
---
## 🧠 Model Architecture
- **Base**: `google/pegasus-xsum`
- **Wrapper**: `CustomSeq2SeqModel` (inherits from `PreTrainedModel`)
- **Tokenizer**: `AutoTokenizer` from the same repo
- **Configuration**: `CustomSeq2SeqConfig` (inherits from `PretrainedConfig`)
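The wrapper pattern above can be sketched roughly as follows. The actual `CustomSeq2SeqModel` implementation lives in `model.py`, so treat this as an illustrative assumption about its structure rather than the shipped code:

```python
# Hypothetical sketch of the wrapper classes described above; class bodies
# and attribute names are assumptions, not the exact code in model.py.
from transformers import (
    PretrainedConfig,
    PreTrainedModel,
    PegasusForConditionalGeneration,
)


class CustomSeq2SeqConfig(PretrainedConfig):
    """Config wrapper that records which base checkpoint to delegate to."""

    model_type = "custom-seq2seq"

    def __init__(self, base_model_name="google/pegasus-xsum", **kwargs):
        self.base_model_name = base_model_name
        super().__init__(**kwargs)


class CustomSeq2SeqModel(PreTrainedModel):
    """Thin wrapper that forwards all calls to the underlying PEGASUS model."""

    config_class = CustomSeq2SeqConfig

    def __init__(self, config):
        super().__init__(config)
        # Delegate to the base seq2seq model named in the config.
        self.model = PegasusForConditionalGeneration.from_pretrained(
            config.base_model_name
        )

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def generate(self, *args, **kwargs):
        return self.model.generate(*args, **kwargs)
```

Because the wrapper inherits from `PreTrainedModel` and declares a `config_class`, it keeps full `from_pretrained` / `push_to_hub` compatibility while leaving room for adapters or prompt-tuning layers.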
---
## πŸ§ͺ Training Details
- **Dataset**: xsum (500-sample subset)
- **Task**: Abstractive Summarization
- **Epochs**: 1
- **Batch Size**: 4
- **Learning Rate**: 2e-5
- **Training Framework**: Hugging Face Trainer
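The hyperparameters above translate into a `TrainingArguments` configuration along these lines; the `output_dir` and any arguments beyond the listed hyperparameters are assumptions:

```python
from transformers import TrainingArguments

# Hyperparameters taken from the table above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="custom-pegasus-summarizer",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    report_to="none",  # disable experiment-tracking integrations
)
# These arguments are then passed to Trainer along with the model and
# the tokenized 500-sample xsum subset.
```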
---
## πŸ’‘ Usage Example
```python
from transformers import AutoTokenizer
from model import CustomSeq2SeqModel  # Your custom wrapper

tokenizer = AutoTokenizer.from_pretrained("your-username/custom-pegasus-summarizer")
model = CustomSeq2SeqModel.from_pretrained("your-username/custom-pegasus-summarizer")

# Unlike T5, PEGASUS needs no task prefix such as "summarize:"; pass the raw text.
text = "The Apollo program was a major milestone in space exploration..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
---
## πŸŽ› Live Demos
You can try this model interactively on Hugging Face Spaces:
- Gradio App: https://huggingface.co/spaces/your-username/custom-pegasus-gradio
- Streamlit App: https://huggingface.co/spaces/your-username/custom-pegasus-streamlit
---
## πŸ“¦ Files Included
- `config.json` – Model configuration (used by `from_pretrained`)
- `pytorch_model.bin` – Fine-tuned model weights
- `tokenizer_config.json` – Tokenizer settings
- `spiece.model` – SentencePiece vocabulary (PEGASUS uses a SentencePiece tokenizer)
- `special_tokens_map.json` – Special tokens for summarization
- `README.md` – This model card
- `model.py` – (if included) Your `CustomSeq2SeqModel` class
---
## πŸ“œ License
Apache 2.0 β€” same license as the original `pegasus-xsum`.
---
## πŸ™ Acknowledgments
- Hugging Face for `transformers`, `datasets`, and the Hub
- Authors of PEGASUS
- Educational/Research communities building custom NLP models