---
license: apache-2.0
tags:
- summarization
- custom-model
- pegasus
- seq2seq
- huggingface
- transformers
library_name: transformers
inference: false
model-index:
- name: Custom Pegasus Summarizer
results: []
---
# Custom Pegasus Summarizer
This model is a **custom-wrapped version** of [`google/pegasus-xsum`](https://huggingface.co/google/pegasus-xsum) built for **summarization tasks**. It is implemented with Hugging Face's `transformers` library and wrapped in a custom model class for educational and experimental flexibility.
It supports:
- Easy fine-tuning and extension (e.g., adapters, prompt tuning)
- Drop-in replacement for the original model
- Hugging Face Hub compatibility
- Works with `AutoTokenizer` and `CustomSeq2SeqModel`
---
## Model Architecture
- **Base**: `google/pegasus-xsum`
- **Wrapper**: `CustomSeq2SeqModel` (inherits from `PreTrainedModel`)
- **Tokenizer**: `AutoTokenizer` from the same repo
- **Configuration**: `CustomSeq2SeqConfig` (inherits from `PretrainedConfig`)
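
The exact implementation ships in `model.py`. As a rough illustration of the structure listed above (the class names come from this card; the internals are an assumption), a wrapper along these lines would match the description:

```python
from transformers import (
    PegasusForConditionalGeneration,
    PretrainedConfig,
    PreTrainedModel,
)


class CustomSeq2SeqConfig(PretrainedConfig):
    """Configuration for the wrapper; records which base checkpoint to load."""

    model_type = "custom-seq2seq"

    def __init__(self, base_model_name="google/pegasus-xsum", **kwargs):
        super().__init__(**kwargs)
        self.base_model_name = base_model_name


class CustomSeq2SeqModel(PreTrainedModel):
    """Thin wrapper that delegates everything to the underlying PEGASUS model."""

    config_class = CustomSeq2SeqConfig

    def __init__(self, config):
        super().__init__(config)
        # Build the base model; save_pretrained()/from_pretrained() on the
        # wrapper then serialize and restore these weights.
        self.model = PegasusForConditionalGeneration.from_pretrained(
            config.base_model_name
        )

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def generate(self, *args, **kwargs):
        return self.model.generate(*args, **kwargs)
```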
---
## Training Details
- **Dataset**: XSum (500-sample subset)
- **Task**: Abstractive Summarization
- **Epochs**: 1
- **Batch Size**: 4
- **Learning Rate**: 2e-5
- **Training Framework**: Hugging Face Trainer
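
These hyperparameters map onto a standard `Seq2SeqTrainer` run. The snippet below is a sketch of how the run could be reproduced; the epoch count, batch size, and learning rate come from the list above, while the max lengths, logging settings, and output directory are assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    PegasusForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/pegasus-xsum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# 500-sample subset of XSum, as described above.
dataset = load_dataset("xsum", split="train[:500]")


def preprocess(batch):
    # "document"/"summary" are XSum's column names; the max lengths are assumed.
    inputs = tokenizer(batch["document"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs


tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="custom-pegasus-summarizer",
    num_train_epochs=1,              # from the card
    per_device_train_batch_size=4,   # from the card
    learning_rate=2e-5,              # from the card
    logging_steps=25,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("custom-pegasus-summarizer")
```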
---
## Usage Example
```python
from transformers import AutoTokenizer

from model import CustomSeq2SeqModel  # your custom wrapper class

tokenizer = AutoTokenizer.from_pretrained("your-username/custom-pegasus-summarizer")
model = CustomSeq2SeqModel.from_pretrained("your-username/custom-pegasus-summarizer")

# Unlike T5, PEGASUS does not need a "summarize: " task prefix; pass the raw text.
text = "The Apollo program was a major milestone in space exploration..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
---
## Live Demos
You can try this model interactively on Hugging Face Spaces:
- Gradio App: https://huggingface.co/spaces/your-username/custom-pegasus-gradio
- Streamlit App: https://huggingface.co/spaces/your-username/custom-pegasus-streamlit
---
## Files Included
- `config.json` – Model configuration (used by `from_pretrained`)
- `pytorch_model.bin` – Fine-tuned model weights
- `tokenizer_config.json` – Tokenizer settings
- `spiece.model` – SentencePiece vocabulary (PEGASUS uses a SentencePiece tokenizer)
- `special_tokens_map.json` – Special tokens for summarization
- `README.md` – This model card
- `model.py` – (if included) Your `CustomSeq2SeqModel` class
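
Because `CustomSeq2SeqModel` lives in `model.py` rather than in `transformers` itself, the `Auto*` classes will not resolve it automatically. One option, sketched here under the assumption that the config uses the `model_type` string shown earlier, is to register the custom classes before loading:

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

from model import CustomSeq2SeqConfig, CustomSeq2SeqModel  # classes from model.py

# Teach the Auto* classes about the custom architecture so that
# AutoModelForSeq2SeqLM.from_pretrained(...) resolves to the wrapper.
AutoConfig.register("custom-seq2seq", CustomSeq2SeqConfig)
AutoModelForSeq2SeqLM.register(CustomSeq2SeqConfig, CustomSeq2SeqModel)

model = AutoModelForSeq2SeqLM.from_pretrained("your-username/custom-pegasus-summarizer")
```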
---
## License
Apache 2.0 – the same license as the original `google/pegasus-xsum`.
---
## Acknowledgments
- Hugging Face for `transformers`, `datasets`, and the Hub
- Authors of PEGASUS
- Educational/Research communities building custom NLP models