---
license: apache-2.0
tags:
  - summarization
  - custom-model
  - pegasus
  - seq2seq
  - huggingface
  - transformers
library_name: transformers
inference: false
model-index:
  - name: Custom Pegasus Summarizer
    results: []
---

# πŸ¦… Custom Pegasus Summarizer

This model is a custom-wrapped version of [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum) built for summarization tasks. It's implemented using Hugging Face's `transformers` library and wrapped with a custom model class for educational and experimental flexibility.

βœ… It supports:

- Easy fine-tuning and extension (e.g., adapters, prompt tuning)
- Drop-in replacement for the original model
- Hugging Face Hub compatibility
- Compatibility with `AutoTokenizer` and `CustomSeq2SeqModel`

## 🧠 Model Architecture

- **Base:** [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum)
- **Wrapper:** `CustomSeq2SeqModel` (inherits from `PreTrainedModel`)
- **Tokenizer:** `AutoTokenizer` loaded from the same repository
- **Configuration:** `CustomSeq2SeqConfig` (inherits from `PretrainedConfig`)
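
As a rough illustration of this architecture, here is a minimal sketch of what such a config/model pair might look like. The class names mirror the ones listed above, but the body is an assumption for illustration only; the actual implementation lives in `model.py` and may differ.

```python
from transformers import (
    AutoConfig,
    AutoModelForSeq2SeqLM,
    PretrainedConfig,
    PreTrainedModel,
)


class CustomSeq2SeqConfig(PretrainedConfig):
    """Illustrative config: records which base checkpoint the wrapper uses."""

    model_type = "custom-seq2seq"

    def __init__(self, base_model_name="google/pegasus-xsum", **kwargs):
        self.base_model_name = base_model_name
        super().__init__(**kwargs)


class CustomSeq2SeqModel(PreTrainedModel):
    """Thin wrapper that delegates forward/generate to the base seq2seq model."""

    config_class = CustomSeq2SeqConfig

    def __init__(self, config):
        super().__init__(config)
        # Build the underlying model from its configuration; weights are then
        # loaded by from_pretrained() as usual.
        self.model = AutoModelForSeq2SeqLM.from_config(
            AutoConfig.from_pretrained(config.base_model_name)
        )

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def generate(self, *args, **kwargs):
        return self.model.generate(*args, **kwargs)
```

Because the wrapper only delegates, it behaves as a drop-in replacement for the base model while leaving room to add adapters or prompt-tuning layers later.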

## πŸ§ͺ Training Details

- **Dataset:** `xsum` (500-sample subset)
- **Task:** Abstractive summarization
- **Epochs:** 1
- **Batch size:** 4
- **Learning rate:** 2e-5
- **Training framework:** Hugging Face `Trainer`

## πŸ’‘ Usage Example

```python
from transformers import AutoTokenizer
from model import CustomSeq2SeqModel  # Your custom wrapper

tokenizer = AutoTokenizer.from_pretrained("your-username/custom-pegasus-summarizer")
model = CustomSeq2SeqModel.from_pretrained("your-username/custom-pegasus-summarizer")

text = "summarize: The Apollo program was a major milestone in space exploration..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```


## πŸŽ› Live Demos

You can try this model interactively on Hugging Face Spaces.


## πŸ“¦ Files Included

- `config.json` – Model configuration (used by `from_pretrained`)
- `pytorch_model.bin` – Fine-tuned model weights
- `tokenizer_config.json` – Tokenizer settings
- `vocab.json` / `merges.txt` – Tokenizer vocabulary files (depends on tokenizer type)
- `special_tokens_map.json` – Special tokens for summarization
- `README.md` – This model card
- `model.py` – (if included) the `CustomSeq2SeqModel` class

## πŸ“œ License

Apache 2.0, the same license as the original `pegasus-xsum`.


## πŸ™ Acknowledgments

- Hugging Face for `transformers`, `datasets`, and the Hub
- The authors of PEGASUS
- The educational and research communities building custom NLP models