# Speed-Optimized Summarization with DistilBART
The BART model is quite large (~1.6GB) and slow, so I replaced it with a lighter, faster model and tuned the processing settings for better performance.
---
## Major Speed Optimizations Applied
### 1. Faster Model
- **Switched from** `facebook/bart-large-cnn` (**~406M parameters, ~1.6GB**)
- **To** `sshleifer/distilbart-cnn-12-6` (**~306M parameters, ~1.2GB**)
- 🔥 **~25% fewer parameters and half the decoder layers** = faster loading and inference
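Swapping checkpoints is a one-line change. A minimal sketch using the Hugging Face `transformers` summarization pipeline (the sample text is a placeholder and the generation lengths are illustrative):

```python
from transformers import pipeline

# The distilled checkpoint is a drop-in replacement for the large one:
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Replace with a real article of a few hundred words.
text = "Long document text goes here ..."
result = summarizer(text, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])
```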
### 2. Processing Optimizations
- **Smaller chunks:** 512 words instead of 900 (faster per-chunk processing)
- **Limited chunks:** at most 5 chunks processed (prevents hanging on huge documents)
- **Faster chunking:** chunks are split by word count instead of full tokenization (see the sketch below)
- **Reduced beam search:** 2 beams instead of 4 (roughly 2x faster generation)
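A sketch of the word-count chunking described above; the 512-word chunk size and the 5-chunk cap mirror the settings listed, and the function name is illustrative:

```python
def chunk_by_words(text: str, chunk_size: int = 512, max_chunks: int = 5) -> list[str]:
    """Split text into fixed-size word chunks and cap the chunk count to bound runtime."""
    words = text.split()  # simple whitespace split: no tokenizer needed at this stage
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return chunks[:max_chunks]  # very large documents are cut off after max_chunks
```

For example, a 3,000-word PDF yields five 512-word chunks and the remainder is dropped, which is where the time savings on huge documents come from.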
### 3. Smart Summarization
- **Shorter summaries:** reduced maximum lengths across all modes
- **Skip the final pass:** documents with ≤2 chunks skip the second summarization pass (saves time; see the sketch below)
- **Early stopping:** enabled so beam search terminates sooner
- **Progress tracking:** shows which chunk is currently being processed
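A sketch of the chunk loop with progress tracking, 2-beam generation, and the shortcut that skips the final pass for documents with ≤2 chunks (the length values are illustrative, not the app's exact settings):

```python
def summarize_document(summarizer, chunks, max_length=120, min_length=30):
    """Summarize each chunk, then optionally compress the partial summaries once more."""
    partial_summaries = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)} ...")  # progress tracking
        output = summarizer(chunk, max_length=max_length, min_length=min_length,
                            num_beams=2, early_stopping=True)
        partial_summaries.append(output[0]["summary_text"])

    combined = " ".join(partial_summaries)
    if len(chunks) <= 2:
        return combined  # short documents: skip the final summarization pass
    # Longer documents: one extra pass over the concatenated partial summaries.
    final = summarizer(combined, max_length=max_length, min_length=min_length,
                       num_beams=2, early_stopping=True, truncation=True)
    return final[0]["summary_text"]
```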
### 4. Memory & Performance
- **Float16 precision:** used when a GPU is available (faster inference, lower memory; see the sketch below)
- **Optimized pipeline loading:** model loading with a fallback if the preferred checkpoint fails
- **`optimum` library added:** for additional inference speed-ups
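A sketch of the device-aware loading with a fallback; it assumes `torch` is installed and uses float16 only when a CUDA GPU is present:

```python
import torch
from transformers import pipeline

MODEL_NAME = "sshleifer/distilbart-cnn-12-6"

def load_summarizer():
    """Load the summarizer in float16 on GPU when possible, otherwise float32 on CPU."""
    try:
        if torch.cuda.is_available():
            return pipeline("summarization", model=MODEL_NAME,
                            torch_dtype=torch.float16, device=0)
        return pipeline("summarization", model=MODEL_NAME, device=-1)
    except Exception as exc:
        # Fall back to the library's default summarization model if loading fails.
        print(f"Could not load {MODEL_NAME} ({exc}); using the default pipeline.")
        return pipeline("summarization")
```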
---
## ⚡ Expected Speed Improvements
| Task           | Before       | After                              |
|----------------|--------------|------------------------------------|
| Model loading  | ~30+ seconds | ~10 seconds                        |
| PDF processing | Minutes      | ~5-15 seconds                      |
| Memory usage   | ~1.6GB       | ~1.2GB (fp32), ~0.6GB in float16   |
| Overall speed  | Slow         | ~5-10x faster overall              |
---
## 🧬 What is DistilBART?
**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BART's performance. It is the result of **model distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*), in this case `facebook/bart-large`.
| Attribute        | Description                                                            |
|------------------|------------------------------------------------------------------------|
| **Full Name**    | Distilled BART                                                         |
| **Base Model**   | `facebook/bart-large`                                                  |
| **Distilled By** | Hugging Face 🤗                                                        |
| **Purpose**      | Faster inference and a smaller footprint for tasks like summarization  |
| **Architecture** | Encoder-decoder Transformer, like BART, but with fewer layers          |
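For intuition, here is a generic knowledge-distillation training loss (soft teacher targets blended with the usual cross-entropy). This sketch only illustrates the general student/teacher idea; it is not the exact recipe used to produce the DistilBART checkpoints:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher guidance) with hard-label cross-entropy.

    Logits are expected as (num_tokens, vocab_size); labels as (num_tokens,).
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```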
---
## Key Differences: BART vs DistilBART
| Feature        | BART (`bart-large-cnn`) | DistilBART (`distilbart-cnn-12-6`) |
|----------------|-------------------------|------------------------------------|
| Encoder Layers | 12                      | 12                                 |
| Decoder Layers | 12                      | 6                                  |
| Parameters     | ~406M                   | ~306M                              |
| Model Size     | ~1.6GB                  | ~1.2GB (~25% smaller)              |
| Speed          | Slower                  | Up to ~2x faster (varies by variant) |
| Performance    | Very high               | Slight drop (~1-2%)                |
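If you want to verify the speed difference on your own hardware, here is a minimal timing sketch (absolute numbers vary a lot by CPU/GPU, and the first call for each model also pays the download and loading cost):

```python
import time
from transformers import pipeline

# Any article of a few hundred words will do; this repeated sentence is just a stand-in.
TEXT = " ".join(["ISRO launched a new satellite today from the Satish Dhawan Space Centre."] * 40)

for name in ("facebook/bart-large-cnn", "sshleifer/distilbart-cnn-12-6"):
    summarizer = pipeline("summarization", model=name)
    start = time.perf_counter()
    summarizer(TEXT, max_length=120, min_length=30, num_beams=2, truncation=True)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
```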
---
## 🎯 Use Cases
- ✅ **Text summarization** (primary use case)
- **Translation** (possible with the BART architecture, but requires task-specific fine-tuning)
- ⚡ Ideal for **edge devices** or **real-time systems** where speed and size matter
---
## 🧪 Example: Summarization with DistilBART
You can easily use DistilBART with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained DistilBART checkpoint
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
---
## 📦 Available Variants
| Model Name                        | Task                         | Description                              |
|-----------------------------------|------------------------------|------------------------------------------|
| `sshleifer/distilbart-cnn-12-6`   | Summarization                | Distilled from `facebook/bart-large-cnn` |
| `philschmid/distilbart-xsum-12-6` | Summarization (XSum dataset) | Short, abstractive summaries             |

[Find more on the Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)
---
## Summary
* 🧠 **DistilBART** is a distilled, faster version of **BART**
* 🧩 Ideal for summarization tasks with lower memory and latency requirements
* 💡 Trained using **knowledge distillation** from `facebook/bart-large`
* Works well in apps that need faster performance without a significant loss in quality
---
✅ **Try it now; it should be significantly faster!** 🏃‍♂️💨
Thank You