# Speed-Optimized Summarization with DistilBART
The original BART model is large (~1.6GB) and slow, so I swapped in a lighter, faster model and tuned the processing settings for better performance.
---
## Major Speed Optimizations Applied
### 1. Faster Model
- **Switched from** `facebook/bart-large-cnn` (**~406M parameters, ~1.6GB**)
- **To** `sshleifer/distilbart-cnn-12-6` (**~306M parameters, ~1.2GB**)
- **~25% fewer parameters and half the decoder layers** = faster loading and generation
### 2. Processing Optimizations
- **Smaller chunks:** 512 words vs 900 (faster processing)
- **Limited chunks:** Max 5 chunks processed (prevents hanging on huge docs)
- **Faster tokenization:** Word count instead of full tokenization for chunking
- **Reduced beam search:** 2 beams instead of 4 (2x faster)
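
For illustration, the chunking strategy above can be sketched as follows. This is a minimal sketch, not the app's actual code; `CHUNK_WORDS`, `MAX_CHUNKS`, and `chunk_text` are assumed names:

```python
# Hypothetical sketch of word-count chunking with a hard cap on chunks.
CHUNK_WORDS = 512  # smaller chunks than the original 900 words
MAX_CHUNKS = 5     # cap total work so huge documents cannot hang the app

def chunk_text(text: str) -> list[str]:
    # Splitting on whitespace avoids a full tokenizer pass over the document
    words = text.split()
    chunks = [
        " ".join(words[i:i + CHUNK_WORDS])
        for i in range(0, len(words), CHUNK_WORDS)
    ]
    return chunks[:MAX_CHUNKS]
```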
### 3. Smart Summarization
- **Shorter summaries:** Reduced max lengths across all modes
- **Skip final summary:** For documents with ≤2 chunks (saves time)
- **Early stopping:** Enabled for faster convergence
- **Progress tracking:** Shows which chunk is being processed
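
Put together, the per-chunk loop might look like the sketch below. It is illustrative only: it assumes `summarizer` is a Hugging Face summarization pipeline, and `max_len`/`min_len` stand in for the mode-specific length settings:

```python
def summarize_chunks(summarizer, chunks, max_len=130, min_len=30):
    partials = []
    for i, chunk in enumerate(chunks, 1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")  # progress tracking
        out = summarizer(
            chunk,
            max_length=max_len,   # shorter summaries per mode
            min_length=min_len,
            num_beams=2,          # 2 beams instead of 4 (~2x faster)
            early_stopping=True,  # stop once all beams are finished
        )
        partials.append(out[0]["summary_text"])

    combined = " ".join(partials)
    if len(chunks) <= 2:          # skip the final pass on short documents
        return combined
    # One condensing pass over the combined partial summaries
    final = summarizer(combined, max_length=max_len, min_length=min_len,
                       num_beams=2, early_stopping=True)
    return final[0]["summary_text"]
```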
### 4. Memory & Performance
- **Float16 precision:** Used when GPU is available (faster inference)
- **Optimized pipeline:** Better model loading with fallback
- **`optimum` library added:** For additional speed improvements
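
A sketch of the loading strategy described above (assumed structure, not the app's exact code): float16 when CUDA is available, with a fallback to the original model if the distilled one cannot be loaded:

```python
import torch
from transformers import pipeline

def load_summarizer():
    device = 0 if torch.cuda.is_available() else -1
    dtype = torch.float16 if device == 0 else torch.float32
    try:
        return pipeline("summarization",
                        model="sshleifer/distilbart-cnn-12-6",
                        device=device, torch_dtype=dtype)
    except Exception:
        # Fallback keeps the app usable if the preferred model is unavailable
        return pipeline("summarization",
                        model="facebook/bart-large-cnn",
                        device=device, torch_dtype=dtype)
```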
---
## Expected Speed Improvements
| Task | Before | After |
|-------------------|----------------------|------------------------------|
| Model loading | ~30+ seconds | ~10 seconds |
| PDF processing    | Minutes              | ~5–15 seconds                |
| Memory usage      | ~1.6GB               | ~1.2GB (less with float16)   |
| Overall speed     | Slow                 | ~5–10x faster                |
---
## 𧬠What is DistilBART?
**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BARTβs performance. Itβs the result of **model distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*), in this case, `facebook/bart-large`.
| Attribute | Description |
|------------------|---------------------------------------------------------------------|
| **Full Name** | Distilled BART |
| **Base Model** | `facebook/bart-large` |
| **Distilled By** | Hugging Face |
| **Purpose** | Faster inference and smaller footprint for tasks like summarization |
| **Architecture** | Encoder-decoder Transformer, like BART, but with fewer layers |
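
For intuition, a generic knowledge-distillation objective looks like the sketch below: the student is trained against the teacher's temperature-softened output distribution in addition to the usual token-level loss. This is the textbook formulation, not the exact DistilBART recipe (which also initializes the student by copying a subset of the teacher's layers):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft loss: KL divergence between temperature-softened distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: standard cross-entropy against the reference tokens
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```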
---
## Key Differences: BART vs DistilBART
| Feature        | BART (Large) | DistilBART (12-6)     |
|----------------|--------------|-----------------------|
| Encoder Layers | 12           | 12                    |
| Decoder Layers | 12           | 6                     |
| Parameters     | ~406M        | ~306M                 |
| Model Size     | ~1.6GB       | ~1.2GB (~25% smaller) |
| Speed          | Slower       | ~1.7x faster          |
| Performance    | Very high    | Slight drop (~1–2%)   |
---
## Use Cases
- **Text Summarization** (primary use case)
- **Translation** (basic use)
- Ideal for **edge devices** or **real-time systems** where speed and size matter
---
## Example: Summarization with DistilBART
You can easily use DistilBART with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load pretrained DistilBART model
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")
# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."
# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
inputs["input_ids"],
max_length=150,
min_length=40,
length_penalty=2.0,
num_beams=4,
early_stopping=True
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
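
For quick experiments, the same checkpoint also works through the high-level `pipeline` API. This is a minimal equivalent of the example above; it uses the model's default generation settings unless overridden:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
text = ("The Indian Space Research Organisation (ISRO) launched a new "
        "satellite today from the Satish Dhawan Space Centre...")
print(summarizer(text, max_length=150, min_length=40)[0]["summary_text"])
```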
---
## Available Variants
| Model Name | Task | Description |
| --------------------------------- | ---------------------------- | ---------------------------------------- |
| `sshleifer/distilbart-cnn-12-6` | Summarization | Distilled from `facebook/bart-large-cnn` |
| `philschmid/distilbart-xsum-12-6` | Summarization (XSUM dataset) | Short, abstractive summaries |
[Find more on the Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)
---
## Summary
* **DistilBART** is a distilled, faster version of **BART**
* Ideal for summarization tasks with lower memory and latency requirements
* Trained using **knowledge distillation** from `facebook/bart-large`
* Works well in apps that need faster performance without a significant loss in quality
---
**Try it now: it should be significantly faster!**
Thank You