# πŸš€ Speed Optimized Summarization with DistilBART

The original `facebook/bart-large-cnn` model is large (~1.6GB) and slow to run. I replaced it with a much faster, lighter distilled model and tuned the processing settings for speed.

---

## πŸš€ Major Speed Optimizations Applied

### 1. Faster Model
- **Switched from** `facebook/bart-large-cnn` (**~1.6GB**, ~406M parameters)
- **To** `sshleifer/distilbart-cnn-12-6` (~306M parameters, **~1.2GB**)
- πŸ”₯ **~25% fewer parameters and half the decoder layers** = faster loading and inference (the swap itself is one line; see below)
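
In pipeline terms, the swap is a one-line change (a minimal sketch using the standard `transformers` pipeline API):

```python
from transformers import pipeline

# Before (large, slow):
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# After (distilled, faster):
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

print(summarizer("Your long document text goes here...", max_length=60)[0]["summary_text"])
```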

### 2. Processing Optimizations
- **Smaller chunks:** 512 words instead of 900 (faster per-chunk processing)
- **Limited chunks:** At most 5 chunks processed (prevents hanging on huge documents)
- **Faster chunking:** Split on word counts instead of full tokenization (see the sketch below)
- **Reduced beam search:** 2 beams instead of 4 (roughly 2x faster generation)
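
A minimal sketch of the word-count chunking described above (function and parameter names are illustrative, not from the original code):

```python
def chunk_text(text: str, chunk_size: int = 512, max_chunks: int = 5) -> list[str]:
    """Split text into ~chunk_size-word pieces, capped at max_chunks."""
    words = text.split()  # plain word split: much cheaper than full tokenization
    chunks = [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    return chunks[:max_chunks]  # cap chunk count so huge documents can't hang the app
```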

### 3. Smart Summarization
- **Shorter summaries:** Reduced max lengths across all modes
- **Skip final summary:** For documents with ≀2 chunks, saving a full generation pass (see the sketch below)
- **Early stopping:** Beam search halts as soon as complete candidates are found
- **Progress tracking:** Shows which chunk is currently being processed
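
Putting those rules together, the per-chunk loop might look like this (a sketch, assuming a `summarizer` pipeline as above; the exact lengths and names are illustrative):

```python
def summarize_chunks(summarizer, chunks: list[str], max_length: int = 130) -> str:
    partials = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")  # progress tracking
        result = summarizer(
            chunk,
            max_length=max_length,
            min_length=30,
            num_beams=2,          # reduced beam search (see above)
            early_stopping=True,  # stop once complete candidates exist
        )
        partials.append(result[0]["summary_text"])

    combined = " ".join(partials)
    if len(chunks) <= 2:
        return combined  # skip the final merge pass for short documents
    return summarizer(combined, max_length=max_length, min_length=30)[0]["summary_text"]
```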

### 4. Memory & Performance
- **Float16 precision:** Used when a GPU is available (halves weight memory, faster inference)
- **Optimized pipeline:** Model loading with a fallback path (see the sketch below)
- **`optimum` library added:** For additional speed improvements
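
A sketch of the loading logic (the exact fallback behavior is an assumption about what "model loading with a fallback path" means here):

```python
import torch
from transformers import pipeline

def load_summarizer(model_id: str = "sshleifer/distilbart-cnn-12-6"):
    """Load the summarizer on GPU in float16 when possible, else on CPU in float32."""
    try:
        if torch.cuda.is_available():
            return pipeline(
                "summarization",
                model=model_id,
                device=0,                   # first CUDA device
                torch_dtype=torch.float16,  # halves weights in memory, faster inference
            )
    except Exception as exc:
        print(f"GPU load failed ({exc}); falling back to CPU")
    return pipeline("summarization", model=model_id, device=-1)  # CPU, float32
```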

---

## ⚑ Expected Speed Improvements

| Task              | Before               | After                        |
|-------------------|----------------------|------------------------------|
| Model loading     | ~30+ seconds         | ~10 seconds                  |
| PDF processing    | Minutes              | ~5–15 seconds                |
| Memory usage      | ~1.6GB               | ~1.2GB fp32 / ~0.6GB fp16    |
| Overall speed     | Slow                 | πŸš€ 5–10x faster              |

---

## 🧬 What is DistilBART?

**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BART’s performance. It’s the result of **model distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*), in this case, `facebook/bart-large`.

| Attribute        | Description                                                         |
|------------------|---------------------------------------------------------------------|
| **Full Name**    | Distilled BART                                                      |
| **Base Model**   | `facebook/bart-large`                                               |
| **Distilled By** | Hugging Face πŸ€—                                                     |
| **Purpose**      | Faster inference and smaller footprint for tasks like summarization |
| **Architecture** | Encoder-decoder Transformer, like BART, but with fewer layers       |
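
The distillation objective itself can be sketched as follows (a schematic of the generic teacher/student loss, not the exact recipe used to produce the DistilBART checkpoints; `T` and `alpha` are illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft targets from the teacher plus hard cross-entropy on the labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients don't shrink with temperature
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```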

---

## βš™οΈ Key Differences: BART vs DistilBART

| Feature        | BART (`bart-large-cnn`) | DistilBART (`distilbart-cnn-12-6`)  |
|----------------|-------------------------|-------------------------------------|
| Encoder layers | 12                      | 12                                  |
| Decoder layers | 12                      | 6 (the "12-6" in the name)          |
| Parameters     | ~406M                   | ~306M (~25% fewer)                  |
| Model size     | ~1.6GB                  | ~1.2GB (~25% smaller)               |
| Speed          | Baseline                | ~1.2–2x faster (settings-dependent) |
| Performance    | Very high               | Within ~1–2% on ROUGE               |

---

## 🎯 Use Cases

- βœ… **Text Summarization** (what the released checkpoints are trained for)
- 🌐 **Other seq2seq tasks** such as translation, after fine-tuning
- ⚑ Ideal for **edge devices** or **real-time systems** where speed and size matter

---

## πŸ§ͺ Example: Summarization with DistilBART

You can easily use DistilBART with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load pretrained DistilBART model
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
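    # note: the speed-optimized settings above use num_beams=2; 4 favors quality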
    num_beams=4,
    early_stopping=True
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

---

## πŸ“¦ Available Variants

| Model Name                        | Task                         | Description                              |
| --------------------------------- | ---------------------------- | ---------------------------------------- |
| `sshleifer/distilbart-cnn-12-6`   | Summarization                | Distilled from `facebook/bart-large-cnn` |
| `sshleifer/distilbart-xsum-12-6`  | Summarization (XSum dataset) | Short, abstractive one-sentence summaries |

πŸ”Ž [Find more on Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)

---

## πŸ“˜ Summary

* 🧠 **DistilBART** is a distilled, faster version of **BART**
* 🧩 Ideal for summarization tasks with lower memory and latency requirements
* πŸ’‘ Trained using **knowledge distillation** from `facebook/bart-large`
* βš™οΈ Works well in apps needing faster performance without significant loss in quality

---

βœ… **Try it now β€” it should be significantly faster!** πŸƒβ€β™‚οΈπŸ’¨

Thank You