# πŸš€ Speed-Optimized Summarization with DistilBART

The BART model is quite large (~1.6GB) and slow. I swapped in a smaller distilled model and tuned the processing settings for speed.

---

## πŸš€ Major Speed Optimizations Applied

### 1. Faster Model

- **Switched from** `facebook/bart-large-cnn` (**~406M parameters, ~1.6GB**)
- **To** `sshleifer/distilbart-cnn-12-6` (**~306M parameters, ~1.2GB**)
- πŸ”₯ **Half the decoder layers** = faster loading and inference

### 2. Processing Optimizations

- **Smaller chunks:** 512 words instead of 900 (faster processing)
- **Limited chunks:** at most 5 chunks processed (prevents hanging on huge documents)
- **Faster chunking:** word counts instead of full tokenization
- **Reduced beam search:** 2 beams instead of 4 (~2x faster generation)

### 3. Smart Summarization

- **Shorter summaries:** reduced max lengths across all modes
- **Skip final summary:** for documents with ≀2 chunks (saves a generation pass)
- **Early stopping:** enabled for faster convergence
- **Progress tracking:** shows which chunk is being processed

### 4. Memory & Performance

- **Float16 precision:** used when a GPU is available (faster inference)
- **Optimized pipeline:** better model loading with a CPU fallback
- **`optimum` library added:** for additional speed improvements

Minimal code sketches of the chunking, two-stage summarization, and float16 loading ideas appear in the Implementation Sketches section below.

---

## ⚑ Expected Speed Improvements

| Task           | Before       | After           |
|----------------|--------------|-----------------|
| Model loading  | ~30+ seconds | ~10 seconds     |
| PDF processing | Minutes      | ~5–15 seconds   |
| Memory usage   | ~1.6GB       | ~1.2GB          |
| Overall speed  | Slow         | πŸš€ 5–10x faster |

---

## 🧬 What is DistilBART?

**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BART's performance. It is the result of **model distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*), in this case `facebook/bart-large-cnn`. The `12-6` suffix in the model name encodes the layer counts: 12 encoder layers and 6 decoder layers.

| Attribute         | Description                                                          |
|-------------------|----------------------------------------------------------------------|
| **Full Name**     | Distilled BART                                                       |
| **Teacher Model** | `facebook/bart-large-cnn` (for the CNN/DailyMail variants)           |
| **Distilled By**  | Hugging Face πŸ€—                                                      |
| **Purpose**       | Faster inference and smaller footprint for tasks like summarization |
| **Architecture**  | Encoder-decoder Transformer, like BART, but with fewer decoder layers |

---

## βš™οΈ Key Differences: BART vs DistilBART (12-6)

| Feature        | BART (Large) | DistilBART (12-6)     |
|----------------|--------------|-----------------------|
| Encoder Layers | 12           | 12                    |
| Decoder Layers | 12           | 6                     |
| Parameters     | ~406M        | ~306M                 |
| Model Size     | ~1.6GB       | ~1.2GB (~25% smaller) |
| Speed          | Slower       | Faster (up to ~2x, depending on variant) |
| Performance    | Very high    | Slight drop (~1–2%)   |

---

## 🎯 Use Cases

- βœ… **Text Summarization** (primary use case)
- 🌐 **Other sequence-to-sequence tasks** (with task-specific fine-tuning)
- ⚑ Ideal for **edge devices** or **real-time systems** where speed & size matter

---
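## πŸ”§ Implementation Sketches

The optimizations above are straightforward to implement. First, the chunking strategy from section 2: a minimal sketch that splits on word counts (no tokenizer needed) and caps the number of chunks. `chunk_text`, `CHUNK_WORDS`, and `MAX_CHUNKS` are illustrative names, not the app's actual internals.

```python
CHUNK_WORDS = 512  # smaller chunks than the previous 900-word setting
MAX_CHUNKS = 5     # hard cap so huge documents can't stall the pipeline

def chunk_text(text: str) -> list[str]:
    # Plain word split: much faster than running the tokenizer just to chunk.
    words = text.split()
    chunks = [
        " ".join(words[i:i + CHUNK_WORDS])
        for i in range(0, len(words), CHUNK_WORDS)
    ]
    return chunks[:MAX_CHUNKS]
```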
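Next, the two-stage summarization from section 3, with progress tracking, 2-beam search, early stopping, and the skip-the-final-pass shortcut for short documents. Again a sketch under those assumptions; the length settings and the `summarize_document` name are placeholders.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarize_document(text: str) -> str:
    chunks = chunk_text(text)  # from the sketch above
    partials = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")  # progress tracking
        result = summarizer(
            chunk,
            max_length=120,       # shorter summaries across all modes
            min_length=30,
            num_beams=2,          # 2 beams instead of 4
            early_stopping=True,  # stop once the beams converge
        )
        partials.append(result[0]["summary_text"])

    combined = " ".join(partials)
    if len(chunks) <= 2:
        return combined  # two or fewer chunks: skip the final pass entirely
    # Final pass: condense the per-chunk summaries into one summary.
    final = summarizer(combined, max_length=150, min_length=40,
                       num_beams=2, early_stopping=True)
    return final[0]["summary_text"]
```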
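Finally, the float16-with-fallback loading from section 4. Half precision is only requested when a GPU is present; any loading error falls back to a plain full-precision CPU pipeline.

```python
import torch
from transformers import pipeline

def load_summarizer():
    """Load DistilBART in float16 on GPU when possible, else on CPU."""
    if torch.cuda.is_available():
        try:
            # Half precision roughly halves GPU memory and speeds up inference.
            return pipeline(
                "summarization",
                model="sshleifer/distilbart-cnn-12-6",
                torch_dtype=torch.float16,
                device=0,
            )
        except Exception:
            pass  # fall through to the CPU pipeline below
    return pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
```

---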
## πŸ§ͺ Example: Summarization with DistilBART

You can easily use DistilBART with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained DistilBART checkpoint
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

---

## πŸ“¦ Available Variants

| Model Name                       | Task                         | Description                              |
|----------------------------------|------------------------------|------------------------------------------|
| `sshleifer/distilbart-cnn-12-6`  | Summarization                | Distilled from `facebook/bart-large-cnn` |
| `sshleifer/distilbart-xsum-12-6` | Summarization (XSum dataset) | Short, abstractive summaries             |

πŸ”Ž [Find more on the Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)

---

## πŸ“˜ Summary

- 🧠 **DistilBART** is a distilled, faster version of **BART**
- 🧩 Ideal for summarization tasks with lower memory and latency requirements
- πŸ’‘ Trained using **knowledge distillation** from `facebook/bart-large-cnn`
- βš™οΈ Works well in apps that need faster performance without a significant loss in quality

---

βœ… **Try it now β€” it should be significantly faster!** πŸƒβ€β™‚οΈπŸ’¨

Thank you!