🚀 Speed-Optimized Summarization with DistilBART
The BART model is quite large (~1.6GB) and slow. I replaced it with a much faster, lighter model and tuned the performance settings.
🚀 Major Speed Optimizations Applied
1. Faster Model
- Switched from facebook/bart-large-cnn (~1.6GB) to sshleifer/distilbart-cnn-12-6 (~400MB); the checkpoint swap is sketched below
- 🔥 ~4x smaller model size = much faster loading and inference
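If the app loads the model through the Transformers pipeline API (an assumption; the actual loading code is not shown here), the swap is a one-line change:

```python
from transformers import pipeline

# Before (large, slow): pipeline("summarization", model="facebook/bart-large-cnn")
# After: the distilled checkpoint loads faster and uses less memory
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

print(summarizer("Your long document text here...", max_length=130, min_length=30)[0]["summary_text"])
```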
2. Processing Optimizations
- Smaller chunks: 512 words vs 900 (faster processing)
- Limited chunks: Max 5 chunks processed (prevents hanging on huge docs)
- Faster tokenization: Word count instead of full tokenization for chunking (see the sketch after this list)
- Reduced beam search: 2 beams instead of 4 (2x faster)
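A minimal sketch of the word-count chunking described above; chunk_text, CHUNK_WORDS, and MAX_CHUNKS are illustrative names rather than the app's actual code:

```python
CHUNK_WORDS = 512   # words per chunk (vs. 900 before)
MAX_CHUNKS = 5      # hard cap so huge documents can't hang the app

def chunk_text(text: str) -> list[str]:
    """Split text into word-count chunks; splitting on whitespace is much cheaper than tokenizing."""
    words = text.split()
    chunks = [" ".join(words[i:i + CHUNK_WORDS]) for i in range(0, len(words), CHUNK_WORDS)]
    return chunks[:MAX_CHUNKS]
```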
3. Smart Summarization
- Shorter summaries: Reduced max lengths across all modes
- Skip final summary: For documents with ≤2 chunks (saves time; see the sketch after this list)
- Early stopping: Enabled for faster convergence
- Progress tracking: Shows which chunk is being processed
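A hedged sketch of how these rules could fit together; summarize_chunks is a hypothetical helper, and the summarizer argument is assumed to be a Transformers summarization pipeline:

```python
def summarize_chunks(summarizer, chunks, max_length=130, min_length=30):
    """Summarize each chunk with fast settings, then condense only if needed."""
    partial = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")   # progress tracking
        result = summarizer(
            chunk,
            max_length=max_length,
            min_length=min_length,
            num_beams=2,           # 2 beams instead of 4 -> roughly 2x faster
            early_stopping=True,   # stop beam search as soon as all beams finish
        )
        partial.append(result[0]["summary_text"])

    combined = " ".join(partial)
    if len(chunks) <= 2:           # skip the final summarization pass for short documents
        return combined
    return summarizer(combined, max_length=max_length, min_length=min_length)[0]["summary_text"]
```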
4. Memory & Performance
- Float16 precision: Used when GPU is available (faster inference)
- Optimized pipeline: Better model loading with fallback (a loading sketch follows this list)
- optimum library added: For additional speed improvements
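One way the float16-with-fallback loading could look, as a sketch assuming the Transformers pipeline API (not necessarily the app's exact code):

```python
import torch
from transformers import pipeline

MODEL_NAME = "sshleifer/distilbart-cnn-12-6"

try:
    # Prefer GPU with float16 weights for faster inference and lower memory use
    summarizer = pipeline(
        "summarization",
        model=MODEL_NAME,
        device=0 if torch.cuda.is_available() else -1,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    )
except Exception:
    # Fallback: default full-precision loading on CPU
    summarizer = pipeline("summarization", model=MODEL_NAME)
```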
⚡ Expected Speed Improvements
Task | Before | After |
---|---|---|
Model loading | ~30+ seconds | ~10 seconds |
PDF processing | Minutes | ~5–15 seconds |
Memory usage | ~1.6GB | ~400MB |
Overall speed | Slow | 🚀 5–10x faster |
🧬 What is DistilBART?
DistilBART is a compressed version of the BART model, designed to be lighter and faster while retaining most of BART's performance. It's the result of model distillation, where a smaller model (the student) learns from a larger one (the teacher), in this case facebook/bart-large.
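As a rough illustration of the idea only (a generic knowledge-distillation objective, not the specific recipe used to produce DistilBART), the student is trained to match the teacher's output distribution as well as the true labels:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation loss: soft targets from the teacher
    blended with ordinary cross-entropy on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                        # rescale to keep gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```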
Attribute | Description |
---|---|
Full Name | Distilled BART |
Base Model | facebook/bart-large |
Distilled By | Hugging Face 🤗 |
Purpose | Faster inference and smaller footprint for tasks like summarization |
Architecture | Encoder-decoder Transformer, like BART, but with fewer layers |
⚙️ Key Differences: BART vs DistilBART
Feature | BART (Large) | DistilBART |
---|---|---|
Encoder Layers | 12 | 6 |
Decoder Layers | 12 | 6 |
Parameters | ~406M | ~222M |
Model Size | ~1.6GB | ~400MB |
Speed | Slower | ~2x faster |
Performance | Very high | Slight drop (~1–2%) |
🎯 Use Cases
- ✅ Text Summarization (primary use case)
- 🌍 Translation (basic use)
- ⚡ Ideal for edge devices or real-time systems where speed & size matter
🧪 Example: Summarization with DistilBART
You can easily use DistilBART with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained DistilBART model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
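Note that this standalone example keeps quality-oriented defaults (num_beams=4, min_length=40); the optimized app settings described above trade a little quality for speed with num_beams=2 and shorter summary lengths.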
📦 Available Variants
Model Name | Task | Description |
---|---|---|
sshleifer/distilbart-cnn-12-6 | Summarization | Distilled from facebook/bart-large-cnn |
philschmid/distilbart-xsum-12-6 | Summarization (XSUM dataset) | Short, abstractive summaries |
🔗 Find more on the Hugging Face Model Hub
📝 Summary
- 🧠 DistilBART is a distilled, faster version of BART
- 🧩 Ideal for summarization tasks with lower memory and latency requirements
- 💡 Trained using knowledge distillation from facebook/bart-large
- ⚙️ Works well in apps needing faster performance without significant loss in quality
✅ Try it now: it should be significantly faster! 🏃‍♂️💨
Thank You