
# 🚀 Speed-Optimized Summarization with DistilBART

The original facebook/bart-large-cnn model is quite large (~1.6GB) and slow, so I swapped it out for a smaller, faster model and tuned the processing and generation settings.


## 🚀 Major Speed Optimizations Applied

### 1. Faster Model

- Switched from facebook/bart-large-cnn (~1.6GB, ~406M parameters)
- To sshleifer/distilbart-cnn-12-6 (~1.2GB, ~306M parameters)
- 🔥 ~25% fewer parameters and half the decoder layers = faster loading and inference (see the sketch below)
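If the app loads its model through `transformers.pipeline` (an assumption; the actual loading code may differ), the swap is a one-line change:

```python
from transformers import pipeline

# Before: the large, slow model
# summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# After: the distilled model, with an identical API
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

print(summarizer("Some long article text...", max_length=60, min_length=20)[0]["summary_text"])
```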

### 2. Processing Optimizations

- **Smaller chunks:** 512 words instead of 900 (faster per-chunk processing)
- **Limited chunks:** at most 5 chunks are processed, preventing hangs on huge documents
- **Faster chunking:** chunks are split by word count instead of running the full tokenizer
- **Reduced beam search:** 2 beams instead of 4 (roughly 2x faster generation)

The chunking strategy is sketched below.
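A minimal sketch of word-count chunking with a chunk cap (the constants match the numbers above; names are illustrative, not the app's actual code):

```python
MAX_WORDS_PER_CHUNK = 512  # smaller chunks summarize faster
MAX_CHUNKS = 5             # cap total work on very large documents

def chunk_text(text: str) -> list[str]:
    """Split text into chunks by word count; no tokenizer pass required."""
    words = text.split()
    chunks = [
        " ".join(words[i:i + MAX_WORDS_PER_CHUNK])
        for i in range(0, len(words), MAX_WORDS_PER_CHUNK)
    ]
    return chunks[:MAX_CHUNKS]  # drop overflow chunks so huge PDFs can't hang the app
```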

### 3. Smart Summarization

- **Shorter summaries:** reduced maximum lengths across all modes
- **Skip the final pass:** documents with 2 or fewer chunks skip the final summarize-the-summaries step (saves time)
- **Early stopping:** enabled so beam search stops as soon as all beams have finished
- **Progress tracking:** reports which chunk is currently being processed

The per-chunk loop looks roughly like the sketch below.
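A hedged sketch of that loop, reusing the `summarizer` pipeline and `chunk_text` helper from the sketches above (the app's real loop may differ in details):

```python
def summarize_document(text: str) -> str:
    chunks = chunk_text(text)
    partial_summaries = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")  # progress tracking
        result = summarizer(
            chunk,
            max_length=130,
            min_length=30,
            num_beams=2,          # 2 beams instead of 4 for ~2x faster generation
            early_stopping=True,  # stop once all beams have finished
        )
        partial_summaries.append(result[0]["summary_text"])

    combined = " ".join(partial_summaries)
    if len(chunks) <= 2:
        return combined  # small documents skip the final summarize-the-summaries pass
    return summarizer(combined, max_length=150, min_length=40)[0]["summary_text"]
```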

### 4. Memory & Performance

- **Float16 precision:** used when a GPU is available (faster inference, roughly half the memory)
- **Optimized pipeline:** better model loading with a CPU fallback if the preferred setup fails
- **optimum library added:** for additional speed improvements

The loading logic is sketched below.
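A minimal sketch of fp16-on-GPU loading with a CPU fallback, assuming plain transformers with PyTorch (the optimum integration is not shown here):

```python
import torch
from transformers import pipeline

MODEL_ID = "sshleifer/distilbart-cnn-12-6"

def load_summarizer():
    if torch.cuda.is_available():
        try:
            # fp16 roughly halves memory and speeds up GPU inference
            return pipeline(
                "summarization",
                model=MODEL_ID,
                device=0,
                torch_dtype=torch.float16,
            )
        except Exception as err:
            print(f"GPU load failed ({err}); falling back to CPU")
    # Fallback: full-precision CPU pipeline
    return pipeline("summarization", model=MODEL_ID)
```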

## ⚡ Expected Speed Improvements

| Task | Before | After |
| --- | --- | --- |
| Model loading | ~30+ seconds | ~10 seconds |
| PDF processing | Minutes | ~5–15 seconds |
| Model memory | ~1.6GB (fp32) | ~1.2GB fp32 / ~600MB fp16 |
| Overall speed | Slow | 🚀 5–10x faster |

## 🧬 What is DistilBART?

DistilBART is a compressed version of the BART model, designed to be lighter and faster while retaining most of BART’s performance. It’s the result of model distillation, where a smaller model (the student) learns from a larger one (the teacher); here the teacher is facebook/bart-large. The core idea is sketched below.
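As a rough illustration of that idea (not the exact recipe behind these checkpoints; the DistilBART authors reportedly built the CNN variants mainly by copying a subset of teacher layers and fine-tuning), distillation typically trains the student to match the teacher's softened output distribution:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```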

| Attribute | Description |
| --- | --- |
| Full Name | Distilled BART |
| Base Model | facebook/bart-large |
| Distilled By | Hugging Face 🤗 |
| Purpose | Faster inference and a smaller footprint for tasks like summarization |
| Architecture | Encoder-decoder Transformer, like BART, but with fewer layers |

βš™οΈ Key Differences: BART vs DistilBART

| Feature | BART (bart-large-cnn) | DistilBART (distilbart-cnn-12-6) |
| --- | --- | --- |
| Encoder Layers | 12 | 12 |
| Decoder Layers | 12 | 6 |
| Parameters | ~406M | ~306M |
| Model Size (fp32) | ~1.6GB | ~1.2GB (~25% smaller) |
| Speed | Baseline | Faster generation (half the decoder depth) |
| Performance | Very high | Slight drop (within ~1–2% of the teacher) |

## 🎯 Use Cases

- ✅ Text summarization (primary use case)
- 🌐 Translation (basic use)
- ⚡ Ideal for edge devices or real-time systems where speed and size matter

## 🧪 Example: Summarization with DistilBART

You can easily use DistilBART with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained DistilBART model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize, then generate a summary with beam search
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

# Decode the generated token IDs back to text
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
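Note that this standalone example keeps `num_beams=4` for quality; the optimized app described above drops it to 2, trading a little quality for roughly twice the generation speed.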

## 📦 Available Variants

| Model Name | Task | Description |
| --- | --- | --- |
| sshleifer/distilbart-cnn-12-6 | Summarization (CNN/DailyMail) | Distilled from facebook/bart-large-cnn |
| sshleifer/distilbart-xsum-12-6 | Summarization (XSum) | Short, abstractive summaries |

🔎 Find more on the [Hugging Face Model Hub](https://huggingface.co/models)


## 📘 Summary

- 🧠 DistilBART is a distilled, faster version of BART
- 🧩 Ideal for summarization tasks with lower memory and latency requirements
- 💡 Trained using knowledge distillation from facebook/bart-large
- ⚙️ Works well in apps that need faster performance without a significant loss in quality

✅ Try it now; it should be significantly faster! 🏃‍♂️💨


Thank You