LovnishVerma committed on
Commit 9bdfb59 · verified · 1 Parent(s): 2afbff9

Rename model_change.txt to model_change.md
model_change.txt → model_change.md RENAMED

# 🚀 Speed-Optimized Summarization with DistilBART

The BART model is quite large (~1.6GB) and slow. I replaced it with a much faster, lighter model and better performance settings.

---

## 🚀 Major Speed Optimizations Applied

### 1. Faster Model
- **Switched from** `facebook/bart-large-cnn` (**~1.6GB**)
- **To** `sshleifer/distilbart-cnn-12-6` (**~400MB**)
- 🔥 **~4x smaller on disk** = Much faster loading and inference (see the sketch below)
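
As a rough sketch, the swap itself is a one-line change with the `transformers` pipeline API (the variable name here is illustrative, not the app's actual code):

```python
from transformers import pipeline

# Same summarization pipeline, smaller distilled checkpoint
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
```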

### 2. Processing Optimizations
- **Smaller chunks:** 512 words vs. 900 (faster processing)
- **Limited chunks:** Max 5 chunks processed (prevents hanging on huge docs)
- **Faster tokenization:** Word count instead of full tokenization for chunking (see the sketch below)
- **Reduced beam search:** 2 beams instead of 4 (roughly 2x faster)
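
A minimal sketch of what the word-count chunking might look like (the function and parameter names are illustrative assumptions, not the app's actual code):

```python
# Split on whitespace (word count, no full tokenization) and cap the
# number of chunks so huge documents can't hang the app.
def chunk_text(text: str, chunk_size: int = 512, max_chunks: int = 5) -> list[str]:
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    return chunks[:max_chunks]
```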

### 3. Smart Summarization
- **Shorter summaries:** Reduced max lengths across all modes
- **Skip final summary:** For documents with ≤2 chunks (saves time)
- **Early stopping:** Beam search halts as soon as complete candidates are found
- **Progress tracking:** Shows which chunk is being processed (see the loop sketched below)
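
Putting these together, the per-chunk loop could look roughly like this (a sketch that assumes `summarizer` is the pipeline loaded above; the helper name and length settings are hypothetical):

```python
# Hypothetical chunk loop: 2 beams, early stopping, progress output,
# and no final combine pass for short documents.
def summarize_chunks(summarizer, chunks, max_len=80):
    partials = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")  # progress tracking
        result = summarizer(chunk, max_length=max_len, min_length=20,
                            num_beams=2, early_stopping=True)
        partials.append(result[0]["summary_text"])
    combined = " ".join(partials)
    if len(chunks) <= 2:
        return combined  # skip the final summary pass (saves time)
    final = summarizer(combined, max_length=max_len, min_length=20,
                       num_beams=2, early_stopping=True)
    return final[0]["summary_text"]
```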

### 4. Memory & Performance
- **Float16 precision:** Used when a GPU is available (faster inference)
- **Optimized pipeline:** Better model loading with a fallback (sketched below)
- **`optimum` library added:** For additional speed improvements
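
A minimal sketch of that loading strategy, using the standard `transformers` and `torch` APIs (the structure is illustrative, not the app's exact code):

```python
import torch
from transformers import pipeline

# Illustrative loader: float16 on GPU, float32 on CPU, falling back to
# the original BART checkpoint if the distilled one fails to load.
def load_summarizer():
    device = 0 if torch.cuda.is_available() else -1
    dtype = torch.float16 if device == 0 else torch.float32
    try:
        return pipeline("summarization",
                        model="sshleifer/distilbart-cnn-12-6",
                        device=device, torch_dtype=dtype)
    except Exception:
        return pipeline("summarization",
                        model="facebook/bart-large-cnn",
                        device=device)
```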

---

## ⚡ Expected Speed Improvements

| Task           | Before       | After           |
|----------------|--------------|-----------------|
| Model loading  | ~30+ seconds | ~10 seconds     |
| PDF processing | Minutes      | ~5–15 seconds   |
| Memory usage   | ~1.6GB       | ~400MB          |
| Overall speed  | Slow         | 🚀 5–10x faster |

---

## 🧬 What is DistilBART?

**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BART's performance. It's the result of **model distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*), in this case `facebook/bart-large`.
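
As an illustration of the general idea only (the actual DistilBART checkpoints were built by copying alternating teacher layers and fine-tuning, not by this exact loss), a common distillation objective trains the student to match the teacher's softened output distribution:

```python
import torch.nn.functional as F

# Schematic knowledge-distillation loss: penalize the student for
# diverging from the teacher's temperature-softened distribution.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```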

| Attribute        | Description           |
|------------------|-----------------------|
| **Full Name**    | Distilled BART        |
| **Base Model**   | `facebook/bart-large` |
| **Distilled By** | Hugging Face 🤗       |

---

## ⚙️ Key Differences: BART vs DistilBART

| Feature        | BART (Large) | DistilBART              |
|----------------|--------------|-------------------------|
| Encoder Layers | 12           | 6                       |
| Decoder Layers | 12           | 6                       |
| Parameters     | ~406M        | ~222M                   |
| Model Size     | ~1.6GB       | ~400MB (~55% of BART's) |
| Speed          | Slower       | ~2x faster              |
| Performance    | Very high    | Slight drop (~1–2%)     |

---

## 🎯 Use Cases

- ✅ **Text Summarization** (primary use case)
- 🌐 **Translation** (to a limited extent)
- ⚡ Ideal for **edge devices** or **real-time systems** where speed and size matter

---

## 🧪 Example: Summarization with DistilBART

You can easily use DistilBART with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load pretrained DistilBART model
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

---

| Model                             | Task                         | Notes                                    |
|-----------------------------------|------------------------------|------------------------------------------|
| `sshleifer/distilbart-cnn-12-6`   | Summarization                | Distilled from `facebook/bart-large-cnn` |
| `philschmid/distilbart-xsum-12-6` | Summarization (XSUM dataset) | Short, abstractive summaries             |

🔎 [Find more on the Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)
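
For instance, if you want shorter, more abstractive summaries, you might load the XSUM variant instead; the pipeline call is the same, only the checkpoint changes (the text here is a placeholder):

```python
from transformers import pipeline

# XSUM-distilled variant: tends to produce one-sentence, abstractive summaries
xsum_summarizer = pipeline("summarization", model="philschmid/distilbart-xsum-12-6")

text = "The Indian Space Research Organisation (ISRO) launched a new satellite today..."
print(xsum_summarizer(text, max_length=60, min_length=10)[0]["summary_text"])
```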
 
---

## 📘 Summary

- 🧠 **DistilBART** is a distilled, faster version of **BART**
- 🧩 Ideal for summarization tasks with lower memory and latency requirements
- 💡 Trained using **knowledge distillation** from `facebook/bart-large`
- ⚙️ Works well in apps needing faster performance without significant loss in quality

---

✅ **Try it now - it should be significantly faster!** 🏃‍♂️💨

Thank You