LovnishVerma committed on
Commit 9bdfb59 · verified · 1 Parent(s): 2afbff9

Rename model_change.txt to model_change.md
model_change.txt → model_change.md RENAMED

# 🚀 Speed-Optimized Summarization with DistilBART

The BART model is quite large (~1.6GB) and slow. I replaced it with a much faster, lighter model and better performance settings.

---

## 🚀 Major Speed Optimizations Applied

### 1. Faster Model
- **Switched from** `facebook/bart-large-cnn` (**~1.6GB**)
- **To** `sshleifer/distilbart-cnn-12-6` (**~400MB**)
- 🔥 **~4x smaller on disk** = Much faster loading and inference (see the sketch below)
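
As a rough sketch, the swap itself is a one-line change with the `transformers` pipeline API (the variable name here is illustrative, not the app's actual code):

```python
from transformers import pipeline

# Same summarization pipeline, smaller distilled checkpoint
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
```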

### 2. Processing Optimizations
- **Smaller chunks:** 512 words vs. 900 (faster processing)
- **Limited chunks:** Max 5 chunks processed (prevents hanging on huge docs)
- **Faster tokenization:** Word count instead of full tokenization for chunking (see the sketch below)
- **Reduced beam search:** 2 beams instead of 4 (roughly 2x faster)
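
A minimal sketch of what the word-count chunking might look like (the function and parameter names are illustrative assumptions, not the app's actual code):

```python
# Split on whitespace (word count, no full tokenization) and cap the
# number of chunks so huge documents can't hang the app.
def chunk_text(text: str, chunk_size: int = 512, max_chunks: int = 5) -> list[str]:
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    return chunks[:max_chunks]
```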

### 3. Smart Summarization
- **Shorter summaries:** Reduced max lengths across all modes
- **Skip final summary:** For documents with ≤2 chunks (saves time)
- **Early stopping:** Beam search halts as soon as complete candidates are found
- **Progress tracking:** Shows which chunk is being processed (see the loop sketched below)
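
Putting these together, the per-chunk loop could look roughly like this (a sketch that assumes `summarizer` is the pipeline loaded above; the helper name and length settings are hypothetical):

```python
# Hypothetical chunk loop: 2 beams, early stopping, progress output,
# and no final combine pass for short documents.
def summarize_chunks(summarizer, chunks, max_len=80):
    partials = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)}...")  # progress tracking
        result = summarizer(chunk, max_length=max_len, min_length=20,
                            num_beams=2, early_stopping=True)
        partials.append(result[0]["summary_text"])
    combined = " ".join(partials)
    if len(chunks) <= 2:
        return combined  # skip the final summary pass (saves time)
    final = summarizer(combined, max_length=max_len, min_length=20,
                       num_beams=2, early_stopping=True)
    return final[0]["summary_text"]
```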

### 4. Memory & Performance
- **Float16 precision:** Used when a GPU is available (faster inference)
- **Optimized pipeline:** Better model loading with a fallback (sketched below)
- **`optimum` library added:** For additional speed improvements
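
A minimal sketch of that loading strategy, using the standard `transformers` and `torch` APIs (the structure is illustrative, not the app's exact code):

```python
import torch
from transformers import pipeline

# Illustrative loader: float16 on GPU, float32 on CPU, falling back to
# the original BART checkpoint if the distilled one fails to load.
def load_summarizer():
    device = 0 if torch.cuda.is_available() else -1
    dtype = torch.float16 if device == 0 else torch.float32
    try:
        return pipeline("summarization",
                        model="sshleifer/distilbart-cnn-12-6",
                        device=device, torch_dtype=dtype)
    except Exception:
        return pipeline("summarization",
                        model="facebook/bart-large-cnn",
                        device=device)
```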

---

## ⚡ Expected Speed Improvements

| Task           | Before       | After           |
|----------------|--------------|-----------------|
| Model loading  | ~30+ seconds | ~10 seconds     |
| PDF processing | Minutes      | ~5–15 seconds   |
| Memory usage   | ~1.6GB       | ~400MB          |
| Overall speed  | Slow         | 🚀 5–10x faster |

---

## 🧬 What is DistilBART?

**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BART's performance. It's the result of **model distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*), in this case `facebook/bart-large`.
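
As an illustration of the general idea only (the actual DistilBART checkpoints were built by copying alternating teacher layers and fine-tuning, not by this exact loss), a common distillation objective trains the student to match the teacher's softened output distribution:

```python
import torch.nn.functional as F

# Schematic knowledge-distillation loss: penalize the student for
# diverging from the teacher's temperature-softened distribution.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```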

| Attribute        | Description           |
|------------------|-----------------------|
| **Full Name**    | Distilled BART        |
| **Base Model**   | `facebook/bart-large` |
| **Distilled By** | Hugging Face 🤗       |

---

## ⚙️ Key Differences: BART vs DistilBART

| Feature        | BART (Large) | DistilBART              |
|----------------|--------------|-------------------------|
| Encoder Layers | 12           | 6                       |
| Decoder Layers | 12           | 6                       |
| Parameters     | ~406M        | ~222M                   |
| Model Size     | ~1.6GB       | ~400MB (~55% of BART's) |
| Speed          | Slower       | ~2x faster              |
| Performance    | Very high    | Slight drop (~1–2%)     |

---

## 🎯 Use Cases

- ✅ **Text Summarization** (primary use case)
- 🌐 **Translation** (to a limited extent)
- ⚡ Ideal for **edge devices** or **real-time systems** where speed and size matter

---

## 🧪 Example: Summarization with DistilBART

You can easily use DistilBART with Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load pretrained DistilBART model
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

---

| Model                             | Task                         | Notes                                    |
|-----------------------------------|------------------------------|------------------------------------------|
| `sshleifer/distilbart-cnn-12-6`   | Summarization                | Distilled from `facebook/bart-large-cnn` |
| `philschmid/distilbart-xsum-12-6` | Summarization (XSUM dataset) | Short, abstractive summaries             |

🔎 [Find more on the Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)
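
For instance, if you want shorter, more abstractive summaries, you might load the XSUM variant instead; the pipeline call is the same, only the checkpoint changes (the text here is a placeholder):

```python
from transformers import pipeline

# XSUM-distilled variant: tends to produce one-sentence, abstractive summaries
xsum_summarizer = pipeline("summarization", model="philschmid/distilbart-xsum-12-6")

text = "The Indian Space Research Organisation (ISRO) launched a new satellite today..."
print(xsum_summarizer(text, max_length=60, min_length=10)[0]["summary_text"])
```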
 
---

## 📘 Summary

- 🧠 **DistilBART** is a distilled, faster version of **BART**
- 🧩 Ideal for summarization tasks with lower memory and latency requirements
- 💡 Trained using **knowledge distillation** from `facebook/bart-large`
- ⚙️ Works well in apps needing faster performance without significant loss in quality

---

✅ **Try it now - it should be significantly faster!** 🏃‍♂️💨

Thank You