saione commited on
Commit
d23bae5
·
verified ·
1 Parent(s): 3500335

Upload meeting summarizer model assets for Streamlit Cloud

Browse files
README.md CHANGED
@@ -1,3 +1,300 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language: en
3
+ license: cc-by-nc-nd-4.0
4
+ datasets:
5
+ - knkarthick/samsum
6
+ metrics:
7
+ - rouge
8
+ tags:
9
+ - summarization
10
+ - abstractive-summarization
11
+ - dialogue-summarization
12
+ - bart
13
+ - seq2seq
14
+ model-index:
15
+ - name: bart-base-samsum-summarizer
16
+ results:
17
+ - task:
18
+ type: summarization
19
+ dataset:
20
+ type: knkarthick/samsum
21
+ name: SAMSum
22
+ split: test
23
+ metrics:
24
+ - type: rouge1
25
+ value: 48.48
26
+ name: ROUGE-1 (D27 beam=5, lp=1.33)
27
+ - type: rouge2
28
+ value: 23.55
29
+ name: ROUGE-2 (D27 beam=5, lp=1.33)
30
+ - type: rougeL
31
+ value: 40.12
32
+ name: ROUGE-L (D27 beam=5, lp=1.33)
33
+ ---
34
+
35
+ # bart-base-samsum-summarizer
36
+
37
+ `facebook/bart-base` fine-tuned on the [SAMSum](https://huggingface.co/datasets/knkarthick/samsum)
38
+ dialogue summarization corpus.
39
+
40
+ > **Note:** Front-matter ROUGE scores reflect the champion decoding config (D27: beam=5, length_penalty=1.33).
41
+ > Default generation config (beam=4, lp=1.0) yields ROUGE-1=47.86, ROUGE-2=23.22, ROUGE-L=39.85.
42
+
43
+ > **⚠️ License**: SAMSum is released under **CC BY-NC-ND 4.0** (non-commercial, no derivatives).
44
+ > This model card, the model weights, and any outputs produced with them are
45
+ > subject to the same terms. **Commercial use is prohibited.**
46
+
47
+ ---
48
+
49
+ ## Model Description
50
+
51
+ | Field | Value |
52
+ |-------|-------|
53
+ | Base model | `facebook/bart-base` (139M parameters) |
54
+ | Task | Abstractive dialogue summarization |
55
+ | Language | English |
56
+ | License | cc-by-nc-nd-4.0 |
57
+ | Dataset | SAMSum (`knkarthick/samsum`) |
58
+ | Hardware trained on | Apple M4 Pro, 24 GB UMA, MPS / BF16 |
59
+
60
+ ---
61
+
62
+ ## Intended Use
63
+
64
+ - **Intended use**: Summarizing short chat conversations (≤ 512 tokens) into
65
+ 1–3 sentence abstractive summaries.
66
+ - **Out-of-scope**: Real-time transcription, audio processing, multi-lingual
67
+ dialogues, or any commercial product.
68
+ - **Not recommended for**: Mission-critical applications where hallucinations
69
+ cannot be tolerated. The model hallucinates entity-level details in ~10% of
70
+ test examples.
71
+
72
+ ---
73
+
74
+ ## Usage
75
+
76
+ ```python
77
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
78
+ import torch
79
+
80
+ model_id = "your-hf-username/bart-base-samsum-summarizer"
81
+
82
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
83
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_id, dtype=torch.bfloat16)
84
+ model.eval()
85
+
86
+ dialogue = """
87
+ Amanda: I baked cookies. Do you want some?
88
+ Jerry: Sure!
89
+ Amanda: I'll bring you tomorrow :-)
90
+ Jerry: Thanks! Do you know how to make the lemon ones?
91
+ Amanda: The biscuits? I'll send you the recipe. It's easy!
92
+ """.strip()
93
+
94
+ inputs = tokenizer(dialogue, return_tensors="pt", max_length=512, truncation=True)
95
+ with torch.no_grad():
96
+ out = model.generate(
97
+ **inputs,
98
+ max_new_tokens = 128,
99
+ num_beams = 5,
100
+ length_penalty = 1.33, # D27 champion config (ROUGE-L 40.12)
101
+ early_stopping = True,
102
+ )
103
+ print(tokenizer.decode(out[0], skip_special_tokens=True))
104
+ # → "Amanda will bring Jerry some cookies tomorrow and send him the recipe."
105
+ ```
106
+
107
+ ---
108
+
109
+ ## Performance
110
+
111
+ All metrics are macro-averaged ROUGE F-measures × 100 on the 819-sample SAMSum test set.
112
+
113
+ ### Test-Set ROUGE
114
+
115
+ | Metric | Value |
116
+ |--------|-------|
117
+ | ROUGE-1 | 48.48 |
118
+ | ROUGE-2 | 23.55 |
119
+ | **ROUGE-L** | **40.12** *(champion: D27 beam=5, lp=1.33)* |
120
+ | ROUGE-L (training config: beam=4, lp=1.0) | 39.92 |
121
+
122
+ ### Comparison: Fine-Tuned vs Zero-Shot
123
+
124
+ | | ROUGE-L |
125
+ |--|---------|
126
+ | BART-base zero-shot (100 samples) | 19.89 |
127
+ | BART-base fine-tuned (819 samples) | **40.12** (+20.23) |
128
+
129
+ ### Decoding Strategy Ablation (11 configs)
130
+
131
+ | Config | ROUGE-L | Avg tokens | ms/sample |
132
+ |--------|---------|-----------|----------|
133
+ | D1: beam=4, lp=0.8 | 39.49 | 15.2 | 138 |
134
+ | D2: beam=4, lp=1.0 | 39.92 | 15.9 | 136 |
135
+ | D3: beam=4, lp=1.2 | 39.97 | 16.7 | 136 |
136
+ | D4: beam=8, lp=1.0 | 39.74 | 15.8 | 220 |
137
+ | D5: nucleus p=0.9 | 35.93 | 18.8 | 92 |
138
+ | D6: beam=4, lp=1.4 | 39.94 | 17.3 | 142 |
139
+ | D7: beam=4, lp=1.25 | 40.01 | 16.8 | 136 |
140
+ | D8: beam=4, lp=1.3 | 40.01 | 17.0 | 137 |
141
+ | D9: beam=4, lp=1.2, nrng=3 | 39.97 | 16.7 | 136 |
142
+ | D10: beam=6, lp=1.2 | 40.03 | 16.7 | 178 |
143
+ | D11: beam=4, lp=1.2, min_len=5 | 39.97 | 16.7 | 136 |
144
+
145
+ > Full 29-config sweep results in `results/metrics/decoding_D*.json`. Champion: **D27** (beam=5, lp=1.33) at ROUGE-L **40.12** — see `docs/EXPERIMENTS.md` for complete E3 table.
146
+
147
+ ### Faithfulness Metrics
148
+
149
+ | Metric | Value |
150
+ |--------|-------|
151
+ | Hallucination rate (spaCy NER) | 10.1% (83 / 819) |
152
+ | Speaker preservation | 75.5% |
153
+ | NLI faithfulness (DeBERTa-v3) | 0.308 |
154
+ | Length–ROUGE-L Pearson r | −0.25 |
155
+
156
+ ### LoRA Parameter-Efficient Fine-Tuning
157
+
158
+ | Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Trainable params |
159
+ |-------|---------|---------|---------|-----------------|
160
+ | BART-base (full fine-tune) | 48.04 | 23.33 | 39.92 | 139.4M (100%) |
161
+ | BART-base (LoRA r=16, α=32) | 45.15 | 21.20 | 37.59 | 0.88M (0.63%) |
162
+
163
+ LoRA achieves **94.2%** of full fine-tune ROUGE-L with only **0.63%** trainable parameters.
164
+
165
+ ### PEGASUS Cross-Domain Transfer
166
+
167
+ | Condition | ROUGE-1 | ROUGE-2 | ROUGE-L | Notes |
168
+ |-----------|---------|---------|---------|-------|
169
+ | Zero-shot | 1.85 | 0.00 | 1.60 | news → dialogue domain mismatch |
170
+ | Fine-tuned | 1.65 | 0.00 | 1.56 | Convergence failure (see below) |
171
+
172
+ **Training failure**: `gradient_accumulation_steps=8` on MPS caused 8× gradient
173
+ inflation (effective lr=1.6e-4). `eval_loss=9.601` at epoch 3 ≈ random baseline.
174
+ Fixed in script (`grad_accum=1`); ROUGE-L 40–44 expected on re-run.
175
+
176
+ ### Extended Training (E8 — 8 epochs, cosine LR, lr=3e-5)
177
+
178
+ | Condition | ROUGE-1 | ROUGE-2 | ROUGE-L | Train time | Notes |
179
+ |-----------|---------|---------|---------|-----------|-------|
180
+ | Baseline (5ep, lr=5e-5) | 47.86 | 23.22 | 39.85 | 168.4 min | E1 result |
181
+ | Extended (8ep, lr=3e-5, cosine) | 46.45 | 22.05 | 38.46 | 259.6 min | Best epoch 4 |
182
+
183
+ **Finding**: Δ ROUGE-L = −1.39. Lower peak LR caused underfitting; baseline with lr=5e-5
184
+ linear decay converges to a better optimum. Hypothesis not supported.
185
+
186
+ ---
187
+
188
+ ## Training Procedure
189
+
190
+ ### Dataset
191
+
192
+ - **Train**: 14,731 examples
193
+ - **Validation**: 818 examples
194
+ - **Test**: 819 examples
195
+ - **Variant used**: `with_speakers` — speaker attribution tags (`Name: `) preserved.
196
+ Ablation shows this contributes +6.62 ROUGE-L vs stripping tags.
197
+
198
+ ### Preprocessing
199
+
200
+ Dialogues are tokenized with `AutoTokenizer` from `facebook/bart-base`.
201
+ `max_source_length=512`, `max_target_length=128` (covers 99%+ of SAMSum
202
+ examples at these lengths). No task prefix (BART does not require one;
203
+ T5 uses `"summarize: "`).
204
+
205
+ ### Hyperparameters
206
+
207
+ | Parameter | Value |
208
+ |-----------|-------|
209
+ | Base model | `facebook/bart-base` |
210
+ | Optimizer | AdamW |
211
+ | Learning rate | 5.0 × 10⁻⁵ |
212
+ | LR schedule | Linear decay |
213
+ | Warmup steps | 500 |
214
+ | Weight decay | 0.01 |
215
+ | Batch size | 8 |
216
+ | Max epochs | 5 |
217
+ | Early stopping patience | 2 |
218
+ | Gradient clip norm | 1.0 |
219
+ | Precision | BF16 |
220
+ | Best epoch | 5 |
221
+ | Best val ROUGE-L | 41.57 |
222
+ | Training time | 72.4 min (M4 Pro MPS) |
223
+
224
+ ### Compute
225
+
226
+ Trained on Apple M4 Pro (T6041), 24 GB Unified Memory, 20 GPU cores.
227
+ PyTorch 2.10.0 MPS backend, BF16.
228
+
229
+ ---
230
+
231
+ ## Limitations
232
+
233
+ - **Synthetic training data**: SAMSum was constructed by human annotators
234
+ writing fictional WhatsApp-style dialogues. The model has not been evaluated
235
+ on real meeting transcripts or audio-derived text.
236
+ - **Two-speaker bias**: ~75% of SAMSum examples involve exactly 2 participants.
237
+ Summarization quality for 3+ speaker conversations is likely lower.
238
+ - **Hallucination**: ~10.1% of test summaries contain at least one NER-detected
239
+ hallucinated entity. The actual hallucination rate is higher for non-entity
240
+ errors (e.g. fabricated scores, inverted speaker actions).
241
+ - **Speaker attribution errors**: ~25% of summaries have at least one
242
+ speaker attribution mistake (e.g. "X will call Y" when it is Y who called).
243
+ - **Non-commercial only**: CC BY-NC-ND 4.0 applies to all outputs.
244
+
245
+ ---
246
+
247
+ ## Citation
248
+
249
+ ```bibtex
250
+ @inproceedings{gliwa-etal-2019-samsum,
251
+ title = "{SAMS}um Corpus: A Human-annotated Dialogue Dataset
252
+ for Abstractive Summarization",
253
+ author = "Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej
254
+ and Wawer, Aleksander",
255
+ booktitle = "Proceedings of the 2nd Workshop on New Frontiers in
256
+ Summarization",
257
+ year = "2019",
258
+ publisher = "Association for Computational Linguistics",
259
+ doi = "10.18653/v1/D19-5409",
260
+ }
261
+ ```
262
+
263
+ ---
264
+
265
+ ## How to Push to HuggingFace Hub
266
+
267
+ ```bash
268
+ # 1. Log in
269
+ huggingface-cli login
270
+
271
+ # 2. Create the repository (replace <username>)
272
+ huggingface-cli repo create bart-base-samsum-summarizer --type model
273
+
274
+ # 3. Push model weights + tokenizer
275
+ python3 - <<'EOF'
276
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
277
+ import torch
278
+
279
+ model_path = "models/best/facebook_bart-base_with_speakers"
280
+ repo_id = "your-hf-username/bart-base-samsum-summarizer" # ← replace
281
+
282
+ tok = AutoTokenizer.from_pretrained(model_path)
283
+ mdl = AutoModelForSeq2SeqLM.from_pretrained(model_path, dtype=torch.bfloat16)
284
+
285
+ tok.push_to_hub(repo_id)
286
+ mdl.push_to_hub(repo_id)
287
+ print(f"✅ Pushed to https://huggingface.co/{repo_id}")
288
+ EOF
289
+
290
+ # 4. Push model card
291
+ huggingface-cli upload your-hf-username/bart-base-samsum-summarizer \
292
+ model_card.md README.md
293
+
294
+ # 5. Verify
295
+ huggingface-cli whoami
296
+ # → Opens https://huggingface.co/your-hf-username/bart-base-samsum-summarizer
297
+ ```
298
+
299
+ > **Note**: Do NOT push `models/best/` to GitHub — model weights belong on
300
+ > the HuggingFace Hub only. The `.gitignore` should already exclude `models/`.
config.json ADDED
@@ -0,0 +1,63 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "T5ForConditionalGeneration"
4
+ ],
5
+ "classifier_dropout": 0.0,
6
+ "d_ff": 2048,
7
+ "d_kv": 64,
8
+ "d_model": 512,
9
+ "decoder_start_token_id": 0,
10
+ "dense_act_fn": "relu",
11
+ "dropout_rate": 0.1,
12
+ "dtype": "float32",
13
+ "eos_token_id": 1,
14
+ "feed_forward_proj": "relu",
15
+ "initializer_factor": 1.0,
16
+ "is_decoder": false,
17
+ "is_encoder_decoder": true,
18
+ "is_gated_act": false,
19
+ "layer_norm_epsilon": 1e-06,
20
+ "model_type": "t5",
21
+ "n_positions": 512,
22
+ "num_decoder_layers": 6,
23
+ "num_heads": 8,
24
+ "num_layers": 6,
25
+ "output_past": true,
26
+ "pad_token_id": 0,
27
+ "relative_attention_max_distance": 128,
28
+ "relative_attention_num_buckets": 32,
29
+ "scale_decoder_outputs": true,
30
+ "task_specific_params": {
31
+ "summarization": {
32
+ "early_stopping": true,
33
+ "length_penalty": 2.0,
34
+ "max_length": 200,
35
+ "min_length": 30,
36
+ "no_repeat_ngram_size": 3,
37
+ "num_beams": 4,
38
+ "prefix": "summarize: "
39
+ },
40
+ "translation_en_to_de": {
41
+ "early_stopping": true,
42
+ "max_length": 300,
43
+ "num_beams": 4,
44
+ "prefix": "translate English to German: "
45
+ },
46
+ "translation_en_to_fr": {
47
+ "early_stopping": true,
48
+ "max_length": 300,
49
+ "num_beams": 4,
50
+ "prefix": "translate English to French: "
51
+ },
52
+ "translation_en_to_ro": {
53
+ "early_stopping": true,
54
+ "max_length": 300,
55
+ "num_beams": 4,
56
+ "prefix": "translate English to Romanian: "
57
+ }
58
+ },
59
+ "tie_word_embeddings": true,
60
+ "transformers_version": "5.2.0",
61
+ "use_cache": false,
62
+ "vocab_size": 32128
63
+ }
generation_config.json ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "decoder_start_token_id": 0,
4
+ "eos_token_id": [
5
+ 1
6
+ ],
7
+ "pad_token_id": 0,
8
+ "transformers_version": "5.2.0"
9
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3fc53c7d1347a62628eb060b11482aa1c4899b2bfdaddf3d1c08ffc072be0d6f
3
+ size 242041896
task5_production_config.json ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "task": "task5_production_baseline",
3
+ "lora_rank": 16,
4
+ "structured_schema": {
5
+ "topics": "list of main topics discussed",
6
+ "action_items": "list of action items or next steps",
7
+ "decision": "main decision or outcome"
8
+ },
9
+ "model_path": "/Users/vnissankararao/dsgrid/dstask2/meeting-summarizer/models/production_task5",
10
+ "source": "/Users/vnissankararao/dsgrid/dstask2/meeting-summarizer/models/best/t5-small_lora_r16/merged_structured",
11
+ "structured_supervised": true
12
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,113 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "backend": "tokenizers",
3
+ "clean_up_tokenization_spaces": true,
4
+ "eos_token": "</s>",
5
+ "extra_ids": 100,
6
+ "extra_special_tokens": [
7
+ "<extra_id_0>",
8
+ "<extra_id_1>",
9
+ "<extra_id_2>",
10
+ "<extra_id_3>",
11
+ "<extra_id_4>",
12
+ "<extra_id_5>",
13
+ "<extra_id_6>",
14
+ "<extra_id_7>",
15
+ "<extra_id_8>",
16
+ "<extra_id_9>",
17
+ "<extra_id_10>",
18
+ "<extra_id_11>",
19
+ "<extra_id_12>",
20
+ "<extra_id_13>",
21
+ "<extra_id_14>",
22
+ "<extra_id_15>",
23
+ "<extra_id_16>",
24
+ "<extra_id_17>",
25
+ "<extra_id_18>",
26
+ "<extra_id_19>",
27
+ "<extra_id_20>",
28
+ "<extra_id_21>",
29
+ "<extra_id_22>",
30
+ "<extra_id_23>",
31
+ "<extra_id_24>",
32
+ "<extra_id_25>",
33
+ "<extra_id_26>",
34
+ "<extra_id_27>",
35
+ "<extra_id_28>",
36
+ "<extra_id_29>",
37
+ "<extra_id_30>",
38
+ "<extra_id_31>",
39
+ "<extra_id_32>",
40
+ "<extra_id_33>",
41
+ "<extra_id_34>",
42
+ "<extra_id_35>",
43
+ "<extra_id_36>",
44
+ "<extra_id_37>",
45
+ "<extra_id_38>",
46
+ "<extra_id_39>",
47
+ "<extra_id_40>",
48
+ "<extra_id_41>",
49
+ "<extra_id_42>",
50
+ "<extra_id_43>",
51
+ "<extra_id_44>",
52
+ "<extra_id_45>",
53
+ "<extra_id_46>",
54
+ "<extra_id_47>",
55
+ "<extra_id_48>",
56
+ "<extra_id_49>",
57
+ "<extra_id_50>",
58
+ "<extra_id_51>",
59
+ "<extra_id_52>",
60
+ "<extra_id_53>",
61
+ "<extra_id_54>",
62
+ "<extra_id_55>",
63
+ "<extra_id_56>",
64
+ "<extra_id_57>",
65
+ "<extra_id_58>",
66
+ "<extra_id_59>",
67
+ "<extra_id_60>",
68
+ "<extra_id_61>",
69
+ "<extra_id_62>",
70
+ "<extra_id_63>",
71
+ "<extra_id_64>",
72
+ "<extra_id_65>",
73
+ "<extra_id_66>",
74
+ "<extra_id_67>",
75
+ "<extra_id_68>",
76
+ "<extra_id_69>",
77
+ "<extra_id_70>",
78
+ "<extra_id_71>",
79
+ "<extra_id_72>",
80
+ "<extra_id_73>",
81
+ "<extra_id_74>",
82
+ "<extra_id_75>",
83
+ "<extra_id_76>",
84
+ "<extra_id_77>",
85
+ "<extra_id_78>",
86
+ "<extra_id_79>",
87
+ "<extra_id_80>",
88
+ "<extra_id_81>",
89
+ "<extra_id_82>",
90
+ "<extra_id_83>",
91
+ "<extra_id_84>",
92
+ "<extra_id_85>",
93
+ "<extra_id_86>",
94
+ "<extra_id_87>",
95
+ "<extra_id_88>",
96
+ "<extra_id_89>",
97
+ "<extra_id_90>",
98
+ "<extra_id_91>",
99
+ "<extra_id_92>",
100
+ "<extra_id_93>",
101
+ "<extra_id_94>",
102
+ "<extra_id_95>",
103
+ "<extra_id_96>",
104
+ "<extra_id_97>",
105
+ "<extra_id_98>",
106
+ "<extra_id_99>"
107
+ ],
108
+ "is_local": false,
109
+ "model_max_length": 512,
110
+ "pad_token": "<pad>",
111
+ "tokenizer_class": "T5Tokenizer",
112
+ "unk_token": "<unk>"
113
+ }