Upload meeting summarizer model assets for Streamlit Cloud

Browse files

Files changed (7) hide show

README.md +300 -3
config.json +63 -0
generation_config.json +9 -0
model.safetensors +3 -0
task5_production_config.json +12 -0
tokenizer.json +0 -0
tokenizer_config.json +113 -0

README.md CHANGED Viewed

@@ -1,3 +1,300 @@
----
-license: apache-2.0
----

+---
+language: en
+license: cc-by-nc-nd-4.0
+datasets:
+  - knkarthick/samsum
+metrics:
+  - rouge
+tags:
+  - summarization
+  - abstractive-summarization
+  - dialogue-summarization
+  - bart
+  - seq2seq
+model-index:
+  - name: bart-base-samsum-summarizer
+    results:
+      - task:
+          type: summarization
+        dataset:
+          type: knkarthick/samsum
+          name: SAMSum
+          split: test
+        metrics:
+          - type: rouge1
+            value: 48.48
+            name: ROUGE-1 (D27 beam=5, lp=1.33)
+          - type: rouge2
+            value: 23.55
+            name: ROUGE-2 (D27 beam=5, lp=1.33)
+          - type: rougeL
+            value: 40.12
+            name: ROUGE-L (D27 beam=5, lp=1.33)
+---
+# bart-base-samsum-summarizer
+`facebook/bart-base` fine-tuned on the [SAMSum](https://huggingface.co/datasets/knkarthick/samsum)
+dialogue summarization corpus.
+> **Note:** Front-matter ROUGE scores reflect the champion decoding config (D27: beam=5, length_penalty=1.33).
+> Default generation config (beam=4, lp=1.0) yields ROUGE-1=47.86, ROUGE-2=23.22, ROUGE-L=39.85.
+> **⚠️ License**: SAMSum is released under **CC BY-NC-ND 4.0** (non-commercial, no derivatives).
+> This model card, the model weights, and any outputs produced with them are
+> subject to the same terms. **Commercial use is prohibited.**
+---
+## Model Description
+| Field | Value |
+|-------|-------|
+| Base model | `facebook/bart-base` (139M parameters) |
+| Task | Abstractive dialogue summarization |
+| Language | English |
+| License | cc-by-nc-nd-4.0 |
+| Dataset | SAMSum (`knkarthick/samsum`) |
+| Hardware trained on | Apple M4 Pro, 24 GB UMA, MPS / BF16 |
+---
+## Intended Use
+- **Intended use**: Summarizing short chat conversations (≤ 512 tokens) into
+  1–3 sentence abstractive summaries.
+- **Out-of-scope**: Real-time transcription, audio processing, multi-lingual
+  dialogues, or any commercial product.
+- **Not recommended for**: Mission-critical applications where hallucinations
+  cannot be tolerated. The model hallucinates entity-level details in ~10% of
+  test examples.
+---
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import torch
+model_id = "your-hf-username/bart-base-samsum-summarizer"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model     = AutoModelForSeq2SeqLM.from_pretrained(model_id, dtype=torch.bfloat16)
+model.eval()
+dialogue = """
+Amanda: I baked cookies. Do you want some?
+Jerry: Sure!
+Amanda: I'll bring you tomorrow :-)
+Jerry: Thanks! Do you know how to make the lemon ones?
+Amanda: The biscuits? I'll send you the recipe. It's easy!
+""".strip()
+inputs = tokenizer(dialogue, return_tensors="pt", max_length=512, truncation=True)
+with torch.no_grad():
+    out = model.generate(
+        **inputs,
+        max_new_tokens = 128,
+        num_beams      = 5,
+        length_penalty = 1.33,  # D27 champion config (ROUGE-L 40.12)
+        early_stopping = True,
+    )
+print(tokenizer.decode(out[0], skip_special_tokens=True))
+# → "Amanda will bring Jerry some cookies tomorrow and send him the recipe."
+```
+---
+## Performance
+All metrics are macro-averaged ROUGE F-measures × 100 on the 819-sample SAMSum test set.
+### Test-Set ROUGE
+| Metric | Value |
+|--------|-------|
+| ROUGE-1 | 48.48 |
+| ROUGE-2 | 23.55 |
+| **ROUGE-L** | **40.12** *(champion: D27 beam=5, lp=1.33)* |
+| ROUGE-L (training config: beam=4, lp=1.0) | 39.92 |
+### Comparison: Fine-Tuned vs Zero-Shot
+| | ROUGE-L |
+|--|---------|
+| BART-base zero-shot (100 samples) | 19.89 |
+| BART-base fine-tuned (819 samples) | **40.12** (+20.23) |
+### Decoding Strategy Ablation (11 configs)
+| Config | ROUGE-L | Avg tokens | ms/sample |
+|--------|---------|-----------|----------|
+| D1: beam=4, lp=0.8 | 39.49 | 15.2 | 138 |
+| D2: beam=4, lp=1.0 | 39.92 | 15.9 | 136 |
+| D3: beam=4, lp=1.2 | 39.97 | 16.7 | 136 |
+| D4: beam=8, lp=1.0 | 39.74 | 15.8 | 220 |
+| D5: nucleus p=0.9 | 35.93 | 18.8 | 92 |
+| D6: beam=4, lp=1.4 | 39.94 | 17.3 | 142 |
+| D7: beam=4, lp=1.25 | 40.01 | 16.8 | 136 |
+| D8: beam=4, lp=1.3 | 40.01 | 17.0 | 137 |
+| D9: beam=4, lp=1.2, nrng=3 | 39.97 | 16.7 | 136 |
+| D10: beam=6, lp=1.2 | 40.03 | 16.7 | 178 |
+| D11: beam=4, lp=1.2, min_len=5 | 39.97 | 16.7 | 136 |
+> Full 29-config sweep results in `results/metrics/decoding_D*.json`. Champion: **D27** (beam=5, lp=1.33) at ROUGE-L **40.12** — see `docs/EXPERIMENTS.md` for complete E3 table.
+### Faithfulness Metrics
+| Metric | Value |
+|--------|-------|
+| Hallucination rate (spaCy NER) | 10.1% (83 / 819) |
+| Speaker preservation | 75.5% |
+| NLI faithfulness (DeBERTa-v3) | 0.308 |
+| Length–ROUGE-L Pearson r | −0.25 |
+### LoRA Parameter-Efficient Fine-Tuning
+| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Trainable params |
+|-------|---------|---------|---------|-----------------|
+| BART-base (full fine-tune) | 48.04 | 23.33 | 39.92 | 139.4M (100%) |
+| BART-base (LoRA r=16, α=32) | 45.15 | 21.20 | 37.59 | 0.88M (0.63%) |
+LoRA achieves **94.2%** of full fine-tune ROUGE-L with only **0.63%** trainable parameters.
+### PEGASUS Cross-Domain Transfer
+| Condition | ROUGE-1 | ROUGE-2 | ROUGE-L | Notes |
+|-----------|---------|---------|---------|-------|
+| Zero-shot | 1.85 | 0.00 | 1.60 | news → dialogue domain mismatch |
+| Fine-tuned | 1.65 | 0.00 | 1.56 | Convergence failure (see below) |
+**Training failure**: `gradient_accumulation_steps=8` on MPS caused 8× gradient
+inflation (effective lr=1.6e-4). `eval_loss=9.601` at epoch 3 ≈ random baseline.
+Fixed in script (`grad_accum=1`); ROUGE-L 40–44 expected on re-run.
+### Extended Training (E8 — 8 epochs, cosine LR, lr=3e-5)
+| Condition | ROUGE-1 | ROUGE-2 | ROUGE-L | Train time | Notes |
+|-----------|---------|---------|---------|-----------|-------|
+| Baseline (5ep, lr=5e-5) | 47.86 | 23.22 | 39.85 | 168.4 min | E1 result |
+| Extended (8ep, lr=3e-5, cosine) | 46.45 | 22.05 | 38.46 | 259.6 min | Best epoch 4 |
+**Finding**: Δ ROUGE-L = −1.39. Lower peak LR caused underfitting; baseline with lr=5e-5
+linear decay converges to a better optimum. Hypothesis not supported.
+---
+## Training Procedure
+### Dataset
+- **Train**: 14,731 examples
+- **Validation**: 818 examples
+- **Test**: 819 examples
+- **Variant used**: `with_speakers` — speaker attribution tags (`Name: `) preserved.
+  Ablation shows this contributes +6.62 ROUGE-L vs stripping tags.
+### Preprocessing
+Dialogues are tokenized with `AutoTokenizer` from `facebook/bart-base`.
+`max_source_length=512`, `max_target_length=128` (covers 99%+ of SAMSum
+examples at these lengths). No task prefix (BART does not require one;
+T5 uses `"summarize: "`).
+### Hyperparameters
+| Parameter | Value |
+|-----------|-------|
+| Base model | `facebook/bart-base` |
+| Optimizer | AdamW |
+| Learning rate | 5.0 × 10⁻⁵ |
+| LR schedule | Linear decay |
+| Warmup steps | 500 |
+| Weight decay | 0.01 |
+| Batch size | 8 |
+| Max epochs | 5 |
+| Early stopping patience | 2 |
+| Gradient clip norm | 1.0 |
+| Precision | BF16 |
+| Best epoch | 5 |
+| Best val ROUGE-L | 41.57 |
+| Training time | 72.4 min (M4 Pro MPS) |
+### Compute
+Trained on Apple M4 Pro (T6041), 24 GB Unified Memory, 20 GPU cores.
+PyTorch 2.10.0 MPS backend, BF16.
+---
+## Limitations
+- **Synthetic training data**: SAMSum was constructed by human annotators
+  writing fictional WhatsApp-style dialogues. The model has not been evaluated
+  on real meeting transcripts or audio-derived text.
+- **Two-speaker bias**: ~75% of SAMSum examples involve exactly 2 participants.
+  Summarization quality for 3+ speaker conversations is likely lower.
+- **Hallucination**: ~10.1% of test summaries contain at least one NER-detected
+  hallucinated entity. The actual hallucination rate is higher for non-entity
+  errors (e.g. fabricated scores, inverted speaker actions).
+- **Speaker attribution errors**: ~25% of summaries have at least one
+  speaker attribution mistake (e.g. "X will call Y" when it is Y who called).
+- **Non-commercial only**: CC BY-NC-ND 4.0 applies to all outputs.
+---
+## Citation
+```bibtex
+@inproceedings{gliwa-etal-2019-samsum,
+    title     = "{SAMS}um Corpus: A Human-annotated Dialogue Dataset
+                 for Abstractive Summarization",
+    author    = "Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej
+                 and Wawer, Aleksander",
+    booktitle = "Proceedings of the 2nd Workshop on New Frontiers in
+                 Summarization",
+    year      = "2019",
+    publisher = "Association for Computational Linguistics",
+    doi       = "10.18653/v1/D19-5409",
+}
+```
+---
+## How to Push to HuggingFace Hub
+```bash
+# 1. Log in
+huggingface-cli login
+# 2. Create the repository (replace <username>)
+huggingface-cli repo create bart-base-samsum-summarizer --type model
+# 3. Push model weights + tokenizer
+python3 - <<'EOF'
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import torch
+model_path = "models/best/facebook_bart-base_with_speakers"
+repo_id    = "your-hf-username/bart-base-samsum-summarizer"   # ← replace
+tok = AutoTokenizer.from_pretrained(model_path)
+mdl = AutoModelForSeq2SeqLM.from_pretrained(model_path, dtype=torch.bfloat16)
+tok.push_to_hub(repo_id)
+mdl.push_to_hub(repo_id)
+print(f"✅ Pushed to https://huggingface.co/{repo_id}")
+EOF
+# 4. Push model card
+huggingface-cli upload your-hf-username/bart-base-samsum-summarizer \
+    model_card.md README.md
+# 5. Verify
+huggingface-cli whoami
+# → Opens https://huggingface.co/your-hf-username/bart-base-samsum-summarizer
+```
+> **Note**: Do NOT push `models/best/` to GitHub — model weights belong on
+> the HuggingFace Hub only. The `.gitignore` should already exclude `models/`.

config.json ADDED Viewed

	@@ -0,0 +1,63 @@

+{
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "classifier_dropout": 0.0,
+  "d_ff": 2048,
+  "d_kv": 64,
+  "d_model": 512,
+  "decoder_start_token_id": 0,
+  "dense_act_fn": "relu",
+  "dropout_rate": 0.1,
+  "dtype": "float32",
+  "eos_token_id": 1,
+  "feed_forward_proj": "relu",
+  "initializer_factor": 1.0,
+  "is_decoder": false,
+  "is_encoder_decoder": true,
+  "is_gated_act": false,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "t5",
+  "n_positions": 512,
+  "num_decoder_layers": 6,
+  "num_heads": 8,
+  "num_layers": 6,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "scale_decoder_outputs": true,
+  "task_specific_params": {
+    "summarization": {
+      "early_stopping": true,
+      "length_penalty": 2.0,
+      "max_length": 200,
+      "min_length": 30,
+      "no_repeat_ngram_size": 3,
+      "num_beams": 4,
+      "prefix": "summarize: "
+    },
+    "translation_en_to_de": {
+      "early_stopping": true,
+      "max_length": 300,
+      "num_beams": 4,
+      "prefix": "translate English to German: "
+    },
+    "translation_en_to_fr": {
+      "early_stopping": true,
+      "max_length": 300,
+      "num_beams": 4,
+      "prefix": "translate English to French: "
+    },
+    "translation_en_to_ro": {
+      "early_stopping": true,
+      "max_length": 300,
+      "num_beams": 4,
+      "prefix": "translate English to Romanian: "
+    }
+  },
+  "tie_word_embeddings": true,
+  "transformers_version": "5.2.0",
+  "use_cache": false,
+  "vocab_size": 32128
+}

generation_config.json ADDED Viewed

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "decoder_start_token_id": 0,
+  "eos_token_id": [
+    1
+  ],
+  "pad_token_id": 0,
+  "transformers_version": "5.2.0"
+}

model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fc53c7d1347a62628eb060b11482aa1c4899b2bfdaddf3d1c08ffc072be0d6f
+size 242041896

task5_production_config.json ADDED Viewed

	@@ -0,0 +1,12 @@

+{
+  "task": "task5_production_baseline",
+  "lora_rank": 16,
+  "structured_schema": {
+    "topics": "list of main topics discussed",
+    "action_items": "list of action items or next steps",
+    "decision": "main decision or outcome"
+  },
+  "model_path": "/Users/vnissankararao/dsgrid/dstask2/meeting-summarizer/models/production_task5",
+  "source": "/Users/vnissankararao/dsgrid/dstask2/meeting-summarizer/models/best/t5-small_lora_r16/merged_structured",
+  "structured_supervised": true
+}

tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,113 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "extra_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "is_local": false,
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>"
+}