base_model:
- google/bigbird-pegasus-large-bigpatent
pipeline_tag: summarization
---

# MeetingScript

> A BigBird-Pegasus model fine-tuned for meeting transcript summarization on the MeetingBank dataset.

📦 **Model Files**

- **Weights & config**: `pytorch_model.bin`, `config.json`
- **Tokenizer**: `tokenizer.json`, `tokenizer_config.json`, `merges.txt`, `special_tokens_map.json`
- **Generation defaults**: `generation_config.json`

🔗 **Hub:** https://huggingface.co/Shaelois/MeetingScript

---

## Model Description

**MeetingScript** is a sequence-to-sequence model based on [google/bigbird-pegasus-large-bigpatent](https://huggingface.co/google/bigbird-pegasus-large-bigpatent) and fine-tuned on the [MeetingBank](https://huggingface.co/datasets/huuuyeah/meetingbank) corpus of meeting transcripts paired with human-written summaries. It takes long meeting transcripts (up to 4,096 tokens) and produces concise, coherent summaries.
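The model can be loaded straight from the Hub with 🤗 Transformers. A minimal inference sketch, mirroring the beam-search settings used in evaluation below; the sample transcript is illustrative, not drawn from MeetingBank:

```python
# Minimal inference sketch with Hugging Face Transformers.
# Requires `transformers` and `torch`; downloads weights from the Hub.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Shaelois/MeetingScript"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative transcript snippet; real inputs can be up to 4,096 tokens.
transcript = (
    "Chair: I call this meeting to order. First item is the parks budget. "
    "Member A: We propose a five percent increase for park maintenance. "
    "Member B: I second the motion. Chair: All in favor? Motion carries."
)

inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, num_beams=4, max_length=150)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```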
---

## Evaluation Results

Evaluated on the held-out test split of MeetingBank (≈ 600 transcripts), using beam search (4 beams, `max_length=150`):

| Metric         | F1 Score (%) |
|----------------|-------------:|
| **ROUGE-1**    |      51.5556 |
| **ROUGE-2**    |      38.5378 |
| **ROUGE-L**    |      48.0786 |
| **ROUGE-Lsum** |      48.0142 |

---

## Training Data

- **Dataset:** [MeetingBank](https://huggingface.co/datasets/huuuyeah/meetingbank)
- **Splits:** train (~5,000), validation (~600), test (~600)
- **Preprocessing:** sliding-window chunking for sequences > 4,096 tokens
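The chunking code itself is not included in this repo. A minimal sketch of token-level sliding windows, assuming the model's 4,096-token input limit and a hypothetical 256-token overlap between consecutive windows:

```python
# Sketch of sliding-window chunking over token IDs. The 4,096-token
# window matches the model's input limit; the 256-token overlap is an
# assumption, not documented in this repo.
def sliding_windows(token_ids, window=4096, overlap=256):
    """Yield overlapping windows of token IDs covering the full sequence."""
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    # Stop once a window's start would only revisit already-covered tokens.
    for start in range(0, max(len(token_ids) - overlap, 1), step):
        yield token_ids[start:start + window]
```

Each window would then be summarized independently, with the per-window summaries merged afterwards; short transcripts pass through as a single window.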