Shaelois committed · Commit 61e7dec · verified · 1 Parent(s): e0f2f6c

Update README.md

Files changed (1): README.md (+40 −1)
base_model:
- google/bigbird-pegasus-large-bigpatent
pipeline_tag: summarization
---
# MeetingScript

> A BigBird-Pegasus model fine-tuned for meeting transcript summarization on the MeetingBank dataset.

📦 **Model Files**
- **Weights & config**: `pytorch_model.bin`, `config.json`
- **Tokenizer**: `tokenizer.json`, `tokenizer_config.json`, `merges.txt`, `special_tokens_map.json`
- **Generation defaults**: `generation_config.json`

🔗 **Hub:** https://huggingface.co/Shaelois/MeetingScript

---

## Model Description

**MeetingScript** is a sequence-to-sequence model based on
[google/bigbird-pegasus-large-bigpatent](https://huggingface.co/google/bigbird-pegasus-large-bigpatent),
fine-tuned on the [MeetingBank](https://huggingface.co/datasets/huuuyeah/meetingbank) corpus of meeting transcripts paired with human-written summaries.
It is designed to take long meeting transcripts (up to 4,096 tokens) and produce concise, coherent summaries.

---
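## Usage

A minimal loading-and-generation sketch using 🤗 Transformers (the sample transcript below is invented for illustration):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Shaelois/MeetingScript"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Toy transcript; real inputs may run up to the 4,096-token limit.
transcript = (
    "Chair: I call this meeting to order. First item is the proposed budget. "
    "Member A: I move to approve the budget as written. "
    "Member B: Seconded. Chair: All in favor? The motion carries."
)

inputs = tokenizer(transcript, truncation=True, max_length=4096, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=150)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```

The generation settings here (4 beams, `max_length=150`) mirror the evaluation setup reported in this card.

---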

## Evaluation Results

Evaluated on the held-out test split of MeetingBank (≈600 transcripts), using beam search (4 beams, `max_length=150`):

| Metric         | F1 Score (%) |
|----------------|-------------:|
| **ROUGE-1**    |      51.5556 |
| **ROUGE-2**    |      38.5378 |
| **ROUGE-L**    |      48.0786 |
| **ROUGE-Lsum** |      48.0142 |

---
## Training Data

- **Dataset:** MeetingBank
- **Splits:** Train (~5,000), Validation (~600), Test (~600)
- **Preprocessing:** Sliding-window chunking for sequences longer than 4,096 tokens
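The sliding-window preprocessing can be sketched in plain Python (the window size matches the model's 4,096-token limit; the 512-token overlap is an assumed value for illustration, not a documented setting):

```python
def chunk_token_ids(token_ids, max_len=4096, overlap=512):
    """Split a long token-id sequence into overlapping windows.

    max_len matches the model's 4,096-token input limit; the overlap
    value is an illustrative assumption, not the card's documented setting.
    """
    if len(token_ids) <= max_len:
        return [token_ids]
    chunks, start, step = [], 0, max_len - overlap
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += step
    return chunks


# Example: a 10,000-token transcript splits into three overlapping windows.
windows = chunk_token_ids(list(range(10_000)))
print([len(w) for w in windows])  # → [4096, 4096, 2832]
```

Each window can then be summarized independently, with the per-window summaries concatenated or summarized again.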