---
license: mit
language:
- en
base_model:
- facebook/bart-large-cnn
pipeline_tag: summarization
library_name: transformers
---

# BART-large-quotes

BART \[1\] fine-tuned for extractive summarization on a dataset of movie and book quotes.

Training continues from the BART-large-cnn checkpoint, which was itself fine-tuned on the CNN/Daily Mail dataset, whose summaries are more extractive than abstractive.
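
The model can be used with the `transformers` summarization pipeline. A minimal sketch, assuming the checkpoint is published under the repo id `ChrisBridges/BART-large-quotes` (substitute the actual id of this repository):

```python
# Minimal usage sketch; the repo id below is an assumption -- replace it with
# the actual id of this repository.
from transformers import pipeline

summarizer = pipeline("summarization", model="ChrisBridges/BART-large-quotes")

# A context passage; the model should extract its most quotable statement.
context = (
    "The old man looked out over the sea and thought of the years behind him. "
    "A man can be destroyed but not defeated. "
    "The fish was still far out, and the sun was setting over the water."
)
quote = summarizer(context, max_length=128)[0]["summary_text"]
print(quote)
```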

## Training Description

### Dataset

The model was trained on 11295 quotes: 6280 movie quotes from the Cornell Movie Quotes dataset \[2\] and 5015 book quotes from the T50 dataset \[3\].
As described in the T50 paper, each movie quote is accompanied by a context of 4 sentences on each side, while book quotes use 10 sentences on each side.
Training/development/test splits with proportions 7:1:2 were created with stratified sampling.
24
+
25
+ Split sizes:
26
+ | Split | Total | Movie | Book |
27
+ | ----- | ----- | ----- | ---- |
28
+ | Train | 7906 | 4396 | 3510 |
29
+ | Dev | 1130 | 628 | 502 |
30
+ | Test | 2259 | 1256 | 1003 |
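
The splitting code is not published; a standard-library sketch of a 7:1:2 split stratified by source (movie vs. book), with per-group counts rounded to the nearest integer, reproduces the sizes in the table above:

```python
import random

def stratified_split(items, key, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split items into train/dev/test, preserving per-group proportions."""
    rng = random.Random(seed)
    splits = ([], [], [])
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    for group in groups.values():
        rng.shuffle(group)
        n = len(group)
        n_train = round(fractions[0] * n)
        n_dev = round(fractions[1] * n)
        splits[0].extend(group[:n_train])
        splits[1].extend(group[n_train:n_train + n_dev])
        splits[2].extend(group[n_train + n_dev:])
    return splits

# Placeholder records with the real source counts (6280 movie, 5015 book).
quotes = [{"text": f"quote {i}", "source": "movie" if i < 6280 else "book"}
          for i in range(11295)]
train, dev, test = stratified_split(quotes, key=lambda q: q["source"])
print(len(train), len(dev), len(test))  # matches the table above
```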

Length statistics:
| Data          | min | median | max  | mean ± std      |
| ------------- | --- | ------ | ---- | --------------- |
| Movie Context | 38  | 148    | 3358 | 167.13 ± 102.57 |
| Movie Quote   | 5   | 20     | 592  | 28.22 ± 27.79   |
| T50 Context   | 86  | 628    | 6497 | 659.14 ± 310.49 |
| T50 Quote     | 6   | 41     | 877  | 61.87 ± 63.89   |
| Total Context | 38  | 246    | 6497 | 385.58 ± 329.26 |
| Total Quote   | 5   | 26     | 877  | 43.16 ± 50.21   |

### Parameters

Each experiment uses a maximum input length of 1024 and a maximum output length of 128 to account for the short average length of quotes.
While quote lengths vary considerably, we are most interested in short, poignant statements.

Each BART model is trained with a batch size of 32 for 30 epochs (7440 steps) using AdamW with 0.01 weight decay and a linearly annealed peak learning rate of 5e-5.
The first 5% of steps, i.e., 1.5 epochs, are used for linear warmup. The model is evaluated every 500 steps w.r.t. ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum.
After training, the checkpoint with the best eval_rougeL is loaded to prefer extractive over abstractive summarization. FP16 mixed precision is used.

In addition, we evaluate T5-base \[4\] with a batch size of 8 (29670 steps) due to its greater memory footprint, and a peak learning rate of 3e-4.

The learning rates were chosen empirically on shorter training runs of 5 epochs.
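
The quoted step counts follow from the train-split size and the batch sizes, assuming the last partial batch of each epoch is kept (steps per epoch rounded up):

```python
import math

# Reproduce the quoted step counts from the train split size (7906) and the
# per-model batch sizes; warmup is the first 5% of total steps.
train_size, epochs, warmup_ratio = 7906, 30, 0.05

for label, batch_size in [("BART (batch 32)", 32), ("T5-base (batch 8)", 8)]:
    steps_per_epoch = math.ceil(train_size / batch_size)
    total_steps = epochs * steps_per_epoch
    warmup_steps = int(warmup_ratio * total_steps)
    print(f"{label}: {total_steps} steps, warmup {warmup_steps} steps "
          f"= {warmup_steps / steps_per_epoch:.2f} epochs")
```

This gives 7440 steps for BART (warmup 372 steps, exactly 1.5 epochs) and 29670 steps for T5-base, matching the numbers above.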

### Evaluation

Since no data splits were published with the T50 paper \[3\], we cannot reproduce its results and instead evaluate on the previously described data splits.
Rather than using the whole test set at once, we split it into 3 equally sized, disjoint random samples of 753 examples each.
Each model is evaluated on all 3 samples, and we report the mean and 95% confidence interval for every score.
Additionally, we report the average predicted quote length, the number of epochs, and the total training time.
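
One way to obtain a "mean ± 95% CI" from the 3 disjoint test samples is a two-sided Student-t interval with 2 degrees of freedom; whether the authors used exactly this method is an assumption, and the scores below are illustrative, not real results:

```python
import statistics

def mean_ci95(scores, t_crit=4.303):  # t_{0.975} for df = 2, i.e. n = 3
    """Mean and half-width of a 95% t-confidence interval."""
    mean = statistics.mean(scores)
    half_width = t_crit * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, half_width

# Three per-sample ROUGE scores (made-up values for illustration).
m, h = mean_ci95([0.42, 0.44, 0.43])
print(f"{m:.4f} ± {h:.4f}")
```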

| Model          | ROUGE-1         | ROUGE-2         | ROUGE-L         | ROUGE-Lsum      | Avg Quote Length | Epochs | Time    |
| -------------- | --------------- | --------------- | --------------- | --------------- | ---------------- | ------ | ------- |
| T5-base        | 0.3758 ± 0.0175 | 0.2990 ± 0.0128 | 0.3628 ± 0.0189 | 0.3684 ± 0.0201 | 18.1576 ± 0.1084 | 1.01   | 3:39:14 |
| BART-base      | 0.4236 ± 0.0133 | 0.3498 ± 0.0116 | 0.4112 ± 0.0135 | 0.4165 ± 0.0107 | 19.1027 ± 0.1755 | 12.10  | 0:44:48 |
| BART-large     | 0.4252 ± 0.0240 | 0.3456 ± 0.0204 | 0.4115 ± 0.0206 | 0.4171 ± 0.0209 | 19.2877 ± 0.1819 | 6.05   | 2:43:56 |
| BART-large-cnn | 0.4384 ± 0.0225 | 0.3693 ± 0.0197 | 0.4165 ± 0.0239 | 0.4317 ± 0.0234 | 81.8623 ± 1.5324 | 28.23  | 3:48:24 |

## References

\[1\] [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)

\[2\] [You Had Me at Hello: How Phrasing Affects Memorability](https://aclanthology.org/P12-1094/)

\[3\] [Quote Detection: A New Task and Dataset for NLP](https://aclanthology.org/2023.latechclfl-1.3/)

\[4\] [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)