---
license: mit
language:
- en
base_model:
- facebook/bart-base
pipeline_tag: summarization
library_name: transformers
tags:
- movies
- books
- quotes
- quote-detection
- extractive
---

# BART-base-quotes

BART \[1\] fine-tuned for extractive summarization on a dataset of movie and book quotes.

Fine-tuning was continued from the BART-base checkpoint.

**Compare**: The larger model [BART-large-quotes](https://huggingface.co/ChrisBridges/bart-large-quotes) achieved slightly better ROUGE scores,
but favors much longer quotes (~4x the average length).

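The model can be used with the standard transformers summarization pipeline. A minimal sketch; the repository id `ChrisBridges/bart-base-quotes` is an assumption inferred from the model name:

```python
# Hypothetical usage sketch: extract the most quotable span from a passage.
# The repository id below is an assumption, not stated elsewhere in this card.
from transformers import pipeline

def extract_quote(context: str) -> str:
    """Return the model's extracted quote for the given context passage."""
    summarizer = pipeline("summarization", model="ChrisBridges/bart-base-quotes")
    # max_length matches the training-time output limit of 128 tokens
    return summarizer(context, max_length=128)[0]["summary_text"]
```

Calling `extract_quote(...)` downloads the checkpoint on first use; pass a context of a few sentences around the candidate quote, as in training.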
## Training Description

### Dataset

The model was trained on 11,295 quotes: 6,280 movie quotes from the Cornell Movie Quotes dataset \[2\] and 5,015 book quotes from the T50 dataset \[3\].
As described in the T50 paper, each movie quote is accompanied by a context of 4 sentences on each side, while 10 sentences are used for book quotes.
Training/development/test splits with proportions 7:1:2 were created with stratified sampling.
The tables below report the sample sizes of each data split and the length statistics of the contexts and quotes in each sample.

| Split | Total | Movie | Book |
| ----- | ----- | ----- | ---- |
| Train | 7906 | 4396 | 3510 |
| Dev | 1130 | 628 | 502 |
| Test | 2259 | 1256 | 1003 |

| Data | min | median | max | mean ± std |
| ------------- | --- | ------ | ---- | --------------- |
| Movie Context | 38 | 148 | 3358 | 167.13 ± 102.57 |
| Movie Quote | 5 | 20 | 592 | 28.22 ± 27.79 |
| T50 Context | 86 | 628 | 6497 | 659.14 ± 310.49 |
| T50 Quote | 6 | 41 | 877 | 61.87 ± 63.89 |
| Total Context | 38 | 246 | 6497 | 385.58 ± 329.26 |
| Total Quote | 5 | 26 | 877 | 43.16 ± 50.21 |

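The 7:1:2 stratified split could be reproduced along these lines. A minimal sketch with scikit-learn; the function name, label field, and seed are assumptions, not the authors' published code:

```python
# Sketch of a 7:1:2 stratified split over quote examples; `labels` marks each
# example's source ("movie" or "book") so every split preserves the ratio.
from sklearn.model_selection import train_test_split

def make_splits(examples, labels, seed=42):
    # Carve off the 20% test set first, stratified by source.
    rest_x, test_x, rest_y, _ = train_test_split(
        examples, labels, test_size=0.2, stratify=labels, random_state=seed)
    # Dev is 1/8 of the remaining 80%, i.e. 10% of the full data.
    train_x, dev_x, _, _ = train_test_split(
        rest_x, rest_y, test_size=0.125, stratify=rest_y, random_state=seed)
    return train_x, dev_x, test_x
```

On the 11,295 quotes this yields splits of 7906/1130/2259, matching the table above.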
### Parameters

Each experiment uses a maximum input length of 1024 and a maximum output length of 128 tokens to account for the short average length of quotes.
While quote lengths vary considerably, we are mainly interested in short, poignant statements.

Each BART model is trained with a batch size of 32 for 30 epochs (7440 steps) using AdamW with 0.01 weight decay and a linearly annealed learning rate of 5e-5.
The first 5% of steps, i.e., 1.5 epochs, are used for linear warmup. The model is evaluated every 500 steps w.r.t. ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum.
After training, the checkpoint with the best eval_rougeL is loaded to prefer extractive over abstractive summarization. FP16 mixed precision is used.

In addition, we evaluate T5-base \[4\], with a batch size of 8 (29670 steps) due to its larger memory footprint and a peak learning rate of 3e-4.

The learning rates were chosen empirically on shorter training runs of 5 epochs.

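For reference, the BART hyperparameters above correspond roughly to the following configuration, expressed with transformers-style argument names. A sketch only, since the training script is not included with this card:

```python
# Hyperparameters for the BART runs as described above, using
# transformers-style names; the exact training script is not published.
TOTAL_STEPS = 7440  # 30 epochs * ceil(7906 train examples / batch size 32)

bart_args = dict(
    per_device_train_batch_size=32,
    num_train_epochs=30,
    learning_rate=5e-5,                    # linearly annealed
    weight_decay=0.01,                     # AdamW
    warmup_steps=int(0.05 * TOTAL_STEPS),  # 372 steps = 1.5 epochs
    eval_steps=500,                        # ROUGE-1/2/L/Lsum on the dev split
    metric_for_best_model="eval_rougeL",   # best checkpoint loaded at the end
    load_best_model_at_end=True,
    fp16=True,
    generation_max_length=128,             # max output length; input max is 1024
)
```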
### Evaluation

Since no data splits were published with the T50 paper \[3\], we cannot reproduce its results; instead, we evaluate on the data splits described above.
Rather than using the whole test set at once, we split it into 3 equally sized, disjoint random samples of 753 examples each.
Each model is evaluated on all 3 samples, and we report the mean and a 95% confidence interval for each score.
Additionally, we report the average predicted quote length, the number of epochs, and the total training time.

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Avg Quote Length | Epochs | Time |
| -------------- | --------------- | --------------- | --------------- | --------------- | ---------------- | ------ | ------- |
| T5-base | 0.3758 ± 0.0175 | 0.2990 ± 0.0128 | 0.3628 ± 0.0189 | 0.3684 ± 0.0201 | 18.1576 ± 0.1084 | 1.01 | 3:39:14 |
| BART-base | 0.4236 ± 0.0133 | 0.3498 ± 0.0116 | 0.4112 ± 0.0135 | 0.4165 ± 0.0107 | 19.1027 ± 0.1755 | 12.10 | 0:44:48 |
| BART-large | 0.4252 ± 0.0240 | 0.3456 ± 0.0204 | 0.4115 ± 0.0206 | 0.4171 ± 0.0209 | 19.2877 ± 0.1819 | 6.05 | 2:43:56 |
| BART-large-cnn | 0.4384 ± 0.0225 | 0.3693 ± 0.0197 | 0.4165 ± 0.0239 | 0.4317 ± 0.0234 | 81.8623 ± 1.5324 | 28.23 | 3:48:24 |

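The ± values in the table can be recomputed from the three per-sample scores as a mean plus a 95% confidence half-width. A stdlib sketch, assuming a two-sided Student-t interval (the exact interval construction used is an assumption):

```python
import statistics
from math import sqrt

def mean_ci95(scores):
    """Mean and 95% confidence half-width over three disjoint eval samples."""
    t_crit = 4.303  # two-sided 95% Student-t critical value for df = 2
    m = statistics.mean(scores)
    half = t_crit * statistics.stdev(scores) / sqrt(len(scores))
    return m, half
```

For example, per-sample ROUGE-L scores of 0.41, 0.42, and 0.43 would be reported as 0.42 ± 0.0248.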
## References
\[1\] [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)\
\[2\] [You Had Me at Hello: How Phrasing Affects Memorability](https://aclanthology.org/P12-1094/)\
\[3\] [Quote Detection: A New Task and Dataset for NLP](https://aclanthology.org/2023.latechclfl-1.3/)\
\[4\] [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)