---
license: mit
language:
- en
base_model:
- facebook/bart-large-cnn
pipeline_tag: summarization
library_name: transformers
---

# BART-large-quotes

BART \[1\] fine-tuned for extractive summarization on a dataset of movie and book quotes.

Training continues from the BART-large-cnn checkpoint, which was itself fine-tuned on the CNN/Daily Mail dataset, whose summaries are more extractive than abstractive.
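
The model can be used with the `transformers` summarization pipeline. A minimal sketch, assuming the checkpoint is published under the repo id `ChrisBridges/BART-large-quotes` (substitute the actual id of this repository):

```python
# Minimal usage sketch; the repo id below is an assumption -- replace it with
# the actual id of this repository.
from transformers import pipeline

summarizer = pipeline("summarization", model="ChrisBridges/BART-large-quotes")

# A context passage; the model should extract its most quotable statement.
context = (
    "The old man looked out over the sea and thought of the years behind him. "
    "A man can be destroyed but not defeated. "
    "The fish was still far out, and the sun was setting over the water."
)
quote = summarizer(context, max_length=128)[0]["summary_text"]
print(quote)
```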

## Training Description

### Dataset

The model was trained on 11295 quotes: 6280 movie quotes from the Cornell Movie Quotes dataset \[2\] and 5015 book quotes from the T50 dataset \[3\].
As described in the T50 paper, each movie quote is accompanied by a context of 4 sentences on each side, while book quotes use 10 sentences on each side.
Training/development/test splits with proportions 7:1:2 were created with stratified sampling.
24
+
25
+ Split sizes:
26
+ | Split | Total | Movie | Book |
27
+ | ----- | ----- | ----- | ---- |
28
+ | Train | 7906 | 4396 | 3510 |
29
+ | Dev | 1130 | 628 | 502 |
30
+ | Test | 2259 | 1256 | 1003 |
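
The splitting code is not published; a standard-library sketch of a 7:1:2 split stratified by source (movie vs. book), with per-group counts rounded to the nearest integer, reproduces the sizes in the table above:

```python
import random

def stratified_split(items, key, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split items into train/dev/test, preserving per-group proportions."""
    rng = random.Random(seed)
    splits = ([], [], [])
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    for group in groups.values():
        rng.shuffle(group)
        n = len(group)
        n_train = round(fractions[0] * n)
        n_dev = round(fractions[1] * n)
        splits[0].extend(group[:n_train])
        splits[1].extend(group[n_train:n_train + n_dev])
        splits[2].extend(group[n_train + n_dev:])
    return splits

# Placeholder records with the real source counts (6280 movie, 5015 book).
quotes = [{"text": f"quote {i}", "source": "movie" if i < 6280 else "book"}
          for i in range(11295)]
train, dev, test = stratified_split(quotes, key=lambda q: q["source"])
print(len(train), len(dev), len(test))  # matches the table above
```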

Length statistics:
| Data          | min | median | max  | mean ± std      |
| ------------- | --- | ------ | ---- | --------------- |
| Movie Context | 38  | 148    | 3358 | 167.13 ± 102.57 |
| Movie Quote   | 5   | 20     | 592  | 28.22 ± 27.79   |
| T50 Context   | 86  | 628    | 6497 | 659.14 ± 310.49 |
| T50 Quote     | 6   | 41     | 877  | 61.87 ± 63.89   |
| Total Context | 38  | 246    | 6497 | 385.58 ± 329.26 |
| Total Quote   | 5   | 26     | 877  | 43.16 ± 50.21   |

### Parameters

Each experiment uses a maximum input length of 1024 and a maximum output length of 128 to account for the short average length of quotes.
While quote lengths vary considerably, we are most interested in short, poignant statements.

Each BART model is trained with a batch size of 32 for 30 epochs (7440 steps) using AdamW with 0.01 weight decay and a linearly annealed peak learning rate of 5e-5.
The first 5% of steps, i.e., 1.5 epochs, are used for linear warmup. The model is evaluated every 500 steps w.r.t. ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum.
After training, the checkpoint with the best eval_rougeL is loaded to prefer extractive over abstractive summarization. FP16 mixed precision is used.

In addition, we evaluate T5-base \[4\] with a batch size of 8 (29670 steps) due to its greater memory footprint, and a peak learning rate of 3e-4.

The learning rates were chosen empirically on shorter training runs of 5 epochs.
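
The quoted step counts follow from the train-split size and the batch sizes, assuming the last partial batch of each epoch is kept (steps per epoch rounded up):

```python
import math

# Reproduce the quoted step counts from the train split size (7906) and the
# per-model batch sizes; warmup is the first 5% of total steps.
train_size, epochs, warmup_ratio = 7906, 30, 0.05

for label, batch_size in [("BART (batch 32)", 32), ("T5-base (batch 8)", 8)]:
    steps_per_epoch = math.ceil(train_size / batch_size)
    total_steps = epochs * steps_per_epoch
    warmup_steps = int(warmup_ratio * total_steps)
    print(f"{label}: {total_steps} steps, warmup {warmup_steps} steps "
          f"= {warmup_steps / steps_per_epoch:.2f} epochs")
```

This gives 7440 steps for BART (warmup 372 steps, exactly 1.5 epochs) and 29670 steps for T5-base, matching the numbers above.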

### Evaluation

Since no data splits were published with the T50 paper \[3\], we cannot reproduce its results and instead evaluate on the previously described data splits.
Rather than using the whole test set at once, we split it into 3 equally sized, disjoint random samples of 753 examples each.
Each model is evaluated on all 3 samples, and we report the mean and 95% confidence interval for every score.
Additionally, we report the average predicted quote length, the number of epochs, and the total training time.
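
One way to obtain a "mean ± 95% CI" from the 3 disjoint test samples is a two-sided Student-t interval with 2 degrees of freedom; whether the authors used exactly this method is an assumption, and the scores below are illustrative, not real results:

```python
import statistics

def mean_ci95(scores, t_crit=4.303):  # t_{0.975} for df = 2, i.e. n = 3
    """Mean and half-width of a 95% t-confidence interval."""
    mean = statistics.mean(scores)
    half_width = t_crit * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, half_width

# Three per-sample ROUGE scores (made-up values for illustration).
m, h = mean_ci95([0.42, 0.44, 0.43])
print(f"{m:.4f} ± {h:.4f}")
```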

| Model          | ROUGE-1         | ROUGE-2         | ROUGE-L         | ROUGE-Lsum      | Avg Quote Length | Epochs | Time    |
| -------------- | --------------- | --------------- | --------------- | --------------- | ---------------- | ------ | ------- |
| T5-base        | 0.3758 ± 0.0175 | 0.2990 ± 0.0128 | 0.3628 ± 0.0189 | 0.3684 ± 0.0201 | 18.1576 ± 0.1084 | 1.01   | 3:39:14 |
| BART-base      | 0.4236 ± 0.0133 | 0.3498 ± 0.0116 | 0.4112 ± 0.0135 | 0.4165 ± 0.0107 | 19.1027 ± 0.1755 | 12.10  | 0:44:48 |
| BART-large     | 0.4252 ± 0.0240 | 0.3456 ± 0.0204 | 0.4115 ± 0.0206 | 0.4171 ± 0.0209 | 19.2877 ± 0.1819 | 6.05   | 2:43:56 |
| BART-large-cnn | 0.4384 ± 0.0225 | 0.3693 ± 0.0197 | 0.4165 ± 0.0239 | 0.4317 ± 0.0234 | 81.8623 ± 1.5324 | 28.23  | 3:48:24 |

## References

\[1\] [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)

\[2\] [You Had Me at Hello: How Phrasing Affects Memorability](https://aclanthology.org/P12-1094/)

\[3\] [Quote Detection: A New Task and Dataset for NLP](https://aclanthology.org/2023.latechclfl-1.3/)

\[4\] [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)