Commit 20fa076 by José Ángel González
1 Parent(s): c33c85c
Update README.md
README.md CHANGED
@@ -12,3 +12,19 @@ widget:
News Abstractive Summarization for Catalan (NASCA) is a Transformer encoder-decoder model with the same hyper-parameters as BART, built to summarize Catalan news articles. It is pre-trained on a combination of self-supervised tasks that help increase the abstractivity of the generated summaries. Four pre-training tasks are combined: sentence permutation, text infilling, Gap Sentence Generation, and Next Segment Generation. The pre-training data consists of Catalan newspapers, the Catalan subset of the OSCAR corpus, and Catalan Wikipedia articles (9.3 GB of raw text, 2.5 million documents).
NASCA is fine-tuned for the summarization task on 636,596 (document, summary) pairs from the Dataset for Automatic summarization of Catalan and Spanish newspaper Articles (DACSA).
+
+More details about the pretraining/finetuning datasets and the models soon:
+
+@unpublished{DACSA,
+author = "Vicent Ahuir, Lluís-F. Hurtado , José Ángel González and Encarna Segarra",
+title = "DACSA: a Dataset for Automatic summarization of Catalan and Spanish
+newspaper Articles",
+note = "Unsubmitted",
+}
+
+@unpublished{NAS,
+author = "Vicent Ahuir, Lluís-F. Hurtado , José Ángel González and Encarna Segarra",
+title = "NAS CA and NAS ES : Two monolingual pre-trained models for
+abstractive summarization in Catalan and Spanish",
+note = "Submitted to the Special Issue on Current Approaches and Applications in Natural Language Processing (Applied Sciences)",
+}
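
The README text in this hunk names four self-supervised pre-training tasks but does not define them. The following is a minimal sketch of what each one could look like on toy data, not the authors' implementation. All function names, the `<mask>` token, and the naive sentence splitter are assumptions made for illustration. In particular, the span to infill and the gap sentence are passed in by the caller here, whereas real setups typically sample or score them, and Next Segment Generation is read from its name as "predict the rest of the document from its beginning".

```python
import random


def split_sentences(text):
    """Naive sentence splitter on '.', good enough for a toy example."""
    parts = [s.strip() for s in text.split(".")]
    return [s + "." for s in parts if s]


def sentence_permutation(sentences, rng):
    """Return a shuffled copy; the model would learn to restore the order."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled


def text_infilling(tokens, start, length, mask="<mask>"):
    """Replace tokens[start:start+length] with a single mask token."""
    return tokens[:start] + [mask] + tokens[start + length:]


def gap_sentence_generation(sentences, gap_index, mask="<mask>"):
    """Mask one whole sentence in the source and use it as the target."""
    source = [mask if i == gap_index else s for i, s in enumerate(sentences)]
    return source, sentences[gap_index]


def next_segment_generation(sentences, split_at):
    """Use the first split_at sentences as source and the rest as target."""
    return sentences[:split_at], sentences[split_at:]


if __name__ == "__main__":
    doc = ("El govern aprova la llei. L'oposició la critica. "
           "El text entra en vigor dilluns.")
    sents = split_sentences(doc)
    print(sentence_permutation(sents, random.Random(0)))
    print(text_infilling(sents[0].split(), 1, 2))
    print(gap_sentence_generation(sents, 1))
    print(next_segment_generation(sents, 2))
```

Each function returns a (corrupted input, target) style view of the same document, which is how a combination of such tasks can feed one encoder-decoder model.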