ELiRF/NASCA

text2text-generation

José Ángel González committed on Sep 24, 2021

Commit 20fa076 · 1 Parent(s): c33c85c

Update README.md

Files changed (1)

README.md +16 -0

@@ -12,3 +12,19 @@ widget:
 News Abstractive Summarization for Catalan (NASCA) is a Transformer encoder-decoder model, with the same hyper-parameters as BART, that summarizes Catalan news articles. It is pre-trained on a combination of several self-supervised tasks that help to increase the abstractivity of the generated summaries. Four pre-training tasks are combined: sentence permutation, text infilling, Gap Sentence Generation, and Next Segment Generation. Catalan newspapers, the Catalan subset of the OSCAR corpus, and Wikipedia articles in Catalan were used to pre-train the model (9.3 GB of raw text, 2.5 million documents).
 NASCA is fine-tuned for the summarization task on 636,596 (document, summary) pairs from the Dataset for Automatic summarization of Catalan and Spanish newspaper Articles (DACSA).

+More details about the pre-training and fine-tuning datasets and the models will be available soon:
+@unpublished{DACSA,
+  author = "Vicent Ahuir and Lluís-F. Hurtado and José Ángel González and Encarna Segarra",
+  title = "DACSA: a Dataset for Automatic summarization of Catalan and Spanish
+    newspaper Articles",
+  note = "Unsubmitted",
+}
+@unpublished{NAS,
+  author = "Vicent Ahuir and Lluís-F. Hurtado and José Ángel González and Encarna Segarra",
+  title = "NASCA and NASES: Two monolingual pre-trained models for
+    abstractive summarization in Catalan and Spanish",
+  note = "Submitted to the Special Issue on Current Approaches and Applications in Natural Language Processing (Applied Sciences)",
+}
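
The four pre-training tasks named in the README text can be illustrated with a toy sketch. This is not the authors' implementation: the `<mask>` token, the function names, and the exact way source/target pairs are built (in particular for Next Segment Generation, read here as "generate the rest of the document from its beginning") are assumptions made for illustration only.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the NASCA tokenizer


def permute_sentences(sentences, rng):
    """Sentence permutation: return a shuffled copy; the model must restore the order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_text(tokens, span_start, span_len):
    """Text infilling: replace one token span with a single mask token."""
    return tokens[:span_start] + [MASK] + tokens[span_start + span_len:]


def gap_sentence_pair(sentences, gap_index):
    """Gap Sentence Generation: mask one sentence in the source and use it as the target."""
    source = [MASK if i == gap_index else s for i, s in enumerate(sentences)]
    return source, sentences[gap_index]


def next_segment_pair(sentences, split):
    """Next Segment Generation (assumed reading): source is the prefix, target is the rest."""
    return sentences[:split], sentences[split:]


if __name__ == "__main__":
    doc = ["First sentence.", "Second sentence.", "Third sentence."]
    print(permute_sentences(doc, random.Random(0)))
    print(infill_text(["a", "b", "c", "d"], span_start=1, span_len=2))
    print(gap_sentence_pair(doc, gap_index=1))
    print(next_segment_pair(doc, split=2))
```

In a real pipeline the span positions and gap sentences would be sampled rather than passed in explicitly; fixed arguments are used here only to keep the sketch deterministic.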