Upload README.md with huggingface_hub
| 19 |
## Training Data
The model's training data consists of almost 13,000,000 English articles from ~90 outlets, each consisting of a headline (title) and a subheading (description). The articles were collected from the [Sciride News Mine](http://sciride.org/news.html), after which some additional cleaning was performed on the data, such as removing duplicate articles and stripping repeated "outlet tags" appearing before or after headlines (e.g. "| Daily Mail Online").
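As a rough illustration of the tag-stripping step described above, the sketch below removes a known outlet tag from either end of a headline. The tag list and function name are hypothetical; the README does not specify how the cleaning was actually implemented.

```python
# Hypothetical sketch of outlet-tag cleaning; the real tag list and
# implementation used for the dataset are not given in this README.
OUTLET_TAGS = ["| Daily Mail Online"]  # example tag from the text above

def strip_outlet_tags(headline: str) -> str:
    """Remove any known outlet tag from the start or end of a headline."""
    for tag in OUTLET_TAGS:
        if headline.endswith(tag):
            headline = headline[: -len(tag)]
        if headline.startswith(tag):
            headline = headline[len(tag):]
    return headline.strip()

print(strip_outlet_tags("Storm batters coast | Daily Mail Online"))
```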
The cleaned dataset can be found on Hugging Face [here](https://huggingface.co/datasets/AndyReas/frontpage-news). roberta-news was pre-trained on a large subset (12,928,029 of 13,118,041 articles) of the linked dataset, after repacking the data slightly to avoid abrupt truncation.
## How to use
The model can be used with the HuggingFace pipeline like so:
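The pipeline snippet itself is missing from this excerpt; a minimal sketch using the transformers fill-mask pipeline is shown below. The repo id `AndyReas/roberta-news` is an assumption based on the dataset link above; substitute the model's actual Hub id.

```python
from transformers import pipeline

# Repo id assumed for illustration; replace with the model's actual Hub id.
fill_mask = pipeline("fill-mask", model="AndyReas/roberta-news")

# RoBERTa-based models use "<mask>" as the mask token.
predictions = fill_mask("Scientists discover new <mask> in the Amazon rainforest.")

for pred in predictions:
    # Each prediction carries the filled-in sentence and a probability score.
    print(pred["sequence"], pred["score"])
```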