Upload README.md with huggingface_hub
| 19 |
## Training Data
The model's training data consists of almost 13,000,000 English articles from ~90 outlets, each consisting of a headline (title) and a subheading (description). The articles were collected from the [Sciride News Mine](http://sciride.org/news.html), after which some additional cleaning was performed on the data, such as removing duplicate articles and stripping repeated "outlet tags" appearing before or after headlines (e.g. "| Daily Mail Online").
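As a rough illustration of the tag-stripping step described above, the sketch below removes a known outlet tag from either end of a headline. The tag list and function name are hypothetical; the README does not specify how the cleaning was actually implemented.

```python
# Hypothetical sketch of outlet-tag cleaning; the real tag list and
# implementation used for the dataset are not given in this README.
OUTLET_TAGS = ["| Daily Mail Online"]  # example tag from the text above

def strip_outlet_tags(headline: str) -> str:
    """Remove any known outlet tag from the start or end of a headline."""
    for tag in OUTLET_TAGS:
        if headline.endswith(tag):
            headline = headline[: -len(tag)]
        if headline.startswith(tag):
            headline = headline[len(tag):]
    return headline.strip()

print(strip_outlet_tags("Storm batters coast | Daily Mail Online"))
```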
The cleaned dataset can be found on Hugging Face [here](https://huggingface.co/datasets/AndyReas/frontpage-news). roberta-news was pre-trained on a large subset (12,928,029 of 13,118,041 articles) of the linked dataset, after repacking the data slightly to avoid abrupt truncation.
## How to use
The model can be used with the HuggingFace pipeline like so:
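The pipeline snippet itself is missing from this excerpt; a minimal sketch using the transformers fill-mask pipeline is shown below. The repo id `AndyReas/roberta-news` is an assumption based on the dataset link above; substitute the model's actual Hub id.

```python
from transformers import pipeline

# Repo id assumed for illustration; replace with the model's actual Hub id.
fill_mask = pipeline("fill-mask", model="AndyReas/roberta-news")

# RoBERTa-based models use "<mask>" as the mask token.
predictions = fill_mask("Scientists discover new <mask> in the Amazon rainforest.")

for pred in predictions:
    # Each prediction carries the filled-in sentence and a probability score.
    print(pred["sequence"], pred["score"])
```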