Update README.md
README.md
CHANGED
````diff
@@ -19,10 +19,10 @@ The model is similar to [gpt2](https://huggingface.co/gpt2) in that it shares it
 The model parameters of a [GPT2LMHeadModel](https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/gpt2#transformers.GPT2LMHeadModel) model were randomly initialized and pre-trained from scratch using a dataset consisting only of news.
 
 ## Training Data
-The model's training data consists of ~13,000,000
+The model's training data consists of ~13,000,000 English articles from ~90 outlets, which each consists of a headline (title) and a subheading (description). The articles were collected from the [Sciride News Mine](http://sciride.org/news.html), after which some additional cleaning was performed on the data, such as removing duplicate articles and removing repeated "outlet tags" appearing before or after headlines such as "| Daily Mail Online".
 
 ## How to use
-The model can be used with the
+The model can be used with the HuggingFace pipeline like so:
 ```python
 >>> from transformers import pipeline
 >>> generator = pipeline('text-generation', model='andyreas/newsgpt')
````
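
The cleaning mentioned in the added Training Data paragraph (dropping duplicate articles and stripping outlet tags such as "| Daily Mail Online") could look roughly like the sketch below. This is not the author's actual pipeline: the `OUTLET_TAGS` list, the function names, and the choice to represent each article as a `(title, description)` tuple are all assumptions made for illustration.

```python
import re

# Hypothetical outlet names; the README does not list the tags that were removed.
OUTLET_TAGS = ["Daily Mail Online", "BBC News"]

# Matches "Outlet | " at the start or " | Outlet" at the end of a headline.
_TAG_ALTERNATION = "|".join(re.escape(tag) for tag in OUTLET_TAGS)
_TAG_PATTERN = re.compile(
    rf"^\s*(?:{_TAG_ALTERNATION})\s*\|\s*|\s*\|\s*(?:{_TAG_ALTERNATION})\s*$"
)


def strip_outlet_tags(headline: str) -> str:
    """Remove a leading or trailing outlet tag, then trim whitespace."""
    return _TAG_PATTERN.sub("", headline).strip()


def clean_articles(articles):
    """Strip outlet tags from titles and drop exact duplicate articles.

    `articles` is an iterable of (title, description) tuples. The first
    occurrence of each cleaned pair is kept, in input order.
    """
    seen = set()
    cleaned = []
    for title, description in articles:
        key = (strip_outlet_tags(title), description.strip())
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned
```

Stripping the tags before deduplicating means that two copies of the same article, one with and one without the outlet tag, collapse into a single entry. Near-duplicates with different wording would need a fuzzier comparison than the exact match used here.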