Update README.md
README.md
CHANGED
@@ -12,13 +12,13 @@ license: mit
 pipeline_tag: fill-mask
 ---
 
-#
+# NewsBERT_1800-1920
 
-**
+**NewsBERT_1800-1920** is a domain-adapted masked language model based on [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased). It has been fine-tuned with a **masked language modeling (MLM)** objective on all **historical English newspaper text** (1800-1920) from the following two collections:
 - [HMD14](https://bl.iro.bl.uk/concern/datasets/2800eb7d-8b49-4398-a6e9-c2c5692a1304)
 - [LwM](https://bl.iro.bl.uk/concern/datasets/99dc570a-9460-48ac-baed-9d2b8c4c13c0?locale=en)
 
-
+NewsBERT_1800-1920 retains the architecture and vocabulary of BERT-base (uncased), with only weights being adapted to these datasets.
 
 ---
 
@@ -42,7 +42,7 @@ NewsBERT retains the architecture and vocabulary of BERT-base (uncased), with on
 ```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
 
-model_id = "
+model_id = "TextMachineProject/NewsBERT_1800-1920"
 
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForMaskedLM.from_pretrained(model_id)
@@ -63,7 +63,7 @@ for p in preds:
 import torch
 from transformers import AutoTokenizer, AutoModel
 
-model_id = "
+model_id = "TextMachineProject/NewsBERT_1800-1920"
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
 tokenizer = AutoTokenizer.from_pretrained(model_id)