TextMachineProject/NewsBERT_1800-1920

feature-extraction

semantic-similarity

historical-text

npedrazzini committed 3 days ago

Commit edc3efe · verified · 1 Parent(s): 979715e

Update README.md

Files changed (1)

README.md +5 -5

@@ -12,13 +12,13 @@ license: mit
 pipeline_tag: fill-mask
 ---
-# NewsBERT
+# NewsBERT_1800-1920
-**NewsBERT** is a domain-adapted masked language model based on [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased). It has been fine-tuned with a **masked language modeling (MLM)** objective on all **historical English newspaper text** (1800-1920) from the following two collections:
+**NewsBERT_1800-1920** is a domain-adapted masked language model based on [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased). It has been fine-tuned with a **masked language modeling (MLM)** objective on all **historical English newspaper text** (1800-1920) from the following two collections:
 - [HMD14](https://bl.iro.bl.uk/concern/datasets/2800eb7d-8b49-4398-a6e9-c2c5692a1304)
 - [LwM](https://bl.iro.bl.uk/concern/datasets/99dc570a-9460-48ac-baed-9d2b8c4c13c0?locale=en)
-NewsBERT retains the architecture and vocabulary of BERT-base (uncased), with only weights being adapted to these datasets.
+NewsBERT_1800-1920 retains the architecture and vocabulary of BERT-base (uncased), with only weights being adapted to these datasets.
 ---
@@ -42,7 +42,7 @@ NewsBERT retains the architecture and vocabulary of BERT-base (uncased), with on
 ```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
-model_id = "npedrazzini/NewsBERT"
+model_id = "TextMachineProject/NewsBERT_1800-1920"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForMaskedLM.from_pretrained(model_id)
@@ -63,7 +63,7 @@ for p in preds:
 import torch
 from transformers import AutoTokenizer, AutoModel
-model_id = "npedrazzini/NewsBERT"
+model_id = "TextMachineProject/NewsBERT_1800-1920"
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 tokenizer = AutoTokenizer.from_pretrained(model_id)