npedrazzini committed on
Commit edc3efe · verified · 1 Parent(s): 979715e

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -12,13 +12,13 @@ license: mit
  pipeline_tag: fill-mask
  ---
 
- # NewsBERT
+ # NewsBERT_1800-1920
 
- **NewsBERT** is a domain-adapted masked language model based on [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased). It has been fine-tuned with a **masked language modeling (MLM)** objective on all **historical English newspaper text** (1800-1920) from the following two collections:
+ **NewsBERT_1800-1920** is a domain-adapted masked language model based on [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased). It has been fine-tuned with a **masked language modeling (MLM)** objective on all **historical English newspaper text** (1800-1920) from the following two collections:
  - [HMD14](https://bl.iro.bl.uk/concern/datasets/2800eb7d-8b49-4398-a6e9-c2c5692a1304)
  - [LwM](https://bl.iro.bl.uk/concern/datasets/99dc570a-9460-48ac-baed-9d2b8c4c13c0?locale=en)
 
- NewsBERT retains the architecture and vocabulary of BERT-base (uncased), with only weights being adapted to these datasets.
+ NewsBERT_1800-1920 retains the architecture and vocabulary of BERT-base (uncased), with only weights being adapted to these datasets.
 
  ---
 
@@ -42,7 +42,7 @@ NewsBERT retains the architecture and vocabulary of BERT-base (uncased), with on
  ```python
  from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
 
- model_id = "npedrazzini/NewsBERT"
+ model_id = "TextMachineProject/NewsBERT_1800-1920"
 
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForMaskedLM.from_pretrained(model_id)
@@ -63,7 +63,7 @@ for p in preds:
  import torch
  from transformers import AutoTokenizer, AutoModel
 
- model_id = "npedrazzini/NewsBERT"
+ model_id = "TextMachineProject/NewsBERT_1800-1920"
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
  tokenizer = AutoTokenizer.from_pretrained(model_id)