---
license: cc-by-nc-3.0
language:
- da
pipeline_tag: fill-mask
tags:
- bert
- danish
widget:
- text: Hvide blodlegemer beskytter kroppen mod [MASK]
---

# Danish medical BERT

MeDa-BERT was initialized with weights from a [pretrained Danish BERT model](https://huggingface.co/Maltehb/danish-bert-botxo) and further pretrained for 48 epochs with the masked language modeling (MLM) objective on a Danish medical corpus of 123M tokens.

The development of the corpus and the model is described in more detail in [this paper](https://aclanthology.org/2023.nodalida-1.31/).

Here is an example of how to load the model in PyTorch using the [🤗Transformers](https://github.com/huggingface/transformers) library:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the MeDa-BERT tokenizer and the model with its masked language modeling head
tokenizer = AutoTokenizer.from_pretrained("jannikskytt/MeDa-Bert")
model = AutoModelForMaskedLM.from_pretrained("jannikskytt/MeDa-Bert")
```
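
Since the model was pretrained with the MLM objective, it can also be queried directly for masked-token predictions via the `fill-mask` pipeline. The following is a minimal sketch using the widget example from this card; the exact predictions and scores depend on the model weights.

```python
from transformers import pipeline

# Build a fill-mask pipeline around MeDa-BERT
fill_mask = pipeline("fill-mask", model="jannikskytt/MeDa-Bert")

# Danish: "White blood cells protect the body against [MASK]"
for prediction in fill_mask("Hvide blodlegemer beskytter kroppen mod [MASK]"):
    print(f"{prediction['token_str']}\t{prediction['score']:.3f}")
```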

### Citing

```bibtex
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik and
      Laursen, Martin and
      Vinholt, Pernille and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```