---
language:
- 'no'
- nb
- nn
inference: false
tags:
- BERT
- NorBERT
- Norwegian
- encoder
license: apache-2.0
---
# NorBERT 3 base

<img src="https://huggingface.co/ltg/norbert3-base/resolve/main/norbert.png" width=12.5%>

The official release of a new generation of NorBERT language models described in the paper [**NorBench — A Benchmark for Norwegian Language Models**](https://aclanthology.org/2023.nodalida-1.61/). Please read the paper to learn more about the model.
## Other sizes:
- [NorBERT 3 xs (15M)](https://huggingface.co/ltg/norbert3-xs)
- [NorBERT 3 small (40M)](https://huggingface.co/ltg/norbert3-small)
- [NorBERT 3 base (123M)](https://huggingface.co/ltg/norbert3-base)
- [NorBERT 3 large (323M)](https://huggingface.co/ltg/norbert3-large)

## Generative NorT5 siblings:
- [NorT5 xs (32M)](https://huggingface.co/ltg/nort5-xs)
- [NorT5 small (88M)](https://huggingface.co/ltg/nort5-small)
- [NorT5 base (228M)](https://huggingface.co/ltg/nort5-base)
- [NorT5 large (808M)](https://huggingface.co/ltg/nort5-large)
## Example usage

This model currently needs a custom wrapper from `modeling_norbert.py`, so you should load it with `trust_remote_code=True`.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ltg/norbert3-base")
# trust_remote_code=True is required for the custom wrapper in modeling_norbert.py
model = AutoModelForMaskedLM.from_pretrained("ltg/norbert3-base", trust_remote_code=True)

mask_id = tokenizer.convert_tokens_to_ids("[MASK]")
input_text = tokenizer("Nå ønsker de seg en[MASK] bolig.", return_tensors="pt")

# Inference only, so gradients are not needed
with torch.no_grad():
    output_p = model(**input_text)

# Replace each [MASK] position with the most likely token; keep all other tokens
output_text = torch.where(input_text.input_ids == mask_id, output_p.logits.argmax(-1), input_text.input_ids)

# should output: '[CLS] Nå ønsker de seg en ny bolig.[SEP]'
print(tokenizer.decode(output_text[0].tolist()))
```
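
To make the `torch.where` step in the example concrete, the following sketch performs the same substitution on plain Python lists: every position holding the mask id is replaced by the highest-scoring vocabulary id, and all other positions are left unchanged. The token ids, the five-entry vocabulary, and the scores are made-up toy values with no relation to the real NorBERT vocabulary, and `fill_masks` is a hypothetical helper, not part of `transformers`.

```python
from typing import List


def fill_masks(input_ids: List[int], logits: List[List[float]], mask_id: int) -> List[int]:
    """Replace each mask position with the argmax over that position's scores."""
    filled = []
    for token_id, scores in zip(input_ids, logits):
        if token_id == mask_id:
            # argmax over the vocabulary dimension, analogous to logits.argmax(-1)
            filled.append(max(range(len(scores)), key=scores.__getitem__))
        else:
            # non-mask positions keep their original id
            filled.append(token_id)
    return filled


# Toy data (assumed values): a vocabulary of 5 ids, where id 4 plays the role of [MASK].
MASK_ID = 4
toy_input_ids = [0, 2, 4, 1]
toy_logits = [
    [0.9, 0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.6, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7, 0.0],  # the masked position: id 3 scores highest
    [0.2, 0.5, 0.1, 0.1, 0.1],
]

filled_ids = fill_masks(toy_input_ids, toy_logits, MASK_ID)
print(filled_ids)  # [0, 2, 3, 1]
```

With the real model, `input_ids` would correspond to `input_text.input_ids[0]` and `logits` to `output_p.logits[0]`.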
The following classes are currently implemented: `AutoModel`, `AutoModelForMaskedLM`, `AutoModelForSequenceClassification`, `AutoModelForTokenClassification`, `AutoModelForQuestionAnswering` and `AutoModelForMultipleChoice`.
## Cite us

```bibtex
@inproceedings{samuel-etal-2023-norbench,
    title = "{N}or{B}ench {--} A Benchmark for {N}orwegian Language Models",
    author = "Samuel, David and
      Kutuzov, Andrey and
      Touileb, Samia and
      Velldal, Erik and
      {\O}vrelid, Lilja and
      R{\o}nningstad, Egil and
      Sigdel, Elina and
      Palatkina, Anna",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.61",
    pages = "618--633",
    abstract = "We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench.",
}
```