vlhandfo committed
Commit 84bedb8 · 1 Parent(s): 2e5f8c6

Update README.md

Files changed (1): README.md (+23 −1)

README.md CHANGED
@@ -86,7 +86,7 @@ The model maps sentences & paragraphs to a 768-dimensional dense vector space an
 
 ### EU AI Act
 
-This release is a **non-generative encoder model** whose outputs are vectors/scores rather than language or media. Its intended functionality is limited to representation, retrieval, ranking, or classification support. On that basis, the release is preliminarily assessed as not falling within the provider obligations for GPAI models under the EU AI Act definitions, subject to legal confirmation if capability scope or marketed generality changes.
+This release is a **non-generative encoder model** whose outputs are vectors/scores rather than language or media. Its intended functionality is limited to representation, retrieval, ranking, or classification support. On that basis, the release is preliminarily assessed as not falling within the provider obligations for GPAI models under the EU AI Act definitions, subject to legal confirmation if capability scope or marketed generality changes. For more information, see the Model Documentation Form [here](https://huggingface.co/NbAiLab/nb-sbert-v2-base/tree/main).
 
 ### Model Sources
 
@@ -504,6 +504,28 @@ You can finetune this model on your own dataset.
 }
 ```
 
+#### NbAiLab/nb-bert-base
+```bibtex
+@inproceedings{kummervold-etal-2021-operationalizing,
+    title = {Operationalizing a National Digital Library: The Case for a {N}orwegian Transformer Model},
+    author = {Kummervold, Per E and
+        De la Rosa, Javier and
+        Wetjen, Freddy and
+        Brygfjeld, Svein Arne},
+    booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
+    year = {2021},
+    address = {Reykjavik, Iceland (Online)},
+    publisher = {Linköping University Electronic Press, Sweden},
+    url = {https://aclanthology.org/2021.nodalida-main.3},
+    pages = {20--29},
+    abstract = {In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library. The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models in several token and sequence classification tasks for both Norwegian Bokmål and Norwegian Nynorsk. Our model also improves the mBERT performance for other languages present in the corpus such as English, Swedish, and Danish. For languages not included in the corpus, the weights degrade moderately while keeping strong multilingual properties. Therefore, we show that building high-quality models within a memory institution using somewhat noisy optical character recognition (OCR) content is feasible, and we hope to pave the way for other memory institutions to follow.},
+}
+```
+
 <!--
 ## Glossary