Update README.md
README.md
CHANGED

@@ -86,7 +86,7 @@ The model maps sentences & paragraphs to a 768-dimensional dense vector space an
 
 ### EU AI Act
 
-This release is a **non-generative encoder model** whose outputs are vectors/scores rather than language or media. Its intended functionality is limited to representation, retrieval, ranking, or classification support. On that basis, the release is preliminarily assessed as not falling within the provider obligations for GPAI models under the EU AI Act definitions, subject to legal confirmation if capability scope or marketed generality changes.
+This release is a **non-generative encoder model** whose outputs are vectors/scores rather than language or media. Its intended functionality is limited to representation, retrieval, ranking, or classification support. On that basis, the release is preliminarily assessed as not falling within the provider obligations for GPAI models under the EU AI Act definitions, subject to legal confirmation if capability scope or marketed generality changes. For more information, see the Model Documentation Form [here](https://huggingface.co/NbAiLab/nb-sbert-v2-base/tree/main).
 
 ### Model Sources
 
@@ -504,6 +504,28 @@ You can finetune this model on your own dataset.
 }
 ```
 
+#### NbAiLab/nb-bert-base
+
+```bibtex
+@inproceedings{kummervold-etal-2021-operationalizing,
+    title = {Operationalizing a National Digital Library: The Case for a {N}orwegian Transformer Model},
+    author = {Kummervold, Per E and
+      De la Rosa, Javier and
+      Wetjen, Freddy and
+      Brygfjeld, Svein Arne},
+    booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
+    year = {2021},
+    address = {Reykjavik, Iceland (Online)},
+    publisher = {Linköping University Electronic Press, Sweden},
+    url = {https://aclanthology.org/2021.nodalida-main.3},
+    pages = {20--29},
+    abstract = {In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library.
+        The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models
+        in several token and sequence classification tasks for both Norwegian Bokmål and Norwegian Nynorsk. Our model also improves the mBERT performance for other
+        languages present in the corpus such as English, Swedish, and Danish. For languages not included in the corpus, the weights degrade moderately while keeping strong multilingual properties. Therefore,
+        we show that building high-quality models within a memory institution using somewhat noisy optical character recognition (OCR) content is feasible, and we hope to pave the way for other memory institutions to follow.},
+}
+```
+
 <!--
 ## Glossary
 