Sentence Similarity
sentence-transformers
Safetensors
Norwegian
bert
feature-extraction
dense
Generated from Trainer
dataset_size:527098
loss:MultipleNegativesRankingLoss
Eval Results (legacy)
text-embeddings-inference
🇪🇺 Region: EU
Instructions to use NbAiLab/nb-sbert-v2-large with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use NbAiLab/nb-sbert-v2-large with sentence-transformers:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("NbAiLab/nb-sbert-v2-large")

    sentences = [
        "The man talked to a girl over the internet camera.",
        "A group of elderly people pose around a dining table.",
        "A teenager talks to a girl over a webcam.",
        "There is no 'still' that is not relative to some other object."
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [4, 4]

- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -68,7 +68,7 @@ model-index:

 # SentenceTransformer based on NbAiLab/nb-bert-large

-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [NbAiLab/nb-bert-large](https://huggingface.co/NbAiLab/nb-bert-large).
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [NbAiLab/nb-bert-large](https://huggingface.co/NbAiLab/nb-bert-large). It builds on the previous work of the existing [NbAiLab/nb-sbert-base](https://huggingface.co/NbAiLab/nb-sbert-base) model, using a larger foundational model and providing a larger max sequence length for inputs.

 The model maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The easiest way is to simply measure the cosine distance between two sentences. Sentences that are close to each other in meaning, will have a small cosine distance and a similarity close to 1. The model is trained in such a way that similar sentences in different languages should also be close to each other. Ideally, an English-Norwegian sentence pair should have high similarity.

@@ -547,7 +547,7 @@ You can finetune this model on your own dataset.
 year = {2021},
 address = {Reykjavik, Iceland (Online)},
 publisher = {Linköping University Electronic Press, Sweden},
-url = {https://
+url = {https://huggingface.co/papers/2104.09617},
 pages = {20--29},
 abstract = {In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library.
 The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models
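
The README paragraph in the first hunk says that sentences close in meaning have a small cosine distance and a similarity close to 1. The sketch below illustrates that relationship with plain Python. The three short vectors and their names (`english`, `norwegian`, `unrelated`) are made-up stand-ins for embeddings, not output from the model, and the helper functions are hypothetical rather than part of sentence-transformers. It assumes cosine distance is defined as 1 minus cosine similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed definition: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(u, v)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
english = [0.2, 0.7, 0.1]       # stand-in for an English sentence
norwegian = [0.25, 0.65, 0.05]  # stand-in for its Norwegian translation
unrelated = [0.9, -0.1, 0.4]    # stand-in for an unrelated sentence

sim_close = cosine_similarity(english, norwegian)
sim_far = cosine_similarity(english, unrelated)

print(f"translation pair: similarity={sim_close:.3f}, distance={1 - sim_close:.3f}")
print(f"unrelated pair:   similarity={sim_far:.3f}, distance={1 - sim_far:.3f}")
```

With real embeddings, the same comparison would take the vectors returned by `model.encode(...)` in place of the toy lists.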