Update README.md

README.md (CHANGED)
@@ -58,9 +58,14 @@ kw_model.extract_keywords(doc, stop_words=None)
 
 The [KeyBERT homepage](https://github.com/MaartenGr/KeyBERT) provides several other interesting examples: combining KeyBERT with stop words, extracting longer phrases, or directly producing highlighted text.
 
-##
-
+## Topic Modeling
+Analysing a group of documents to determine their topics has many use cases. [BERTopic](https://github.com/MaartenGr/BERTopic) combines the power of sentence transformers with c-TF-IDF to create clusters of easily interpretable topics.
 
+It would take too much space to explain topic modeling here; instead, we recommend that you take a look at the link above, as well as the [documentation](https://maartengr.github.io/BERTopic/index.html). The main adaptation needed to use the Norwegian nb-sbert is to add the following:
+
+```python
+topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert').fit(docs)
+```
 
 ## Similarity Search
 Another common use case for a SentenceTransformers model is to find relevant documents or passages of documents given a certain query text. In this scenario, it is pretty common to have a vector database that stores the embedding vectors for all our documents. Then, at runtime, an embedding for the query text is generated and compared efficiently against the vector database.
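The c-TF-IDF weighting mentioned above can be illustrated without BERTopic itself. The following is a simplified sketch of the idea on made-up term counts; BERTopic's actual implementation adds normalisation details beyond this:

```python
import numpy as np

# Simplified class-based TF-IDF (c-TF-IDF): score how characteristic each
# term is for each document cluster. Rows = 2 clusters, columns = 3 terms.
counts = np.array([[5.0, 1.0, 0.0],
                   [0.0, 1.0, 4.0]])

tf = counts / counts.sum(axis=1, keepdims=True)     # term frequency within each cluster
avg_words = counts.sum() / counts.shape[0]          # average word count per cluster
idf = np.log(1.0 + avg_words / counts.sum(axis=0))  # terms rare across clusters score higher
ctfidf = tf * idf

# The top-scoring column in each row is that cluster's most characteristic term.
top_terms = ctfidf.argmax(axis=1)
print(top_terms)  # term 0 dominates cluster 0, term 2 dominates cluster 1
```

This is why BERTopic's topics come with readable keyword lists: the embedding model only does the clustering, and c-TF-IDF picks the words that describe each cluster.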
@@ -82,6 +87,7 @@ model = SentenceTransformer('NbAiLab/nb-sbert')
 embeddings = model.encode(sentences)
 index, index_infos = build_index(embeddings, save_on_disk=False)
 
+# Search for the closest matches
 query = model.encode(["A young boy"])
 _, index_matches = index.search(query, 1)
 print(index_matches)
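Conceptually, `index.search(query, 1)` returns the stored vector nearest to the query. A minimal numpy sketch of that lookup, with made-up 3-dimensional vectors standing in for real sentence embeddings:

```python
import numpy as np

# Toy "document embeddings"; real nb-sbert vectors are much higher-dimensional.
embeddings = np.array([[0.9, 0.1, 0.0],
                       [0.0, 0.8, 0.2],
                       [0.1, 0.0, 0.9]])
query = np.array([[0.85, 0.15, 0.05]])

# Cosine similarity = dot product of L2-normalised vectors.
def normalise(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

scores = normalise(query) @ normalise(embeddings).T
best_match = int(scores.argmax())
print(best_match)  # the first document is the closest match
```

A real index (such as the one autofaiss builds) computes the same kind of comparison, but with data structures that avoid scoring every stored vector against the query.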
@@ -163,6 +169,7 @@ print(scipy_cosine_scores)
 
 ```
 
+
 # Evaluation and Parameters
 
 ## Evaluation
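The `print(scipy_cosine_scores)` context line in the last hunk suggests cosine similarity computed with SciPy. A minimal sketch of that computation on made-up vectors (the variable names here are assumptions, not the README's own):

```python
from scipy.spatial import distance

# SciPy's `cosine` returns a distance, so similarity = 1 - distance.
# Toy vectors stand in for two sentence embeddings.
embedding_a = [1.0, 2.0, 3.0]
embedding_b = [2.0, 4.0, 6.0]  # same direction as embedding_a

scipy_cosine_score = 1.0 - distance.cosine(embedding_a, embedding_b)
print(round(scipy_cosine_score, 4))  # parallel vectors score 1.0
```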