Update README.md

README.md (CHANGED)
@@ -58,9 +58,14 @@ kw_model.extract_keywords(doc, stop_words=None)
 
 The [KeyBERT homepage](https://github.com/MaartenGr/KeyBERT) provides several other interesting examples: combining KeyBERT with stop words, extracting longer phrases, or directly producing highlighted text.
 
-##
-
+## Topic Modeling
+Analysing a group of documents to determine their topics has many use cases. [BERTopic](https://github.com/MaartenGr/BERTopic) combines the power of sentence transformers with c-TF-IDF to create clusters of easily interpretable topics.
 
+It would take too much space to explain topic modeling here; instead, we recommend that you take a look at the link above, as well as the [documentation](https://maartengr.github.io/BERTopic/index.html). The main adaptation needed to use the Norwegian nb-sbert is to add the following:
+
+```python
+topic_model = BERTopic(embedding_model='NbAiLab/nb-sbert').fit(docs)
+```
 
 ## Similarity Search
 Another common use case for a SentenceTransformers model is to find relevant documents or passages of documents given a certain query text. In this scenario, it is pretty common to have a vector database that stores the embedding vectors for all our documents. Then, at runtime, an embedding for the query text is generated and compared efficiently against the vector database.
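The c-TF-IDF weighting mentioned above can be illustrated without BERTopic itself. The following is a simplified sketch of the idea on made-up term counts; BERTopic's actual implementation adds normalisation details beyond this:

```python
import numpy as np

# Simplified class-based TF-IDF (c-TF-IDF): score how characteristic each
# term is for each document cluster. Rows = 2 clusters, columns = 3 terms.
counts = np.array([[5.0, 1.0, 0.0],
                   [0.0, 1.0, 4.0]])

tf = counts / counts.sum(axis=1, keepdims=True)     # term frequency within each cluster
avg_words = counts.sum() / counts.shape[0]          # average word count per cluster
idf = np.log(1.0 + avg_words / counts.sum(axis=0))  # terms rare across clusters score higher
ctfidf = tf * idf

# The top-scoring column in each row is that cluster's most characteristic term.
top_terms = ctfidf.argmax(axis=1)
print(top_terms)  # term 0 dominates cluster 0, term 2 dominates cluster 1
```

This is why BERTopic's topics come with readable keyword lists: the embedding model only does the clustering, and c-TF-IDF picks the words that describe each cluster.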
@@ -82,6 +87,7 @@ model = SentenceTransformer('NbAiLab/nb-sbert')
 embeddings = model.encode(sentences)
 index, index_infos = build_index(embeddings, save_on_disk=False)
 
+# Search for the closest matches
 query = model.encode(["A young boy"])
 _, index_matches = index.search(query, 1)
 print(index_matches)
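Conceptually, `index.search(query, 1)` returns the stored vector nearest to the query. A minimal numpy sketch of that lookup, with made-up 3-dimensional vectors standing in for real sentence embeddings:

```python
import numpy as np

# Toy "document embeddings"; real nb-sbert vectors are much higher-dimensional.
embeddings = np.array([[0.9, 0.1, 0.0],
                       [0.0, 0.8, 0.2],
                       [0.1, 0.0, 0.9]])
query = np.array([[0.85, 0.15, 0.05]])

# Cosine similarity = dot product of L2-normalised vectors.
def normalise(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

scores = normalise(query) @ normalise(embeddings).T
best_match = int(scores.argmax())
print(best_match)  # the first document is the closest match
```

A real index (such as the one autofaiss builds) computes the same kind of comparison, but with data structures that avoid scoring every stored vector against the query.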
@@ -163,6 +169,7 @@ print(scipy_cosine_scores)
 
 ```
 
+
 # Evaluation and Parameters
 
 ## Evaluation
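The `print(scipy_cosine_scores)` context line in the last hunk suggests cosine similarity computed with SciPy. A minimal sketch of that computation on made-up vectors (the variable names here are assumptions, not the README's own):

```python
from scipy.spatial import distance

# SciPy's `cosine` returns a distance, so similarity = 1 - distance.
# Toy vectors stand in for two sentence embeddings.
embedding_a = [1.0, 2.0, 3.0]
embedding_b = [2.0, 4.0, 6.0]  # same direction as embedding_a

scipy_cosine_score = 1.0 - distance.cosine(embedding_a, embedding_b)
print(round(scipy_cosine_score, 4))  # parallel vectors score 1.0
```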