Sentence Similarity
sentence-transformers
Safetensors
Russian
xlm-roberta
feature-extraction
text-embeddings-inference
Boris Malashenko committed on
Tokenizer fix
Got a "data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 90 column 3" error. Found that tokenizer.json causes it, so reinitialized it from the previous model.
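The error message names a position in tokenizer.json (line 90, column 3). As a minimal sketch for inspecting that spot, the helper below returns the lines surrounding a given 1-based line number. `show_context` is a hypothetical name used only for illustration; it is not part of any library and is not how the commit author diagnosed the problem.

```python
def show_context(text, line, radius=3):
    """Return the lines around a 1-based line number, prefixed with their numbers.

    `text` is the full file content as a string; `radius` is how many
    lines to include on each side of `line`.
    """
    lines = text.splitlines()
    start = max(line - radius, 1)
    end = min(line + radius, len(lines))
    # Right-align the line number in a 6-character column for readability.
    return [f"{n:>6} | {lines[n - 1]}" for n in range(start, end + 1)]
```

For example, assuming tokenizer.json is in the current directory, `print("\n".join(show_context(open("tokenizer.json", encoding="utf-8").read(), 90)))` prints lines 87 to 93, which is the region the error refers to.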
- tokenizer.json +5 -5
tokenizer.json
CHANGED
@@ -85,8 +85,8 @@
   "pre_tokenizer": {
     "type": "Metaspace",
     "replacement": "▁",
-    "
-    "
+    "add_prefix_space": true,
+    "prepend_scheme": "always"
   },
   "post_processor": {
     "type": "TemplateProcessing",
@@ -172,8 +172,8 @@
   "decoder": {
     "type": "Metaspace",
     "replacement": "▁",
-    "
-    "
+    "add_prefix_space": true,
+    "prepend_scheme": "always"
   },
   "model": {
     "type": "Unigram",
@@ -184846,4 +184846,4 @@
     ],
     "byte_fallback": false
   }
-}
+}
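The hunks above leave the `Metaspace` blocks under `pre_tokenizer` and `decoder` with exactly four fields: `type`, `replacement`, `add_prefix_space`, and `prepend_scheme`. The commit itself restored the file from the previous model; the sketch below is a different, hand-rolled way to reach the same shape using only the standard library. It assumes that any other key in a top-level `Metaspace` block is the one the parser rejects, which the source does not confirm. The function names are hypothetical, and `Metaspace` blocks nested inside other components are not handled.

```python
import json

# Fields shown on the "+" side of the diff. Treating every other key in a
# Metaspace block as unsupported is an assumption, not a documented rule.
LEGACY_METASPACE_FIELDS = {
    "add_prefix_space": True,
    "prepend_scheme": "always",
}
KEPT_KEYS = {"type", "replacement"}


def normalize_metaspace(component):
    """Return a copy of a Metaspace block limited to the fields in the diff."""
    if not isinstance(component, dict) or component.get("type") != "Metaspace":
        return component  # leave other component types untouched
    cleaned = {k: v for k, v in component.items() if k in KEPT_KEYS}
    cleaned.update(LEGACY_METASPACE_FIELDS)
    return cleaned


def patch_tokenizer_config(config):
    """Normalize the top-level pre_tokenizer and decoder of a parsed tokenizer.json."""
    patched = dict(config)  # shallow copy; the input dict is not modified
    for key in ("pre_tokenizer", "decoder"):
        if key in patched:
            patched[key] = normalize_metaspace(patched[key])
    return patched


def patch_file(src, dst):
    """Read src, patch it, and write the result to dst as UTF-8 JSON."""
    with open(src, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(patch_tokenizer_config(config), f, ensure_ascii=False, indent=2)
        f.write("\n")
```

Writing to a separate `dst` path keeps the original file available for comparison, for example `patch_file("tokenizer.json", "tokenizer.patched.json")`.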