Update README.md
README.md (CHANGED)
````diff
@@ -4,12 +4,22 @@ tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
-
+license: apache-2.0
+datasets:
+- wikimedia/wikipedia
+- SiberiaSoft/SiberianPersonaChat-2
+language:
+- ru
+- en
+metrics:
+- mse
+library_name: transformers
 ---
 
 # FractalGPT/SberDistil
 
 This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+This is a fast, small model for estimating the similarity between sentences; we plan to shrink and speed it up further. [Project](https://github.com/FractalGPT/ModelEmbedderDistilation)
 
 <!--- Describe your model here -->
 
@@ -32,15 +42,14 @@ embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
+## Training
 
-
-
-
-
-
-
-
-
+* The original weights were taken from [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
+* Training was conducted in two stages:
+  1. In the first stage, the model was trained on Wikipedia texts (4 million texts) for three epochs.
+     <img src="https://github.com/FractalGPT/ModelEmbedderDistilation/blob/main/DistilSBERT/Train/1_st_en.JPG?raw=true" width=700 />
+  2. In the second stage, training was conducted on Wikipedia, a dialog dataset, and NLI for one epoch.
+     <img src="https://github.com/FractalGPT/ModelEmbedderDistilation/blob/main/DistilSBERT/Train/2_st_en.JPG?raw=true" width=700 />
 
 ## Full Model Architecture
 ```
@@ -49,8 +58,4 @@ SentenceTransformer(
   (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
   (2): Dense({'in_features': 312, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
 )
-```
-
-## Citing & Authors
-
-<!--- Describe where people can find more information -->
+```
````
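The model architecture in the diff (mean Pooling over 312-dimensional token embeddings, then a Dense 312 -> 384 layer with bias and an identity activation) can be sketched in plain NumPy. This is an illustrative sketch only: the weights below are random stand-ins, not the model's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Token embeddings for one sentence: (num_tokens, 312), as produced by the transformer.
token_embeddings = rng.normal(size=(7, 312))
attention_mask = np.ones(7)  # 1 for real tokens, 0 for padding

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True).
pooled = (token_embeddings * attention_mask[:, None]).sum(axis=0) / attention_mask.sum()

# (2) Dense: 312 -> 384 with bias; identity activation means no nonlinearity.
W = rng.normal(size=(384, 312))
b = np.zeros(384)
sentence_embedding = W @ pooled + b

print(sentence_embedding.shape)  # (384,)
```

With an all-ones mask the pooling step reduces to a plain mean over tokens; padding tokens would simply drop out of both the sum and the divisor.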
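For the clustering and semantic-search uses the card mentions, sentence embeddings are typically compared with cosine similarity. A minimal version, with toy vectors standing in for real model outputs:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [1.0, 0.0, 1.0]
docs = {"doc_a": [1.0, 0.0, 1.0], "doc_b": [0.0, 1.0, 0.0]}

# Rank documents by similarity to the query and keep the best match.
best = max(docs, key=lambda name: cos_sim(query, docs[name]))
print(best)  # doc_a: same direction as the query, similarity ~1.0
```

In practice the vectors would come from `model.encode(...)`, as in the card's usage snippet.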
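The `mse` entry in the new metadata reflects the distillation objective: the student's sentence embeddings are pushed to match the teacher's under mean squared error. A generic sketch of that loss (not the project's actual training code):

```python
import numpy as np

def mse(student_emb, teacher_emb):
    """Mean squared error between student and teacher sentence embeddings."""
    diff = np.asarray(student_emb, dtype=float) - np.asarray(teacher_emb, dtype=float)
    return float((diff ** 2).mean())

teacher = [0.5, -1.0, 0.25, 0.0]

print(mse([0.5, -1.0, 0.25, 0.0], teacher))  # 0.0: perfect match
print(mse([1.5, -1.0, 0.25, 0.0], teacher))  # 0.25: one coordinate off by 1.0, averaged over 4
```

During training this scalar would be minimized over batches of (student, teacher) embedding pairs, driving the small student toward the larger teacher's embedding space.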