djovak/embedic-base

Sentence Similarity

sentence-transformers

feature-extraction

text-embeddings-inference

djovak committed on Sep 9, 2024

Commit c298ab6 · 1 Parent(s): e84bd54

update readme.md

Files changed (1)

README.md +8 -2

@@ -45,9 +45,16 @@ print(embeddings)
 ```
 ### Important usage notes
-- "ošišana ćirilica" (using c instead of ć, etc.) significantly decreases search quality
 - The usage of uppercase letters for named entities can significantly improve search quality
 ## Evaluation
@@ -86,7 +93,6 @@ Evaluation datasets will be published as Part of [MTEB benchmark](https://huggin
 If you have any questions or suggestions related to this project, you can open an issue or pull request. You can also email me at novakzivanic@gmail.com
 ## Full Model Architecture
 ```
 SentenceTransformer(

 ```
 ### Important usage notes
+- "ošišana latinica" (using c instead of ć, etc.) significantly decreases search quality
 - The usage of uppercase letters for named entities can significantly improve search quality
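
To make the first usage note concrete, here is a minimal sketch of what "ošišana latinica" looks like in practice. The helper names (`shave_latin`, `has_serbian_diacritics`) are hypothetical and are not part of the embedic repository, and mapping `đ` to `dj` is just one common convention assumed here. The point is only to show the degraded form that queries and documents should avoid, and a simple way to spot text that still carries its diacritics.

```python
# Hypothetical helpers for illustration; not part of the embedic repository.
# Assumption: "đ" is written as "dj" when diacritics are dropped.
SHAVED_MAP = str.maketrans({
    "č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj",
    "Č": "C", "Ć": "C", "Š": "S", "Ž": "Z", "Đ": "Dj",
})

SERBIAN_DIACRITICS = set("čćšžđČĆŠŽĐ")


def shave_latin(text: str) -> str:
    """Return the diacritic-free ("ošišana") form of Serbian Latin text."""
    return text.translate(SHAVED_MAP)


def has_serbian_diacritics(text: str) -> bool:
    """True if the text contains at least one Serbian Latin diacritic."""
    return any(ch in SERBIAN_DIACRITICS for ch in text)


if __name__ == "__main__":
    original = "Đoković je šampion"
    print(shave_latin(original))             # Djokovic je sampion
    print(has_serbian_diacritics(original))  # True
```

Comparing search results for a query and its `shave_latin` form is one simple way to check how sensitive a given corpus is to missing diacritics.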
+## Training
+- Embedić models are fine-tuned from multilingual-e5 models and they come in 3 sizes (small, base, large).
+- Training is done on a single RTX 4070 Ti SUPER GPU
+- 3-step training: distillation, training on (query, text) pairs and finally fine-tuning with triplets.
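
The last training step works on (anchor, positive, negative) triplets. The README does not say which loss is used, so the following is only a generic sketch of a triplet margin objective, assuming cosine distance and an arbitrary margin of 0.5. The function names and values are illustrative and do not describe the actual Embedić training code.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet margin loss (assumed form, not taken from the README).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    query = [1.0, 0.0]
    relevant = [1.0, 0.0]
    irrelevant = [0.0, 1.0]
    print(triplet_loss(query, relevant, irrelevant))  # well separated: 0.0
    print(triplet_loss(query, irrelevant, relevant))  # swapped roles: 1.5
```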
 ## Evaluation
 If you have any questions or suggestions related to this project, you can open an issue or pull request. You can also email me at novakzivanic@gmail.com
 ## Full Model Architecture
 ```
 SentenceTransformer(