bi-matrix/gmatrix-embedding

@@ -1,16 +1,24 @@
 ---
-library_name: sentence-transformers
 pipeline_tag: sentence-similarity
 tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 - transformers
 ---
-# {MODEL_NAME}
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 <!--- Describe your model here -->
@@ -29,7 +37,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('{MODEL_NAME}')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -55,8 +63,8 @@ def mean_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -73,69 +81,102 @@ print(sentence_embeddings)
 ```
 ## Evaluation Results
 <!--- Describe how your model was evaluated -->
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-## Training
-The model was trained with the parameters:
-**DataLoader**:
-`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader` of length 4442 with parameters:
-```
-{'batch_size': 128}
-```
-**Loss**:
-`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:
-  ```
-  {'scale': 20.0, 'similarity_fct': 'cos_sim'}
-  ```
 **DataLoader**:
-`torch.utils.data.dataloader.DataLoader` of length 719 with parameters:
 ```
-{'batch_size': 8, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
 ```
 **Loss**:
 `sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`
-Parameters of the fit()-Method:
-```
-{
-    "epochs": 4,
-    "evaluation_steps": 1000,
-    "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
-    "max_grad_norm": 1.0,
-    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
-    "optimizer_params": {
-        "lr": 1e-06
-    },
-    "scheduler": "WarmupLinear",
-    "steps_per_epoch": null,
-    "warmup_steps": 288,
-    "weight_decay": 0.01
-}
-```
 ## Full Model Architecture
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DebertaV2Model
-  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
 )
 ```
 ## Citing & Authors
-<!--- Describe where people can find more information -->

 ---
 pipeline_tag: sentence-similarity
 tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 - transformers
+datasets:
+- kornlu
+language:
+- ko
+license: cc-by-4.0
 ---
+# bi-matrix/gmatrix-embedding
+This model is based on [KF-DeBERTa](https://huggingface.co/kakaobank/kf-deberta-base) and uses the KorSTS and KorNLI datasets. It was trained as follows, using the [continue-learning](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) approach introduced in the official sentence-transformers documentation.
+1. Multi-task learning for 10 epochs: MultipleNegativesRankingLoss on the NLI dataset (with negative sampling) combined with CosineSimilarityLoss on the STS dataset
+2. An additional 4 epochs of multi-task training with the learning rate reduced to 1e-06
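The two objectives named in the steps above can be illustrated with a pure-Python sketch. This is not the sentence-transformers implementation. It assumes `scale=20.0` and cosine similarity for MultipleNegativesRankingLoss (the values listed in the earlier version of this card), and gold STS scores rescaled from 0-5 to 0-1 for CosineSimilarityLoss, as the sentence-transformers STS examples do. Vectors are plain lists of floats, and the function names are hypothetical.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, negatives=None, scale=20.0):
    """In-batch softmax loss: anchors[i] should score highest against positives[i].

    Every other positive in the batch, plus any explicit hard negatives
    (e.g. NLI contradiction sentences), serves as a negative for anchors[i].
    """
    candidates = positives + (negatives or [])
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cos_sim(anchor, c) for c in candidates]
        # Cross-entropy with target index i, using a numerically stable log-sum-exp
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)


def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cos(u, v) and a gold label in [0, 1]."""
    errors = [(cos_sim(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Toy batch: two anchors whose positives sit at the matching index
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(multiple_negatives_ranking_loss(anchors, positives))
# One STS pair with a gold score of 4.2 out of 5, rescaled to 0.84
print(cosine_similarity_loss([([1.0, 0.0], [0.8, 0.6])], [4.2 / 5.0]))
```

In the legacy `model.fit` API, multi-task training is expressed by passing several `(DataLoader, loss)` pairs in `train_objectives`; each training step then takes one batch from each objective in turn.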
+---
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 <!--- Describe your model here -->
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
+model = SentenceTransformer("bi-matrix/gmatrix-embedding")
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
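The card states that the embeddings can be used for semantic search. A minimal pure-Python sketch of the retrieval step, assuming the query and corpus vectors have already been produced by `model.encode` (toy 2-dimensional vectors stand in for the 768-dimensional outputs). `top_k` is a hypothetical helper; on real tensors, sentence-transformers provides `util.cos_sim` for the same similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query_vec, corpus_vecs, k=3):
    """Return (corpus_index, score) pairs for the k most similar corpus vectors."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy stand-ins for model.encode(query) and model.encode(corpus)
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
for idx, score in top_k(query, corpus, k=2):
    print(idx, round(score, 3))
```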
 sentences = ['This is an example sentence', 'Each sentence is converted']
 # Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained("bi-matrix/gmatrix-embedding")
+model = AutoModel.from_pretrained("bi-matrix/gmatrix-embedding")
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
 ```
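The hunk header above shows that this snippet relies on a `mean_pooling(model_output, attention_mask)` helper whose body the diff does not display. Consistent with `'pooling_mode_mean_tokens': True` in the model architecture, it averages token embeddings while skipping padding. A pure-Python sketch of that computation for a single sentence (not the card's PyTorch code; names are illustrative):

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average one sentence's token vectors, ignoring padded positions.

    token_embeddings: list of per-token vectors (seq_len x hidden_dim)
    attention_mask:   list of 0/1 flags, 1 for real tokens and 0 for padding
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for k in range(dim):
                summed[k] += vec[k]
            count += 1
    # Avoid division by zero for an all-padding row (the usual template clamps to 1e-9)
    count = max(count, 1e-9)
    return [s / count for s in summed]


# Two real tokens followed by one padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(mean_pooling(tokens, [1, 1, 0]))  # → [2.0, 3.0]
```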
 ## Evaluation Results
 <!--- Describe how your model was evaluated -->
+Results on the KorSTS evaluation dataset:
+- Cosine Pearson: 85.77
+- Cosine Spearman: 86.30
+- Manhattan Pearson: 84.84
+- Manhattan Spearman: 85.33
+- Euclidean Pearson: 84.82
+- Euclidean Spearman: 85.29
+- Dot Pearson: 83.19
+- Dot Spearman: 83.19
+<br>
+|model|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
+|:-------------------------|-----------------:|------------------:|--------------------:|---------------------:|--------------------:|---------------------:|--------------:|---------------:|
+|[**gmatrix-embedding**](https://huggingface.co/bi-matrix/gmatrix-embedding)|**85.77**|**86.30**|**84.82**|**85.29**|**84.84**|**85.33**|**83.19**|**83.19**|
+|[kf-deberta-multitask](https://huggingface.co/upskyy/kf-deberta-multitask)|85.75|86.25|84.79|85.25|84.80|85.27|82.93|82.86|
+|[ko-sroberta-multitask](https://huggingface.co/jhgan/ko-sroberta-multitask)|84.77|85.6|83.71|84.40|83.70|84.38|82.42|82.33|
+|[ko-sbert-multitask](https://huggingface.co/jhgan/ko-sbert-multitask)|84.13|84.71|82.42|82.66|82.41|82.69|80.05|79.69|
+|[ko-sroberta-base-nli](https://huggingface.co/jhgan/ko-sroberta-nli)|82.83|83.85|82.87|83.29|82.88|83.28|80.34|79.69|
+|[ko-sbert-nli](https://huggingface.co/jhgan/ko-sbert-nli)|82.24|83.16|82.19|82.31|82.18|82.3|79.3|78.78|
+|[ko-sroberta-sts](https://huggingface.co/jhgan/ko-sroberta-sts)|81.84|81.82|81.15|81.25|81.14|81.25|79.09|78.54|
+|[ko-sbert-sts](https://huggingface.co/jhgan/ko-sbert-sts)|81.55|81.23|79.94|79.79|79.9|79.75|76.02|75.31|
+<br>
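+Results on the G-MATRIX Embedding dataset.
+Three annotators each rated the similarity of every sentence pair from 0 to 5, and the ratings were averaged. Cosine similarity, Euclidean distance, Manhattan distance, and dot product were then computed from each model's embeddings,
+and the Pearson and Spearman correlation coefficients against the averaged ratings are reported below.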
+- Cosine Pearson: 75.86
+- Cosine Spearman: 65.75
+- Manhattan Pearson: 72.65
+- Manhattan Spearman: 65.20
+- Euclidean Pearson: 72.48
+- Euclidean Spearman: 65.32
+- Dot Pearson: 64.71
+- Dot Spearman: 53.90
+<br>
+|model|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
+|:-------------------------|-----------------:|------------------:|--------------------:|---------------------:|--------------------:|---------------------:|--------------:|---------------:|
+|[**gmatrix-embedding**](https://huggingface.co/bi-matrix/gmatrix-embedding)|**75.86**|**65.75**|**72.65**|**65.20**|**72.48**|**65.32**|**64.71**|**53.90**|
+|[ko-sroberta-multitask](https://huggingface.co/jhgan/ko-sroberta-multitask)|71.78|63.16|70.80|63.47|70.89|63.72|53.57|44.23|
+|[bge-m3](https://huggingface.co/BAAI/bge-m3)|64.15|60.65|61.88|60.68|61.88|60.19|64.16|60.71|
+<br>
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6350f6750b94548566da3279/CcK0QL3oQAz7sJOCtH6PB.png)
+<br>
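The evaluation procedure described above can be sketched in pure Python. Several details are assumptions rather than facts from this card: the raters' scores are averaged per pair, distance metrics are negated so that larger means more similar (as sentence-transformers' `EmbeddingSimilarityEvaluator` does), tied values get average ranks for Spearman, and coefficients are multiplied by 100 to match the tables. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def pair_scores(u, v):
    """Similarity of one embedding pair under each metric (higher = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return {
        "cosine": dot / norm,
        "euclidean": -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))),
        "manhattan": -sum(abs(a - b) for a, b in zip(u, v)),
        "dot": dot,
    }


def evaluate(embedding_pairs, rater_scores):
    """Correlate each metric with the mean 0-5 rating of the annotators."""
    gold = [sum(s) / len(s) for s in rater_scores]
    per_pair = [pair_scores(u, v) for u, v in embedding_pairs]
    results = {}
    for metric in ("cosine", "euclidean", "manhattan", "dot"):
        preds = [p[metric] for p in per_pair]
        results[f"{metric}_pearson"] = 100 * pearson(preds, gold)
        results[f"{metric}_spearman"] = 100 * spearman(preds, gold)
    return results


# Toy data: unit vectors at growing angles, rated by three annotators
pairs = [([1.0, 0.0], [math.cos(math.radians(t)), math.sin(math.radians(t))])
         for t in (0, 30, 60, 90)]
ratings = [[5, 5, 5], [4, 4, 3], [2, 2, 2], [0, 0, 0]]
for name, value in evaluate(pairs, ratings).items():
    print(f"{name}: {value:.2f}")
```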
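+## G-MATRIX Embedding Labeling Criteria (adapted from the STS data construction for KLUE-RoBERTa)
+1. Rate how similar the two sentences are on a scale of 0 to 5
+2. Differences in spelling, spacing, periods, or commas are not taken into account
+3. Compare the intent of the sentences and the meaning their wording conveys
+4. Judge whether the meanings are similar, not merely whether the two sentences share words
+5. A score of 0 means no semantic similarity; 5 means the sentences are semantically equivalent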
+## Training
+The model was trained with the parameters:
 **DataLoader**:
+`torch.utils.data.dataloader.DataLoader` of length 329 with parameters:
 ```
+{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
 ```
 **Loss**:
 `sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`
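The DataLoader length listed above counts batches per epoch, not examples. Assuming `drop_last=False` (the PyTorch `DataLoader` default) and a sized dataset, the relationship is a ceiling division, so a length of 329 with `batch_size` 32 corresponds to between 10,497 and 10,528 training pairs. A small sketch of that arithmetic (helper names are hypothetical):

```python
import math


def num_batches(num_examples, batch_size, drop_last=False):
    """Batches per epoch, counted the way torch's DataLoader does for a sized dataset."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


def dataset_size_range(length, batch_size):
    """Smallest and largest dataset sizes that give `length` batches with drop_last=False."""
    return (length - 1) * batch_size + 1, length * batch_size


low, high = dataset_size_range(329, 32)
print(low, high)  # → 10497 10528
```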
 ## Full Model Architecture
 ```
 SentenceTransformer(
+  (0): Transformer({'max_seq_length': 128, 'do_lower_case': True}) with Transformer model: DebertaV2Model
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
 )
 ```
 ## Citing & Authors
+<!--- Describe where people can find more information -->
+[MINSANG SONG] at [BI-Matrix](https://www.bimatrix.co.kr/)