---
tags:
- feature-extraction
- sentence-similarity
- transformers
language:
- ko
license:
- mit
widget:
- source_sentence: "대한민국의 수도는 서울입니다."
  sentences:
  - "미국의 수도는 뉴욕이 아닙니다."
  - "대한민국의 수도 요금은 저렴한 편입니다."
  - "서울은 대한민국의 수도입니다."
---

# smartmind/roberta-ko-small-tsdae

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for tasks like clustering or semantic search.

It is a Korean RoBERTa small model pretrained with [TSDAE](https://arxiv.org/abs/2104.06979). The architecture is identical to [lassl/roberta-ko-small](https://huggingface.co/lassl/roberta-ko-small), but the tokenizer is different. The model can be used out of the box to compute sentence similarity, or fine-tuned for your own task.

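As a sketch of the semantic-search use mentioned above, embedding vectors can be ranked by cosine similarity against a query vector. This is a minimal illustration, not the card's own code; toy random vectors stand in for real sentence embeddings:

```python
import numpy as np

def rank_by_cosine(query_vec, corpus_vecs):
    """Rank corpus rows by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)  # indices from most to least similar
    return order, scores[order]

# Toy 256-dimensional vectors standing in for real sentence embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=256)
corpus = np.vstack([query + 0.1 * rng.normal(size=256),  # near-duplicate of the query
                    rng.normal(size=(3, 256))])           # unrelated vectors
order, scores = rank_by_cosine(query, corpus)
print(order[0])  # the near-duplicate ranks first
```

With real embeddings from the model's encoder, the same ranking retrieves the corpus sentences closest in meaning to the query.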
## Usage (Sentence-Transformers)
|
| 31 |
|
|
|
|
| 85 |
|
| 86 |
## Evaluation Results

The scores below were obtained on the [klue](https://huggingface.co/datasets/klue) STS data, **without** fine-tuning on this dataset.

|split|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
|-----|--------------|---------------|-----------------|------------------|-----------------|------------------|-----------|------------|
|train|0.8735|0.8676|0.8268|0.8357|0.8248|0.8336|0.8449|0.8383|
|validation|0.5409|0.5349|0.4786|0.4657|0.4775|0.4625|0.5284|0.5252|
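The pearson/spearman columns measure linear and rank correlation between the model's similarity scores and the gold STS labels. A minimal numpy sketch with made-up illustrative numbers:

```python
import numpy as np

def pearson(a, b):
    # Linear correlation between two score vectors.
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    # Rank correlation: Pearson correlation of the rank positions.
    to_ranks = lambda x: np.argsort(np.argsort(x))
    return pearson(to_ranks(a), to_ranks(b))

# Made-up gold STS labels and model cosine scores for five sentence pairs.
gold   = np.array([5.0, 3.2, 1.0, 4.5, 2.1])
cosine = np.array([0.92, 0.55, 0.10, 0.81, 0.33])

print(round(pearson(gold, cosine), 4))
print(round(spearman(gold, cosine), 4))  # 1.0: the two rankings agree exactly
```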

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 508, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 256, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```