hellonlp/simcse-roberta-base-zh

hellonlp committed on Jan 29, 2024

Commit 61070a0 · verified · 1 Parent(s): 561a030

Update README.md

Files changed (1)

README.md +22 -12

@@ -8,18 +8,6 @@ pipeline_tag: sentence-similarity
 # SimCSE(sup)
-## Model List
-The evaluation dataset is in Chinese, and we used the same language model **RoBERTa base** on different methods.
-|          Model          | STS-B(w-avg) | ATEC | BQ | LCQMC | PAWSX | Avg. |
-|:-----------------------:|:------------:|:-----------:|:----------|:-------------|:------------:|:----------:|
-|  BERT-Whitening  |  65.27| -| -| -| -| -|
-|  SimBERT   |  70.01| -| -| -| -| -|
-|  SBERT-Whitening  |  71.75| -| -| -| -| -|
-|  [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh)  |  78.61| -| -| -| -| -|
-|  [hellonlp/simcse-base-zh(sup)](https://huggingface.co/hellonlp/simcse-roberta-base-zh)  |  **80.96**| -| -| -| -| -|
 ## Data List
 The following datasets are all in Chinese.
 |          Data          | size(train) | size(valid) | size(test) |
@@ -35,6 +23,28 @@ The following datasets are all in Chinese.
+## Model List
+The evaluation dataset is in Chinese, and we used the same language model **RoBERTa base** on different methods.  In addition, considering that the test set of some datasets is small, which may lead to a large deviation in evaluation accuracy, the evaluation data here uses train, valid and test at the same time, and the final evaluation result adopts the **weighted average (w-avg)** method.
+|          Model          | STS-B(w-avg) | ATEC | BQ | LCQMC | PAWSX | Avg. |
+|:-----------------------:|:------------:|:-----------:|:----------|:----------|:----------:|:----------:|
+|  [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh)  |  78.61| -| -| -| -| -|
+|  [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5)  |  79.07| -| -| -| -| -|
+|  [hellonlp/simcse-large-zh](https://huggingface.co/hellonlp/simcse-roberta-large-zh)  |  81.32| -| -| -| -| -|
+|          Model          | STS-B(w-avg) | ATEC | BQ | LCQMC | PAWSX | Avg. |
+|:-----------------------:|:------------:|:-----------:|:----------|:-------------|:------------:|:----------:|
+|  BERT-Whitening  |  65.27| -| -| -| -| -|
+|  SimBERT   |  70.01| -| -| -| -| -|
+|  SBERT-Whitening  |  71.75| -| -| -| -| -|
+|  [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh)  |  78.61| -| -| -| -| -|
+|  [hellonlp/simcse-base-zh(sup)](https://huggingface.co/hellonlp/simcse-roberta-base-zh)  |  **80.96**| -| -| -| -| -|
 ## Uses
 You can use our model for encoding sentences into embeddings
 ```python