@@ -95,22 +95,22 @@ print(cosine_sim_2.item())  # 0.9861876964569092
 ## Performance
 Below is a comparison table of the results I achieved against some other embedding models on five
-benchmarks: [ZAC](https://huggingface.co/datasets/GreenNode/zalo-ai-legal-text-retrieval-vn/viewer/default?views%5B%5D=default_train), [WebFaq](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval), [OwiFaq](https://huggingface.co/datasets/PaDaS-Lab/owi-faq-retrieval), [ViQuAD2.0](https://huggingface.co/datasets/taidng/UIT-ViQuAD2.0), [Vietnamese-Legal](https://huggingface.co/datasets/CATI-AI/vietnamese-legal-retrieval-with-negatives)
 using the **Recall@3** metric:
-| Model Name                                                                                                          | ZAC       | WebFaq    | OwiFaq    | ViQuAD2.0 | Vietnamese-Legal |
-|---------------------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|:----------|:-----------------|
-| [namdp-ptit/ViDense](https://huggingface.co/namdp-ptit/ViDense)                                                     | **54.72** | 82.26     | 85.62     | **61.28** | **58.42**        |
-| [VoVanPhuc/sup-SimCSE-VietNamese-phobert-base](https://huggingface.co/VoVanPhuc/sup-SimCSE-VietNamese-phobert-base) | 53.64     | 81.52     | 85.02     | 59.12     | 55.70            |
-| [keepitreal/vietnamese-sbert](https://huggingface.co/keepitreal/vietnamese-sbert)                                   | 50.45     | 80.54     | 78.58     | 52.67     | 51.86            |
-| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)                                                                   | 46.12     | **83.45** | **86.08** | 58.27     | 49.02            |
 Details of these benchmarks:
 * ZAC: train and test splits merged into a single benchmark, ~ 3200 queries, ~ 330K documents in corpus.
 * WebFaq and OwiFaq: train and test splits merged into a single benchmark, ~ 124K queries, ~ 124K documents in corpus.
 * ViQuAD2.0: train, validation and test splits merged into a single benchmark, ~ 39.6K queries, ~ 39.6K documents in corpus.
-* Vietnamese-Legal: ~ 144K queries, ~ 144K documents in corpus.
 ## Contact

 ## Performance
 Below is a comparison table of the results I achieved against some other embedding models on five
+benchmarks: [ZAC](https://huggingface.co/datasets/GreenNode/zalo-ai-legal-text-retrieval-vn/viewer/default?views%5B%5D=default_train), [WebFaq](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval), [OwiFaq](https://huggingface.co/datasets/PaDaS-Lab/owi-faq-retrieval), [ViQuAD2.0](https://huggingface.co/datasets/taidng/UIT-ViQuAD2.0), [ViLegal](https://huggingface.co/datasets/CATI-AI/vietnamese-legal-retrieval-with-negatives)
 using the **Recall@3** metric:
+| Model Name                                                                                                          | ZAC       | WebFaq    | OwiFaq    | ViQuAD2.0 | ViLegal   |
+|---------------------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|:----------|:----------|
+| [namdp-ptit/ViDense](https://huggingface.co/namdp-ptit/ViDense)                                                     | **54.72** | 82.26     | 85.62     | **61.28** | **58.42** |
+| [VoVanPhuc/sup-SimCSE-VietNamese-phobert-base](https://huggingface.co/VoVanPhuc/sup-SimCSE-VietNamese-phobert-base) | 53.64     | 81.52     | 85.02     | 59.12     | 55.70     |
+| [keepitreal/vietnamese-sbert](https://huggingface.co/keepitreal/vietnamese-sbert)                                   | 50.45     | 80.54     | 78.58     | 52.67     | 51.86     |
+| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)                                                                   | 46.12     | **83.45** | **86.08** | 58.27     | 49.02     |
 Details of these benchmarks:
 * ZAC: train and test splits merged into a single benchmark, ~ 3200 queries, ~ 330K documents in corpus.
 * WebFaq and OwiFaq: train and test splits merged into a single benchmark, ~ 124K queries, ~ 124K documents in corpus.
 * ViQuAD2.0: train, validation and test splits merged into a single benchmark, ~ 39.6K queries, ~ 39.6K documents in corpus.
+* ViLegal: ~ 144K queries, ~ 144K documents in corpus.
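
For readers unfamiliar with the metric, here is a minimal sketch of how Recall@3 is commonly computed for retrieval. The README does not state the exact evaluation script, so the function name `recall_at_k`, the input layout, and the per-query averaging are assumptions for illustration, not the author's code.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Mean per-query recall@k, returned as a percentage.

    ranked   maps a query id to document ids sorted by descending similarity.
    relevant maps a query id to the set of gold (relevant) document ids.
    Both structures are hypothetical; adapt them to the benchmark's format.
    """
    scores = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labelled relevant document
        top_k = set(ranked.get(query_id, [])[:k])
        # Fraction of this query's gold documents found in the top k
        scores.append(len(top_k & gold) / len(gold))
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Toy data with made-up ids, not taken from any benchmark above
ranked = {
    "q1": ["d3", "d1", "d9", "d4"],
    "q2": ["d7", "d8", "d2", "d5"],
}
relevant = {"q1": {"d1"}, "q2": {"d5"}}
print(recall_at_k(ranked, relevant, k=3))  # 50.0
```

In the toy data, `q1` has its gold document at rank 2 (a hit) and `q2` has it at rank 4 (a miss for k=3), so the score is 50.0. When every query has exactly one gold document, this reduces to the hit rate within the top 3 results.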
 ## Contact