namdp-ptit
/

ViDense

@@ -29,7 +29,8 @@ base_model:
 ## Introduction
-**ViDense** is a **Vietnamese embedding model**. Fine-tuned and enhanced with tailored methods, ViDense incorporates advanced
 techniques to optimize performance for text embeddings in various applications.
 Model Configuration and Methods:
@@ -94,20 +95,22 @@ print(cosine_sim_2.item())  # 0.9861876964569092
 ## Performance
 Below is a comparison of my results against several other embedding models on the following
-benchmarks: [ZAC](https://huggingface.co/datasets/GreenNode/zalo-ai-legal-text-retrieval-vn/viewer/default?views%5B%5D=default_train), [WebFaq](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval), [OwiFaq](https://huggingface.co/datasets/PaDaS-Lab/owi-faq-retrieval)
 with metric **Recall@3**
-| Model Name                                                                                                          | ZAC       | WebFaq    | OwiFaq    |
-|---------------------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|
-| [namdp-ptit/ViDense](https://huggingface.co/namdp-ptit/ViDense)                                                     | **54.72** | 82.26     | 85.62     |
-| [VoVanPhuc/sup-SimCSE-VietNamese-phobert-base](https://huggingface.co/VoVanPhuc/sup-SimCSE-VietNamese-phobert-base) | 53.64     | 81.52     | 85.02     |
-| [keepitreal/vietnamese-sbert](https://huggingface.co/keepitreal/vietnamese-sbert)                                   | 50.45     | 80.54     | 78.58     |
-| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)                                                                   | 46.12     | **83.45** | **86.08** |
 Details of these benchmarks:
-* ZAC: merge train and test into a new benchmark, ~ 3200 queries, ~ 330K documents in corpus
-* WebFAQ and OwiFaq: merge train and test into a new benchmark, ~ 124K queries, ~ 124K documents in corpus
 ## Contact
@@ -142,4 +145,9 @@ Please cite as
   year={2025},
   publisher={Huggingface},
 }
-```

 ## Introduction
+**ViDense** is a **Vietnamese embedding model**. Fine-tuned and enhanced with tailored methods, ViDense incorporates advanced
 techniques to optimize performance for text embeddings in various applications.
 Model Configuration and Methods:
 ## Performance
 Below is a comparison of my results against several other embedding models on the following
+benchmarks: [ZAC](https://huggingface.co/datasets/GreenNode/zalo-ai-legal-text-retrieval-vn/viewer/default?views%5B%5D=default_train), [WebFaq](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval), [OwiFaq](https://huggingface.co/datasets/PaDaS-Lab/owi-faq-retrieval), [ViQuAD2.0](https://huggingface.co/datasets/taidng/UIT-ViQuAD2.0), [Vietnamese-Legal](https://huggingface.co/datasets/CATI-AI/vietnamese-legal-retrieval-with-negatives)
 with metric **Recall@3**
+| Model Name                                                                                                          | ZAC       | WebFaq    | OwiFaq    | ViQuAD2.0 | Vietnamese-Legal |
+|---------------------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|:----------|:-----------------|
+| [namdp-ptit/ViDense](https://huggingface.co/namdp-ptit/ViDense)                                                     | **54.72** | 82.26     | 85.62     | **61.28** | **58.42**        |
+| [VoVanPhuc/sup-SimCSE-VietNamese-phobert-base](https://huggingface.co/VoVanPhuc/sup-SimCSE-VietNamese-phobert-base) | 53.64     | 81.52     | 85.02     | 59.12     | 55.70            |
+| [keepitreal/vietnamese-sbert](https://huggingface.co/keepitreal/vietnamese-sbert)                                   | 50.45     | 80.54     | 78.58     | 52.67     | 51.86            |
+| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)                                                                   | 46.12     | **83.45** | **86.08** | 58.27     | 49.02            |
 Details of these benchmarks:
+* ZAC: train and test splits merged into one benchmark, ~3,200 queries, ~330K documents in the corpus.
+* WebFAQ and OwiFaq: train and test splits merged into one benchmark, ~124K queries, ~124K documents in the corpus.
+* ViQuAD2.0: train, validation and test splits merged into one benchmark, ~39.6K queries, ~39.6K documents in the corpus.
+* Vietnamese-Legal: ~144K queries, ~144K documents in the corpus.
 ## Contact
   year={2025},
   publisher={Huggingface},
 }
+```
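
The Performance table reports **Recall@3**, but the evaluation script itself is not part of this diff. As a rough illustration only, the metric can be sketched as below. The function name `recall_at_k`, the dictionary layout, and the toy ids are all hypothetical, and the result is scaled to a percentage on the assumption that the table values are percentages.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Mean per-query recall@k, returned as a percentage.

    ranked:   query id -> document ids sorted by descending similarity
    relevant: query id -> set of gold (relevant) document ids
    """
    scores = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labelled relevant document
        top_k = set(ranked.get(query_id, [])[:k])
        # Fraction of this query's gold documents found in the top k
        scores.append(len(top_k & gold) / len(gold))
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical ids, not taken from any benchmark above)
ranked = {
    "q1": ["d3", "d1", "d7", "d2"],  # gold d1 is at rank 2: hit
    "q2": ["d5", "d6", "d8", "d4"],  # gold d4 is at rank 4: miss for k=3
}
relevant = {"q1": {"d1"}, "q2": {"d4"}}

score = recall_at_k(ranked, relevant, k=3)
print(f"Recall@3 = {score:.2f}")  # Recall@3 = 50.00
```

When every query has exactly one gold document, this reduces to the hit rate within the top three results. Whether the reported numbers were computed this way is an assumption.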