namdp-ptit committed on
Commit
dcd991e
·
verified ·
1 Parent(s): b41dfcf

Update README.md

  1. README.md +19 -11
README.md CHANGED
@@ -29,7 +29,8 @@ base_model:
29
 
30
  ## Introduce
31
 
32
- **ViDense** is a **VietNamese Embedding Model**. Fine-tuned and enhanced with tailored methods, ViDense incorporates advanced
 
33
  techniques to optimize performance for text embeddings in various applications.
34
 
35
  Model Configuration and Methods:
@@ -94,20 +95,22 @@ print(cosine_sim_2.item()) # 0.9861876964569092
94
  ## Performance
95
 
96
  Below is a comparision table of the results I achieved compared to some other embedding models on three
97
- benchmarks: [ZAC](https://huggingface.co/datasets/GreenNode/zalo-ai-legal-text-retrieval-vn/viewer/default?views%5B%5D=default_train), [WebFaq](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval), [OwiFaq](https://huggingface.co/datasets/PaDaS-Lab/owi-faq-retrieval)
98
  with metric **Recall@3**
99
 
100
- | Model Name | ZAC | WebFaq | OwiFaq |
101
- |---------------------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|
102
- | [namdp-ptit/ViDense](https://huggingface.co/namdp-ptit/ViDense) | **54.72** | 82.26 | 85.62 |
103
- | [VoVanPhuc/sup-SimCSE-VietNamese-phobert-base](https://huggingface.co/VoVanPhuc/sup-SimCSE-VietNamese-phobert-base) | 53.64 | 81.52 | 85.02 |
104
- | [keepitreal/vietnamese-sbert](https://huggingface.co/keepitreal/vietnamese-sbert) | 50.45 | 80.54 | 78.58 |
105
- | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 46.12 | **83.45** | **86.08** |
106
 
107
  Here are the information of these 3 benchmarks:
108
 
109
- * ZAC: merge train and test into a new benchmark, ~ 3200 queries, ~ 330K documents in corpus
110
- * WebFAQ and OwiFaq: merge train and test into a new benchmark, ~ 124K queries, ~ 124K documents in corpus
 
 
111
 
112
  ## Contact
113
 
@@ -142,4 +145,9 @@ Please cite as
142
  year={2025},
143
  publisher={Huggingface},
144
  }
145
- ```
 
 
 
 
 
 
29
 
30
  ## Introduce
31
 
32
+ **ViDense** is a **VietNamese Embedding Model**. Fine-tuned and enhanced with tailored methods, ViDense incorporates
33
+ advanced
34
  techniques to optimize performance for text embeddings in various applications.
35
 
36
  Model Configuration and Methods:
 
95
  ## Performance
96
 
97
  Below is a comparision table of the results I achieved compared to some other embedding models on three
98
+ benchmarks: [ZAC](https://huggingface.co/datasets/GreenNode/zalo-ai-legal-text-retrieval-vn/viewer/default?views%5B%5D=default_train), [WebFaq](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval), [OwiFaq](https://huggingface.co/datasets/PaDaS-Lab/owi-faq-retrieval), [ViQuAD2.0](https://huggingface.co/datasets/taidng/UIT-ViQuAD2.0), [Vietnamese-Legal](https://huggingface.co/datasets/CATI-AI/vietnamese-legal-retrieval-with-negatives)
99
  with metric **Recall@3**
100
 
101
+ | Model Name | ZAC | WebFaq | OwiFaq | ViQuAD2.0 | Vietnamese-Legal |
102
+ |---------------------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|:----------|:-----------------|
103
+ | [namdp-ptit/ViDense](https://huggingface.co/namdp-ptit/ViDense) | **54.72** | 82.26 | 85.62 | **61.28** | **58.42** |
104
+ | [VoVanPhuc/sup-SimCSE-VietNamese-phobert-base](https://huggingface.co/VoVanPhuc/sup-SimCSE-VietNamese-phobert-base) | 53.64 | 81.52 | 85.02 | 59.12 | 55.70 |
105
+ | [keepitreal/vietnamese-sbert](https://huggingface.co/keepitreal/vietnamese-sbert) | 50.45 | 80.54 | 78.58 | 52.67 | 51.86 |
106
+ | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 46.12 | **83.45** | **86.08** | 58.27 | 49.02 |
107
 
108
  Here are the information of these 3 benchmarks:
109
 
110
+ * ZAC: merge train and test into a new benchmark, ~ 3200 queries, ~ 330K documents in corpus.
111
+ * WebFAQ and OwiFaq: merge train and test into a new benchmark, ~ 124K queries, ~ 124K documents in corpus.
112
+ * ViQuAD2.0: merge train, validation and test into a new benchmark, ~ 39.6K queries, ~ 39.6K documents in corpus.
113
+ * Vietnamese-Legal: ~ 144K queries, ~ 144K documents in corpus.
114
 
115
  ## Contact
116
 
 
145
  year={2025},
146
  publisher={Huggingface},
147
  }
148
+ ```
149
+
150
+ Beta
151
+ 0 / 0
152
+ used queries
153
+ 1