BM-K committed · Commit f455d76 · verified · 1 parent: 2ca4e16

Update README.md

Files changed (1): README.md (+14, -14)
README.md CHANGED
```diff
@@ -73,32 +73,32 @@ Descriptions of the benchmark datasets used for evaluation are as follows:
   A dataset for retrieving relevant content from web and news articles in Korean.
 - **MultiLongDocRetrieval**
   A long-document retrieval benchmark based on Korean Wikipedia and mC4 corpus.
-- **XPQARetrieval**
-  A real-world dataset constructed from user queries and relevant product documents in a Korean e-commerce platform.
 
 > **Tip:**
-> While many benchmark datasets are available for evaluation, in this project we chose to use only those that contain clean positive documents for each query. Keep in mind that a benchmark dataset is just thata benchmark. For real-world applications, it is best to construct an evaluation dataset tailored to your specific domain and evaluate embedding models, such as PIXIE, in that environment to determine the most suitable one.
+> While many benchmark datasets are available for evaluation, in this project we chose to use only those that contain clean positive documents for each query. Keep in mind that a benchmark dataset is just that a benchmark. For real-world applications, it is best to construct an evaluation dataset tailored to your specific domain and evaluate embedding models, such as PIXIE, in that environment to determine the most suitable one.
 
 #### Sparse Embedding
 | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
 |------|:---:|:---:|:---:|:---:|:---:|:---:|
-| telepix/PIXIE-Splade-Preview | 0.1B | 0.6677 | 0.6238 | 0.6628 | 0.6831 | 0.7009 |
+| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
 | | | | | | | |
-| [BM25](https://github.com/xhluca/bm25s) | N/A | 0.4251 | 0.3798 | 0.4238 | 0.4400 | 0.4566 |
-| naver/splade-v3 | 0.1B | 0.0587 | 0.0468 | 0.0568 | 0.0620 | 0.0690 |
+| [BM25](https://github.com/xhluca/bm25s) | N/A | 0.4714 | 0.4194 | 0.4708 | 0.4886 | 0.5071 |
+| naver/splade-v3 | 0.1B | 0.0582 | 0.0462 | 0.0566 | 0.0612 | 0.0685 |
 
 #### Dense Embedding
 | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
 |------|:---:|:---:|:---:|:---:|:---:|:---:|
-| telepix/PIXIE-Rune-Preview | 0.5B | 0.6905 | 0.6461 | 0.6859 | 0.7063 | 0.7238 |
+| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
+| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
+| telepix/PIXIE-Rune-Preview | 0.5B | 0.7383 | 0.6936 | 0.7356 | 0.7545 | 0.7698 |
 | | | | | | | |
-| nlpai-lab/KURE-v1 | 0.5B | 0.6751 | 0.6277 | 0.6725 | 0.6907 | 0.7095 |
-| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.6592 | 0.6118 | 0.6542 | 0.6759 | 0.6949 |
-| BAAI/bge-m3 | 0.5B | 0.6573 | 0.6099 | 0.6533 | 0.6732 | 0.6930 |
-| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6321 | 0.5894 | 0.6274 | 0.6455 | 0.6662 |
-| jinaai/jina-embeddings-v3 | 0.6B | 0.6293 | 0.5800 | 0.6254 | 0.6456 | 0.6665 |
-| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.6111 | 0.5542 | 0.6089 | 0.6302 | 0.6511 |
-| openai/text-embedding-3-large | N/A | 0.6015 | 0.5466 | 0.5999 | 0.6187 | 0.6409 |
+| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
+| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
+| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
+| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
+| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
+| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.6679 | 0.6068 | 0.6673 | 0.6892 | 0.7084 |
+| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |
 
 ## Direct Use (Inverted-Index Retrieval)
```
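The tables above report NDCG at several cutoffs. As a reference for readers comparing the numbers, here is a minimal sketch of how NDCG@k is computed for a single query's ranked results using the standard formula; the relevance lists below are hypothetical examples, not taken from the benchmark:

```python
import math

def dcg_at_k(relevances, k):
    # DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1)
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    # Normalize by the ideal DCG, i.e. the same relevances
    # sorted in the best possible order (descending).
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Binary relevance: the single positive document ranked 2nd of 5.
print(round(ndcg_at_k([0, 1, 0, 0, 0], 5), 4))  # 0.6309
```

The per-dataset NDCG@k is the mean of this per-query score, and the "Avg. NDCG" column aggregates across the benchmark datasets.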