BM-K committed on
Commit 1f684cf · verified · 1 Parent(s): 228b985

Update README.md

Files changed (1):
  1. README.md +21 -13
README.md CHANGED
@@ -60,19 +60,6 @@ All evaluations were conducted using the open-source **[Korean-MTEB-Retrieval-Ev
  Our model, **telepix/PIXIE-Splade-Preview**, achieves strong performance across most metrics and benchmarks,
  demonstrating strong generalization across domains such as multi-hop QA, long-document retrieval, public health, and e-commerce.
 
- | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
- |------|:---:|:---:|:---:|:---:|:---:|:---:|
- | telepix/PIXIE-Rune-Preview | 0.5B | 0.6905 | 0.6461 | 0.6859 | 0.7063 | 0.7238 |
- | telepix/PIXIE-Splade-Preview | 0.1B | **0.6677** | **0.6238** | **0.6628** | **0.6831** | **0.7009** |
- | | | | | | | |
- | nlpai-lab/KURE-v1 | 0.5B | 0.6751 | 0.6277 | 0.6725 | 0.6907 | 0.7095 |
- | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.6592 | 0.6118 | 0.6542 | 0.6759 | 0.6949 |
- | BAAI/bge-m3 | 0.5B | 0.6573 | 0.6099 | 0.6533 | 0.6732 | 0.6930 |
- | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6321 | 0.5894 | 0.6274 | 0.6455 | 0.6662 |
- | jinaai/jina-embeddings-v3 | 0.6B | 0.6293 | 0.5800 | 0.6254 | 0.6456 | 0.6665 |
- | Alibaba-NLP/gte-multilingual-base | 0.3B | 0.6111 | 0.5542 | 0.6089 | 0.6302 | 0.6511 |
- | openai/text-embedding-3-large | N/A | 0.6015 | 0.5466 | 0.5999 | 0.6187 | 0.6409 |
-
  Descriptions of the benchmark datasets used for evaluation are as follows:
  - **Ko-StrategyQA**
  A Korean multi-hop open-domain question answering dataset designed for complex reasoning over multiple documents.
@@ -89,6 +76,27 @@ Descriptions of the benchmark datasets used for evaluation are as follows:
  - **XPQARetrieval**
  A real-world dataset constructed from user queries and relevant product documents in a Korean e-commerce platform.
 
+ #### Sparse Embedding
+ | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
+ |------|:---:|:---:|:---:|:---:|:---:|:---:|
+ | telepix/PIXIE-Splade-Preview | Sparse(0.1B) | 0.6677 | 0.6238 | 0.6628 | 0.6831 | 0.7009 |
+ | | | | | | | |
+ | [BM25](https://github.com/xhluca/bm25s) | Sparse | 0.4251 | 0.3798 | 0.4238 | 0.4400 | 0.4566 |
+ | naver/splade-v3 | Sparse(0.1B) | 0.1000 | - | - | - | - |
+
+ #### Dense Embedding
+ | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
+ |------|:---:|:---:|:---:|:---:|:---:|:---:|
+ | telepix/PIXIE-Rune-Preview | 0.5B | 0.6905 | 0.6461 | 0.6859 | 0.7063 | 0.7238 |
+ | | | | | | | |
+ | nlpai-lab/KURE-v1 | 0.5B | 0.6751 | 0.6277 | 0.6725 | 0.6907 | 0.7095 |
+ | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.6592 | 0.6118 | 0.6542 | 0.6759 | 0.6949 |
+ | BAAI/bge-m3 | 0.5B | 0.6573 | 0.6099 | 0.6533 | 0.6732 | 0.6930 |
+ | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6321 | 0.5894 | 0.6274 | 0.6455 | 0.6662 |
+ | jinaai/jina-embeddings-v3 | 0.6B | 0.6293 | 0.5800 | 0.6254 | 0.6456 | 0.6665 |
+ | Alibaba-NLP/gte-multilingual-base | 0.3B | 0.6111 | 0.5542 | 0.6089 | 0.6302 | 0.6511 |
+ | openai/text-embedding-3-large | N/A | 0.6015 | 0.5466 | 0.5999 | 0.6187 | 0.6409 |
+
  ## Direct Use (Inverted-Index Retrieval)
 
  First install the Sentence Transformers library:
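The benchmark tables in this diff rank models by NDCG@k. For reference, a minimal sketch of how NDCG@k is computed from graded relevance judgments (plain Python, no dependencies; the function names and the sample relevance grades are illustrative, not taken from the benchmark repository):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance of the documents one system returned, in ranked order
# (made-up numbers): the ranking is near-ideal, so NDCG@10 is close to 1.
ranked_rels = [3, 2, 0, 1]
print(round(ndcg_at_k(ranked_rels, 10), 4))
```

Averaging this score over all queries of a benchmark yields the per-benchmark NDCG@k values, and "Avg. NDCG" averages across benchmarks.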
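The "Direct Use (Inverted-Index Retrieval)" section relies on the model emitting sparse token-weight vectors that can be served from an inverted index. The scoring step itself can be sketched independently of the model: a toy example in plain Python, assuming documents have already been encoded into token-to-weight maps (the tokens and weights below are invented for illustration, not real PIXIE-Splade outputs):

```python
from collections import defaultdict

# Hypothetical sparse document vectors, as a SPLADE-style encoder would
# produce: token -> weight maps (weights are made up for this sketch).
docs = {
    "d1": {"satellite": 1.8, "imagery": 1.2, "orbit": 0.4},
    "d2": {"retrieval": 1.5, "sparse": 1.1, "index": 0.9},
}

# Build the inverted index: token -> list of (doc_id, weight) postings.
index = defaultdict(list)
for doc_id, vec in docs.items():
    for token, weight in vec.items():
        index[token].append((doc_id, weight))

def search(query_vec, top_k=10):
    """Score documents by sparse dot product, touching only the postings
    of tokens that actually appear in the query vector."""
    scores = defaultdict(float)
    for token, q_weight in query_vec.items():
        for doc_id, d_weight in index.get(token, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(search({"sparse": 1.0, "index": 0.5}))
```

A production setup would run the same dot-product ranking inside an engine with native sparse-vector support (e.g. OpenSearch or Elasticsearch rank_features fields), but the retrieval logic reduces to this query-document dot product over shared tokens.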