We report **Normalized Discounted Cumulative Gain (NDCG)** scores, which measure the ranking quality of the documents retrieved for each query.
All evaluations were conducted using the open-source **[Korean-MTEB-Retrieval-Evaluators](https://github.com/BM-K/Korean-MTEB-Retrieval-Evaluators)** codebase to ensure consistent dataset handling, indexing, retrieval, and NDCG@k computation across models.
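
For reference, the sketch below shows how NDCG@k is typically computed for a single query under binary relevance. It is a minimal illustration of the metric only, not the Korean-MTEB-Retrieval-Evaluators implementation; the function name and document IDs are made up for the example.

```python
import math

def ndcg_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """NDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    dcg = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_doc_ids:
            dcg += 1.0 / math.log2(rank + 1)  # gain of 1, discounted by log2(rank + 1)
    # Ideal DCG: every relevant document ranked at the very top.
    ideal_hits = min(len(relevant_doc_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# The single relevant document is retrieved at rank 2 -> NDCG@10 is roughly 0.63.
print(ndcg_at_k(["doc7", "doc3", "doc9"], {"doc3"}, k=10))
```

The NDCG@1/3/5/10 columns in the tables below report this value at cutoffs of 1, 3, 5, and 10, averaged over the queries of each benchmark.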
#### 6 Datasets of MTEB (Korean)
Our model, **telepix/PIXIE-Rune-Preview**, achieves strong performance across most metrics and benchmarks, demonstrating robust generalization across domains such as multi-hop QA, long-document retrieval, public health, and e-commerce.

| Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|------|:---:|:---:|:---:|:---:|:---:|:---:|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| **telepix/PIXIE-Rune-Preview** | 0.5B | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| | | | | | | |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.6679 | 0.6068 | 0.6673 | 0.6892 | 0.7084 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |

Descriptions of the benchmark datasets used for evaluation are as follows:
- **Ko-StrategyQA**
  ...
- ...
  A dataset for retrieving relevant content from web and news articles in Korean.
- **MultiLongDocRetrieval**
  A long-document retrieval benchmark based on Korean Wikipedia and the mC4 corpus.
- **XPQARetrieval**
  A real-world dataset constructed from user queries and relevant product documents on a Korean e-commerce platform.

> **Tip:**
> While many benchmark datasets are available for evaluation, in this project we chose to use only those that contain clean positive documents for each query. Keep in mind that a benchmark dataset is just that: a benchmark. For real-world applications, it is best to construct an evaluation dataset tailored to your specific domain and evaluate embedding models, such as PIXIE, in that environment to determine the most suitable one.
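
As a starting point for such a domain-specific evaluation, the sketch below encodes a few in-domain queries and documents, ranks the documents by cosine similarity, and scores the ranking with NDCG@10. The queries, documents, and relevance labels are placeholders, and the snippet assumes the model can be loaded through the standard `sentence-transformers` interface.

```python
import math
from sentence_transformers import SentenceTransformer, util

def ndcg_at_k(ranked, relevant, k=10):
    # Binary-relevance NDCG@k, as in the earlier sketch.
    dcg = sum(1.0 / math.log2(r + 1) for r, d in enumerate(ranked[:k], start=1) if d in relevant)
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg else 0.0

# Placeholder in-domain evaluation data: replace with queries, documents, and
# relevance judgments from your own application.
queries = {"q1": "How do I request a refund for a damaged item?"}
corpus = {"d1": "Refunds for damaged items can be requested within 14 days...",
          "d2": "Our shipping partners deliver nationwide within 2-3 days..."}
relevant = {"q1": {"d1"}}

# Assumes the model is compatible with sentence-transformers.
model = SentenceTransformer("telepix/PIXIE-Rune-Preview")
doc_ids = list(corpus)
doc_emb = model.encode([corpus[d] for d in doc_ids], normalize_embeddings=True)

scores = []
for qid, query in queries.items():
    q_emb = model.encode(query, normalize_embeddings=True)
    sims = util.cos_sim(q_emb, doc_emb)[0]
    ranked = [doc_ids[i] for i in sims.argsort(descending=True).tolist()]
    scores.append(ndcg_at_k(ranked, relevant[qid], k=10))

print(f"NDCG@10 on the custom domain set: {sum(scores) / len(scores):.4f}")
```

Swapping in any other model name from the tables above is a one-line change, which makes it straightforward to compare candidates on your own data.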
#### 7 Datasets of BEIR (English)
Our model, **telepix/PIXIE-Rune-Preview**, achieves strong performance on a wide range of tasks, including fact verification, multi-hop question answering, financial QA, and scientific document retrieval, demonstrating competitive generalization across diverse domains.

| Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|------|:---:|:---:|:---:|:---:|:---:|:---:|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| **telepix/PIXIE-Rune-Preview** | 0.5B | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| | | | | | | |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |