BM-K committed on
Commit d157d1c · verified · 1 Parent(s): 75ba2bb

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -11,8 +11,8 @@ library_name: sentence-transformers
  <img src="https://cdn-uploads.huggingface.co/production/uploads/61d6f4a4d49065ee28a1ee7e/V8n2En7BlMNHoi1YXVv8Q.png" width="400"/>
  <p>

- # PIXIE-Rune-M-v1.0
- **PIXIE-Rune-M-v1.0** is an encoder-based embedding model trained on Korean and English triplets, developed by [TelePIX Co., Ltd](https://telepix.net/).
+ # PIXIE-Rune-v1.0
+ **PIXIE-Rune-v1.0** is an encoder-based embedding model trained on Korean and English triplets, developed by [TelePIX Co., Ltd](https://telepix.net/).
  **PIXIE** stands for Tele**PIX** **I**ntelligent **E**mbedding, representing TelePIX’s high-performance embedding technology.
  The model is multilingual, specifically optimized for both Korean and English.
  It demonstrates strong performance on retrieval tasks in both languages, achieving robust results across a wide range of Korean- and English-language benchmarks.
@@ -39,7 +39,7 @@ SentenceTransformer(
  ```

  ## Quality Benchmarks
- **PIXIE-Rune-M-v1.0** is a multilingual embedding model specialized for Korean and English retrieval tasks.
+ **PIXIE-Rune-v1.0** is a multilingual embedding model specialized for Korean and English retrieval tasks.
  It delivers consistently strong performance across a diverse set of domain-specific and open-domain benchmarks in both languages, demonstrating its effectiveness in real-world semantic search applications.
  The table below presents the retrieval performance of several embedding models evaluated on a variety of Korean and English benchmarks.
  We report **Normalized Discounted Cumulative Gain (NDCG)** scores, which measure how well a ranked list of documents aligns with ground truth relevance. Higher values indicate better retrieval quality.
@@ -47,11 +47,11 @@ We report **Normalized Discounted Cumulative Gain (NDCG)** scores, which measure
  - **NDCG@k**: Relevance quality of the top-*k* retrieved results.

  #### Korean Retrieval Benchmarks
- Our model, **telepix/PIXIE-Rune-M-v1.0**, achieves state-of-the-art performance across most metrics and benchmarks, demonstrating strong generalization across domains such as multi-hop QA, long-document retrieval, public health, and e-commerce.
+ Our model, **telepix/PIXIE-Rune-v1.0**, achieves state-of-the-art performance across most metrics and benchmarks, demonstrating strong generalization across domains such as multi-hop QA, long-document retrieval, public health, and e-commerce.

  | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
  |------|:---:|:---:|:---:|:---:|:---:|:---:|
- | **telepix/PIXIE-Rune-M-v1.0** | 568M | **0.6905** | **0.6461** | **0.6859** | **0.7063** | **0.7238** |
+ | **telepix/PIXIE-Rune-v1.0** | 568M | **0.6905** | **0.6461** | **0.6859** | **0.7063** | **0.7238** |
  | nlpai-lab/KURE-v1 | 568M | 0.6751 | 0.6277 | 0.6725 | 0.6907 | 0.7095 |
  | dragonekue/BGE-m3-ko | 568M | 0.6658 | 0.6225 | 0.6627 | 0.6795 | 0.6985 |
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 568M | 0.6592 | 0.6118 | 0.6542 | 0.6759 | 0.6949 |
@@ -78,11 +78,11 @@ Descriptions of the benchmark datasets used for evaluation are as follows:
  A real-world dataset constructed from user queries and relevant product documents in a Korean e-commerce platform.

  #### English Retrieval Benchmarks
- Our model, **telepix/PIXIE-Rune-M-v1.0**, achieves strong performance on a wide range of tasks, including fact verification, multi-hop question answering, financial QA, and scientific document retrieval, demonstrating competitive generalization across diverse domains.
+ Our model, **telepix/PIXIE-Rune-v1.0**, achieves strong performance on a wide range of tasks, including fact verification, multi-hop question answering, financial QA, and scientific document retrieval, demonstrating competitive generalization across diverse domains.

  | Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
  |------|:---:|:---:|:---:|:---:|:---:|:---:|
- | **telepix/PIXIE-Rune-M-v1.0** | 568M | **123** | **123** | **123** | **123** | **123** |
+ | **telepix/PIXIE-Rune-v1.0** | 568M | **123** | **123** | **123** | **123** | **123** |
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 568M | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
  | Qwen/Qwen3-Embedding-0.6B | 595M | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
  | BAAI/bge-m3 | 568M | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
@@ -120,7 +120,7 @@ Then you can load this model and run inference.
  from sentence_transformers import SentenceTransformer

  # Load the model
- model_name = 'PIXIE-Rune-M-v1.0'
+ model_name = 'PIXIE-Rune-v1.0'
  model = SentenceTransformer(model_name)

  # Define the queries and documents
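
For readers of the benchmark tables touched by this commit, the NDCG@k metric they report can be sketched as below. This is a minimal illustration, not the evaluation code used for the README: it assumes linear gain (the relevance label itself rather than `2**rel - 1`) and computes the ideal ordering from the retrieved list only, whereas a full evaluation would use every relevant document for the query. The function names and the example ranking are hypothetical. The `average_ndcg` helper reflects an observation from the table rows, where "Avg. NDCG" matches the mean of the four reported cutoffs (for example, the mean of 0.6461, 0.6859, 0.7063, and 0.7238 is about 0.6905).

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the same labels sorted in descending order.

    Assumption: the ideal ranking is built from the retrieved labels only.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_ndcg(scores: Sequence[float]) -> float:
    """Arithmetic mean of NDCG scores taken at several cutoffs."""
    return sum(scores) / len(scores)


# Hypothetical ranking: relevance labels of the retrieved documents, in rank order.
ranked_relevances = [0, 1, 0, 1, 0]
scores = {k: ndcg_at_k(ranked_relevances, k) for k in (1, 3, 5)}
```

A ranking that places all relevant documents first scores 1.0 at every cutoff, while the hypothetical ranking above scores 0.0 at k=1 because its top result is not relevant.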