---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- telepix
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
<p align="center">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/61d6f4a4d49065ee28a1ee7e/V8n2En7BlMNHoi1YXVv8Q.png" width="400"/>
</p>

# PIXIE-Rune-v1.0
**PIXIE-Rune-v1.0** is an encoder-based embedding model trained on Korean and English triplets, developed by [TelePIX Co., Ltd](https://telepix.net/).
**PIXIE** stands for Tele**PIX** **I**ntelligent **E**mbedding, representing TelePIX's high-performance embedding technology.
The model is multilingual, optimized specifically for Korean and English.
It achieves robust retrieval performance across a wide range of Korean- and English-language benchmarks,
making it well suited for real-world applications that require high-quality semantic search in Korean, English, or both.

## Model Description
- **Model Type:** Sentence Transformer
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** Bilingual, optimized for high performance in Korean and English
- **License:** apache-2.0 

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
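A practical consequence of the trailing `Normalize()` module is that every embedding is unit-length, so cosine similarity and a plain dot product coincide. A quick numpy sketch with random stand-ins for real embeddings (the 1024-dim shape matches the model's output dimensionality):

```python
import numpy as np

# Random vectors standing in for model outputs; the division below is
# exactly what the Normalize() module does (L2-normalization per row).
rng = np.random.default_rng(0)
emb = rng.standard_normal((3, 1024))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# On unit vectors, the dot product equals cosine similarity.
dot = emb @ emb.T
norms = np.linalg.norm(emb, axis=1)
cos = dot / (norms[:, None] * norms[None, :])
print(np.allclose(dot, cos))  # True
```

This is why downstream vector stores can index these embeddings with either inner-product or cosine distance and get identical rankings.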

## Quality Benchmarks
**PIXIE-Rune-v1.0** is a multilingual embedding model specialized for Korean and English retrieval tasks. 
It delivers consistently strong performance across a diverse set of domain-specific and open-domain benchmarks in both languages, demonstrating its effectiveness in real-world semantic search applications.
The tables below present the retrieval performance of several embedding models evaluated on a variety of Korean and English benchmarks.
We report **Normalized Discounted Cumulative Gain (NDCG)** scores, which measure how well a ranked list of documents aligns with ground truth relevance. Higher values indicate better retrieval quality.  
- **Avg. NDCG**: Average of NDCG@1, @3, @5, and @10 across all benchmark datasets.  
- **NDCG@k**: Relevance quality of the top-*k* retrieved results.
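For reference, NDCG@k can be computed in a few lines. This is a generic sketch with binary relevance labels, not the exact evaluation script used for the tables below:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Binary relevance of a ranked list: the single relevant document
# was retrieved at rank 2, so NDCG = 1/log2(3) ≈ 0.6309.
print(round(ndcg_at_k([0, 1, 0, 0], 10), 4))  # 0.6309
```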
 
#### Korean Retrieval Benchmarks
Our model, **telepix/PIXIE-Rune-v1.0**, achieves state-of-the-art performance across most metrics and benchmarks, demonstrating strong generalization across domains such as multi-hop QA, long-document retrieval, public health, and e-commerce.

| Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|------|:---:|:---:|:---:|:---:|:---:|:---:|
| **telepix/PIXIE-Rune-v1.0** | 568M | **0.6905** | **0.6461** | **0.6859** | **0.7063** | **0.7238** |
|  |  |  |  |  |  |  |
| nlpai-lab/KURE-v1 | 568M | 0.6751 | 0.6277 | 0.6725 | 0.6907 | 0.7095 |
| dragonkue/BGE-m3-ko | 568M | 0.6658 | 0.6225 | 0.6627 | 0.6795 | 0.6985 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 568M | 0.6592 | 0.6118 | 0.6542 | 0.6759 | 0.6949 |
| BAAI/bge-m3 | 568M | 0.6573 | 0.6099 | 0.6533 | 0.6732 | 0.6930 |
| Qwen/Qwen3-Embedding-0.6B | 595M | 0.6321 | 0.5894 | 0.6274 | 0.6455 | 0.6662 |
| jinaai/jina-embeddings-v3 | 572M | 0.6293 | 0.5800 | 0.6254 | 0.6456 | 0.6665 |
| Alibaba-NLP/gte-multilingual-base | 305M | 0.6111 | 0.5542 | 0.6089 | 0.6302 | 0.6511 |
| openai/text-embedding-3-large | N/A | 0.6015 | 0.5466 | 0.5999 | 0.6187 | 0.6409 |

Descriptions of the benchmark datasets used for evaluation are as follows:
- **Ko-StrategyQA**  
  A Korean multi-hop open-domain question answering dataset designed for complex reasoning over multiple documents.
- **AutoRAGRetrieval**  
  A domain-diverse retrieval dataset covering finance, government, healthcare, legal, and e-commerce sectors.
- **MIRACLRetrieval**  
  A document retrieval benchmark built on Korean Wikipedia articles.
- **PublicHealthQA**  
  A retrieval dataset focused on medical and public health topics.
- **BelebeleRetrieval**  
  A dataset for retrieving relevant content from web and news articles in Korean.
- **MultiLongDocRetrieval**  
  A long-document retrieval benchmark based on Korean Wikipedia and the mC4 corpus.
- **XPQARetrieval**  
  A real-world dataset constructed from user queries and relevant product documents in a Korean e-commerce platform.

#### English Retrieval Benchmarks
Our model, **telepix/PIXIE-Rune-v1.0**, achieves strong performance on a wide range of tasks, including fact verification, multi-hop question answering, financial QA, and scientific document retrieval, demonstrating competitive generalization across diverse domains.
 
| Model Name | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|------|:---:|:---:|:---:|:---:|:---:|:---:|
| **telepix/PIXIE-Rune-v1.0** | 568M | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
|  |  |  |  |  |  |  |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 568M | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| Qwen/Qwen3-Embedding-0.6B | 595M | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 305M | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 568M | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| dragonkue/BGE-m3-ko | 568M | 0.5307 | 0.5125 | 0.5174 | 0.5362 | 0.5566 |
| nlpai-lab/KURE-v1 | 568M | 0.5272 | 0.5017 | 0.5171 | 0.5353 | 0.5548 |

Descriptions of the benchmark datasets used for evaluation are as follows:
- **ArguAna**  
  A dataset for argument retrieval based on claim-counterclaim pairs from online debate forums.
- **FEVER**  
  A fact verification dataset using Wikipedia for evidence-based claim validation.
- **FiQA-2018**  
  A retrieval benchmark tailored to the finance domain with real-world questions and answers.
- **HotpotQA**  
  A multi-hop open-domain QA dataset requiring reasoning across multiple documents.
- **MSMARCO**  
  A large-scale benchmark using real Bing search queries and corresponding web documents.
- **NQ**  
  A Google QA dataset where user questions are answered using Wikipedia articles.
- **SCIDOCS**  
  A citation-based document retrieval dataset focused on scientific papers.
  
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Load the model
model_name = "telepix/PIXIE-Rune-v1.0"
model = SentenceTransformer(model_name)

# Define the queries and documents
queries = [
    "텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",  # In which industries does TelePIX use satellite data?
    "국방 분야에 어떤 위성 서비스가 제공되나요?",  # What satellite services are offered in the defense sector?
    "텔레픽스의 기술 수준은 어느 정도인가요?",  # How advanced is TelePIX's technology?
]
documents = [
    "텔레픽스는 국방, 농업, 자원, 해양 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",  # TelePIX analyzes satellite data to provide services across defense, agriculture, resources, maritime, and other fields.
    "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다.",  # Provides precision defense analysis services using reconnaissance and surveillance satellite imagery.
    "TelePIX의 광학 탑재체 및 AI 분석 기술은 Global standard를 상회하는 수준으로 평가받고 있습니다.",  # TelePIX's optical payload and AI analysis technology is rated above the global standard.
    "텔레픽스는 우주에서 수집한 정보를 분석하여 '우주 경제(Space Economy)'라는 새로운 가치를 창출하고 있습니다.",  # TelePIX analyzes data collected in space to create new value, the "Space Economy".
    "텔레픽스는 위성 영상 획득부터 분석, 서비스 제공까지 전 주기를 아우르는 솔루션을 제공합니다.",  # TelePIX offers end-to-end solutions spanning satellite image acquisition, analysis, and service delivery.
]

# Compute embeddings: use `prompt_name="query"` to encode queries!
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute cosine similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Rank documents for each query by descending similarity
for query, query_scores in zip(queries, scores):
    doc_score_pairs = sorted(zip(documents, query_scores), key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```
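When ranking against a larger corpus, the per-query Python loop above can be replaced by one matrix multiplication followed by `argsort`. A minimal numpy sketch with mock unit-normalized vectors standing in for `model.encode` outputs (which the model already L2-normalizes):

```python
import numpy as np

# Mock stand-ins for model.encode outputs, L2-normalized like the real ones.
rng = np.random.default_rng(42)
query_embeddings = rng.standard_normal((3, 1024))
query_embeddings /= np.linalg.norm(query_embeddings, axis=1, keepdims=True)
document_embeddings = rng.standard_normal((5, 1024))
document_embeddings /= np.linalg.norm(document_embeddings, axis=1, keepdims=True)

# Cosine scores for every (query, document) pair in one matmul.
scores = query_embeddings @ document_embeddings.T  # shape (3, 5)

# Top-2 document indices per query, best first.
top_k = 2
top_idx = np.argsort(-scores, axis=1)[:, :top_k]
print(top_idx.shape)  # (3, 2)
```

For production-scale corpora you would typically hand the normalized vectors to an approximate-nearest-neighbor index instead, but the ranking semantics are the same.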

### Framework Versions
- Python: 3.10.16
- Sentence Transformers: 4.0.1
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 2.21.0
- Tokenizers: 0.21.1


## Contact

If you have any suggestions or questions about this model, please reach out to the authors at bmkim@telepix.net.