---
language:
  - multilingual
  - ko
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - onnx
  - quantized
  - xlm-roberta
  - dense-encoder
  - dense
  - fastembed
base_model: telepix/PIXIE-Rune-v1.0
pipeline_tag: feature-extraction
---

# PIXIE-Rune-v1.0 – ONNX Quantized Variants

ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. The model is optimized for
semantic retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.

> **Original model:** [`telepix/PIXIE-Rune-v1.0`](https://huggingface.co/telepix/PIXIE-Rune-v1.0) –
> safetensors weights + FP32 ONNX (`onnx/model.onnx` + `onnx/model.onnx_data`).
> This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.

---

## Model Description

| Property | Value |
|---|---|
| Base model | `telepix/PIXIE-Rune-v1.0` (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalize |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |

---

## ONNX Variants

| File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| `onnx/model_quantized.onnx` | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
| `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |

**Metrics** were measured on 8 semantically diverse sentences against the FP32 reference.
Pearson r is the correlation between the pairwise cosine-similarity matrices (structure preservation).
MRR is the Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is fully preserved.
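
A minimal sketch of how these numbers can be reproduced, assuming `fp32_emb` and `quant_emb` are
L2-normalized embedding matrices computed for the same probe sentences with the FP32 and a quantized
model (the exact probe set and evaluation script are not part of this repo):

```python
import numpy as np

def compare(fp32_emb: np.ndarray, quant_emb: np.ndarray) -> dict:
    """Compare (n, 1024) L2-normalized embeddings of a quantized variant vs FP32."""
    # Avg cos vs FP32: mean cosine similarity between matching rows.
    avg_cos = float(np.mean(np.sum(fp32_emb * quant_emb, axis=1)))

    # Pearson r: correlation of the two pairwise cosine-similarity matrices,
    # i.e. how well the similarity structure is preserved.
    sim_fp32 = fp32_emb @ fp32_emb.T
    sim_quant = quant_emb @ quant_emb.T
    pearson_r = float(np.corrcoef(sim_fp32.ravel(), sim_quant.ravel())[0, 1])

    # MRR probe: rank all quantized embeddings against each FP32 query; the
    # relevant hit is the same sentence, so rank 1 everywhere gives MRR = 1.00.
    reciprocal_ranks = []
    for i in range(len(fp32_emb)):
        order = np.argsort(-(quant_emb @ fp32_emb[i]))  # descending similarity
        rank = int(np.where(order == i)[0][0]) + 1
        reciprocal_ranks.append(1.0 / rank)

    return {"avg_cos": avg_cos, "pearson_r": pearson_r, "mrr": float(np.mean(reciprocal_ranks))}
```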

### Quantization methodology

The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding
table the dominant weight (~977 MB FP32). Each variant handles it differently (a minimal sketch of
the quantization passes follows the list):

- **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)`
  quantizes all weight tensors including the embedding Gather to INT8. Compact, maximum compatibility.
- **INT4 + INT8 emb** (`model_int4.onnx`): Two-pass.
  Pass 1: `MatMulNBitsQuantizer(block_size=32, symmetric=True)` packs transformer MatMul weights
  to 4-bit nibbles. Pass 2: `quantize_dynamic(op_types=["Gather"], weight_type=QInt8)` brings
  the embedding table from 977 MB FP32 → 244 MB INT8.
- **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
  `DequantizeLinear(axis=0)` node insertion packs the embedding table as per-row symmetric
  INT4 nibbles (scale = max(|row|)/7). Requires opset upgrade 14→21. Embedding: 977 MB → 122 MB.
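
A minimal sketch of the first two passes, assuming the FP32 export at `onnx/model.onnx`; module paths,
argument names, and the intermediate file name are illustrative and may differ between `onnxruntime`
versions, and the INT4-full embedding rewrite (per-row INT4 weights fed through `DequantizeLinear`)
is more involved and omitted here:

```python
import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

# INT8 dynamic: quantize all weights, including the 250,002 x 1024 embedding Gather.
quantize_dynamic("onnx/model.onnx", "onnx/model_quantized.onnx",
                 weight_type=QuantType.QInt8)

# INT4 + INT8 embedding, pass 1: pack transformer MatMul weights into 4-bit blocks.
fp32_model = onnx.load("onnx/model.onnx")
int4 = MatMulNBitsQuantizer(fp32_model, block_size=32, is_symmetric=True)
int4.process()
int4.model.save_model_to_file("onnx/model_int4_matmul_only.onnx",
                              use_external_data_format=True)

# Pass 2: shrink the embedding table (~977 MB FP32 to ~244 MB) by quantizing
# only the Gather weights to INT8.
quantize_dynamic("onnx/model_int4_matmul_only.onnx", "onnx/model_int4.onnx",
                 op_types_to_quantize=["Gather"], weight_type=QuantType.QInt8)
```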

---

## Usage

### fastembed (Rust)

This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// INT8 – most compatible, 542 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;

// INT4 + INT8 embedding – 434 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;

// INT4 full – smallest, 337 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;

let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
```

### ONNX Runtime (Python)

```python
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)

session = ort.InferenceSession("onnx/model_quantized.onnx",
                                providers=["CPUExecutionProvider"])

texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
         "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]

enc  = tokenizer.encode_batch(texts)
ids  = np.array([e.ids            for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)

# Mean pooling + L2 normalize
pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
norms  = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)
# cosine similarity
scores = embeddings @ embeddings.T
print(scores)
```

### sentence-transformers (original FP32 weights)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

queries   = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
             "국방 분야에 어떤 위성 서비스가 제공되나요?"]
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
             "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]

q_emb = model.encode(queries,   prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
print(scores)
```

---

## Quality Benchmarks (original model)

Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
evaluated using [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).

### 6 Datasets of MTEB (Korean)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |

Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.

### 7 Datasets of BEIR (English)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |

Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.

---

## License

Apache 2.0 – same as the original [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).

## Citation

```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title  = {PIXIE-Rune-v1.0},
  author = {TelePIX AI Research Team and Bongmin Kim},
  year   = {2025},
  url    = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```

## Contact

Original model authors: bmkim@telepix.net
ONNX quantization: [cstr](https://huggingface.co/cstr). Open an issue on this repo for questions.