---
language:
- multilingual
- ko
- en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- onnx
- quantized
- xlm-roberta
- dense-encoder
- dense
- fastembed
base_model: telepix/PIXIE-Rune-v1.0
pipeline_tag: feature-extraction
---
# PIXIE-Rune-v1.0 – ONNX Quantized Variants
ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. and optimized for semantic
retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.
> **Original model:** [`telepix/PIXIE-Rune-v1.0`](https://huggingface.co/telepix/PIXIE-Rune-v1.0) –
> safetensors weights + FP32 ONNX (`onnx/model.onnx` + `onnx/model.onnx_data`).
> This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.
---
## Model Description
| Property | Value |
|---|---|
| Base model | `telepix/PIXIE-Rune-v1.0` (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalize |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |
---
## ONNX Variants
| File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| `onnx/model_quantized.onnx` | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
| `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
**Metrics** were measured on 8 semantically diverse sentences against the FP32 reference.
Pearson r = correlation of the pairwise cosine-similarity matrices (structure preservation).
MRR = Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.
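For reference, all three numbers can be reproduced with a few lines of NumPy. This is an illustrative sketch, not the actual evaluation script; it assumes two row-wise L2-normalized `(n, d)` embedding matrices for the same probe sentences (all names here are made up for the example):

```python
import numpy as np

def compare_embeddings(fp32: np.ndarray, quant: np.ndarray) -> dict:
    """Compare quantized embeddings with their FP32 reference.

    Both arrays are (n, d) and assumed L2-normalized row-wise.
    """
    n = len(fp32)
    # Average cosine similarity between corresponding sentence pairs.
    avg_cos = float((fp32 * quant).sum(axis=1).mean())
    # Pearson r between the two pairwise cosine-similarity matrices,
    # off-diagonal entries only: measures structure preservation.
    sim_a, sim_b = fp32 @ fp32.T, quant @ quant.T
    off_diag = ~np.eye(n, dtype=bool)
    pearson_r = float(np.corrcoef(sim_a[off_diag], sim_b[off_diag])[0, 1])
    # MRR probe: each FP32 embedding queries the quantized index;
    # the "relevant" hit is the same sentence's quantized embedding.
    scores = fp32 @ quant.T
    self_scores = scores[np.arange(n), np.arange(n)]
    ranks = (scores >= self_scores[:, None]).sum(axis=1)  # 1 = top-ranked
    mrr = float((1.0 / ranks).mean())
    return {"avg_cos": avg_cos, "pearson_r": pearson_r, "mrr": mrr}

# Toy usage with synthetic vectors standing in for real embeddings.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 1024))
ref /= np.linalg.norm(ref, axis=1, keepdims=True)
noisy = ref + 0.01 * rng.standard_normal(ref.shape)  # stand-in for quantization error
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
print(compare_embeddings(ref, noisy))
```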
### Quantization methodology
The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding
table the dominant weight (~977 MB in FP32). Each variant handles it differently:
- **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)`
quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
- **INT4 + INT8 emb** (`model_int4.onnx`): Two passes.
Pass 1: `MatMulNBitsQuantizer(block_size=32, symmetric=True)` packs transformer MatMul weights
into 4-bit nibbles. Pass 2: `quantize_dynamic(op_types=["Gather"], weight_type=QInt8)` shrinks
the embedding table from 977 MB FP32 → 244 MB INT8.
- **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
`DequantizeLinear(axis=0)` node insertion packs the embedding table as per-row symmetric
INT4 nibbles (scale = max(|row|)/7). Requires an opset upgrade 14 → 21. Embedding: 977 MB → 122 MB.
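The per-row INT4 math described above reduces to a few lines. A minimal NumPy sketch of the quantize/dequantize round trip (illustrative only; it omits the actual 4-bit nibble packing and ONNX graph surgery):

```python
import numpy as np

def int4_per_row_symmetric(W: np.ndarray):
    """Per-row symmetric INT4: values in [-7, 7], scale = max(|row|) / 7."""
    scale = np.abs(W).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # guard all-zero rows against division by zero
    q = np.clip(np.round(W / scale), -7, 7).astype(np.int8)
    return q, scale  # on disk, two 4-bit values are packed per byte

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """What the inserted DequantizeLinear(axis=0) node reconstructs at runtime."""
    return q.astype(np.float32) * scale

# Toy matrix standing in for the 250,002 x 1024 FP32 embedding table.
W = np.random.default_rng(0).standard_normal((16, 32)).astype(np.float32)
q, scale = int4_per_row_symmetric(W)
err = np.abs(dequantize(q, scale) - W)  # bounded by scale / 2 per row
```

Because the scale is chosen as max(|row|)/7, the largest entry of each row maps exactly to ±7 and every element's reconstruction error stays within half a quantization step.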
---
## Usage
### fastembed (Rust)
This repo is integrated into [fastembed-rs](https://github.com/Anush008/fastembed-rs):
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
// INT8 – most compatible, 542 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;
// INT4 + INT8 embedding – 434 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;
// INT4 full – smallest, 337 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;
let embeddings = model.embed(vec!["Hello", "Hello world"], None)?;
```
### ONNX Runtime (Python)
```python
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)
session = ort.InferenceSession("onnx/model_quantized.onnx",
providers=["CPUExecutionProvider"])
texts = ["In which industrial sectors does TelePIX use satellite data?",
         "TelePIX provides services by analyzing satellite data across many fields, including maritime and agriculture."]
enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0] # (batch, seq, 1024)
# Mean pooling + L2 normalize (cast the mask to float so clip() works as intended)
mask_f = mask.astype(np.float32)[..., None]               # (batch, seq, 1)
pooled = (out * mask_f).sum(1) / mask_f.sum(1).clip(1e-9)
norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)
# cosine similarity
scores = embeddings @ embeddings.T
print(scores)
```
### sentence-transformers (original FP32 weights)
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")
queries = ["In which industrial sectors does TelePIX use satellite data?",
           "What satellite services are provided in the defense sector?"]
documents = ["TelePIX provides services by analyzing satellite data across many fields, including maritime and agriculture.",
             "Precision analysis services related to defense are provided through satellite imagery for reconnaissance and surveillance purposes."]
q_emb = model.encode(queries, prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
print(scores)
```
---
## Quality Benchmarks (original model)
Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
evaluated using [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).
### 6 Datasets of MTEB (Korean)
| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |
Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.
### 7 Datasets of BEIR (English)
| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.
---
## License
Apache 2.0 β same as the original [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
## Citation
```bibtex
@software{TelePIX-PIXIE-Rune-v1,
title = {PIXIE-Rune-v1.0},
author = {TelePIX AI Research Team and Bongmin Kim},
year = {2025},
url = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```
## Contact
Original model authors: bmkim@telepix.net
ONNX quantization: [cstr](https://huggingface.co/cstr) – open an issue on this repo for questions.