Feature Extraction
sentence-transformers
ONNX
multilingual
Korean
English
xlm-roberta
sentence-similarity
quantized
dense-encoder
dense
fastembed
text-embeddings-inference
Upload README.md with huggingface_hub
Browse files
README.md
ADDED
@@ -0,0 +1,171 @@
# PIXIE-Rune-v1.0: ONNX Quantized Variants

ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. and optimized for semantic
retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.

> **Original model:** [`telepix/PIXIE-Rune-v1.0`](https://huggingface.co/telepix/PIXIE-Rune-v1.0) provides
> safetensors weights + FP32 ONNX (`onnx/model.onnx` + `onnx/model.onnx_data`).
> This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.

---

## Model Description

| Property | Value |
|---|---|
| Base model | `telepix/PIXIE-Rune-v1.0` (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalize |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |

---

## ONNX Variants

| File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| `onnx/model_quantized.onnx` | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
| `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |

**Metrics** were measured on 8 semantically diverse English sentences against the FP32 reference.
Pearson r is the correlation between the pairwise cosine-similarity matrices (structure preservation).
MRR is the Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.

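The three fidelity metrics in the table can be reproduced roughly as follows. This is a minimal pure-Python sketch on toy vectors, not the script used to produce the numbers above. The helper names (`cosine`, `pearson`, `pairwise_upper`, `mrr`) and the probe setup (each quantized embedding queries the FP32 embeddings, with its own FP32 counterpart as the single relevant item) are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def pairwise_upper(embs):
    """Upper-triangle entries of the pairwise cosine-similarity matrix."""
    n = len(embs)
    return [cosine(embs[i], embs[j]) for i in range(n) for j in range(i + 1, n)]

def mrr(queries, corpus):
    """Mean reciprocal rank, assuming corpus[i] is the only relevant item for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [cosine(q, d) for d in corpus]
        # Rank = 1 + number of items scoring strictly higher than the relevant one.
        rank = 1 + sum(s > scores[i] for s in scores)
        total += 1.0 / rank
    return total / len(queries)

# Toy stand-ins: "FP32" reference embeddings and slightly perturbed "quantized" ones.
fp32 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
quant = [[0.98, 0.05, 0.02], [0.03, 0.99, -0.04], [0.62, 0.77, 0.05], [-0.02, 0.04, 0.97]]

avg_cos = sum(cosine(f, q) for f, q in zip(fp32, quant)) / len(fp32)
structure_r = pearson(pairwise_upper(fp32), pairwise_upper(quant))
probe_mrr = mrr(quant, fp32)
print(f"avg cos={avg_cos:.3f}  pearson r={structure_r:.3f}  MRR={probe_mrr:.2f}")
```

With real models, `fp32` and `quant` would hold the embeddings that the FP32 and quantized ONNX sessions produce for the same sentences.
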
### Quantization methodology

- **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic` with
  `weight_type=QuantType.QInt8`, which quantizes all weight tensors (MatMul + embedding Gather) to INT8.
- **INT4+INT8 emb** (`model_int4.onnx`): Two-pass approach.
  Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` quantizes the transformer
  MatMul weights to 4-bit. Pass 2: `quantize_dynamic` with `op_types_to_quantize=["Gather"]`
  compresses the 250K-token embedding table to INT8. Net: the 977 MB FP32 embedding table becomes 244 MB in INT8.
- **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then the word embedding table is
  packed as INT4 nibbles (per-row symmetric, scale = max(|row|)/7) behind a manually inserted
  `DequantizeLinear(axis=0)` node. Requires opset 21 for INT4 DequantizeLinear.
  The 977 MB FP32 embedding table becomes 122 MB packed INT4.

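The embedding-table sizes and the per-row INT4 scheme described above can be checked with a small sketch. It is pure Python on a toy row, not the export script used for this repo. The helper names (`table_mib`, `quantize_row_int4`, `pack_nibbles`, `unpack_nibbles`) and the low-nibble-first packing order are assumptions made for illustration.

```python
VOCAB, DIM = 250_002, 1024  # XLM-RoBERTa vocabulary size x output dimensionality

def table_mib(bits_per_weight):
    """Size of the word-embedding table in MiB at a given weight width (scales excluded)."""
    return VOCAB * DIM * bits_per_weight / 8 / 2**20

def quantize_row_int4(row):
    """Per-row symmetric INT4: scale = max(|row|) / 7, values clamped to [-7, 7]."""
    scale = max(abs(x) for x in row) / 7 or 1.0  # guard against an all-zero row
    q = [max(-7, min(7, round(x / scale))) for x in row]
    return q, scale

def pack_nibbles(q):
    """Pack pairs of signed 4-bit values into bytes (low nibble first; assumed order)."""
    if len(q) % 2:
        q = q + [0]
    return bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4) for i in range(0, len(q), 2))

def unpack_nibbles(data, n):
    """Inverse of pack_nibbles: recover n signed 4-bit values."""
    out = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)  # sign-extend two's complement
    return out[:n]

row = [0.12, -0.70, 0.33, 0.02, -0.41, 0.70]
q, scale = quantize_row_int4(row)
packed = pack_nibbles(q)
restored = [v * scale for v in unpack_nibbles(packed, len(row))]
max_err = max(abs(a - b) for a, b in zip(row, restored))

print([round(table_mib(b)) for b in (32, 8, 4)])  # FP32, INT8, INT4 table sizes in MiB
print(q, len(packed), f"max abs error {max_err:.3f}")
```

The table sizes come out in MiB (250,002 × 1024 weights at 4, 1, and 0.5 bytes each), which matches the 977 / 244 / 122 figures quoted above. The per-row scale values add a small amount on top. The round-trip error per element is bounded by half the row's scale.
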

---

## Usage

### fastembed (Rust / Python)

This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// Declared `mut` because newer fastembed-rs releases may take `&mut self` in `embed`.
let mut model = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::PixieRuneV1Q) // INT8
    // EmbeddingModel::PixieRuneV1Int4             // INT4+INT8 emb
    // EmbeddingModel::PixieRuneV1Int4Full         // INT4 full
)?;

let embeddings = model.embed(vec!["Hello", "World"], None)?;
```

```python
from fastembed import TextEmbedding

# Assumption: the installed fastembed build knows this model; stock releases
# may need it registered first via TextEmbedding.add_custom_model(...).
model = TextEmbedding("telepix/PIXIE-Rune-v1.0", model_file="onnx/model_quantized.onnx")
embeddings = list(model.embed(["Hello", "World"]))
```

### ONNX Runtime (Python)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Local paths: tokenizer.json and the ONNX file downloaded from this repo.
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)  # XLM-RoBERTa pad token

session = ort.InferenceSession("onnx/model_quantized.onnx")

texts = ["Hello, world!", "This is a test."]
enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

# First output is taken as the per-token hidden states: (batch, seq_len, hidden)
out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]

# Mean pooling over non-padding tokens, then L2 normalize
pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)
```

### sentence-transformers (original weights)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

# Query: "In which industries does TelePIX use satellite data?"
queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?"]
# Document: "TelePIX analyzes satellite data in fields such as ocean, resources, and agriculture to provide services."
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]

q_emb = model.encode(queries, prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
```

---

## Quality Benchmarks (original model)

Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).

### 6 Datasets of MTEB (Korean)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |

### 7 Datasets of BEIR (English)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |

Benchmarks from [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).

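For readers unfamiliar with the metric reported in these tables, NDCG@k can be sketched as follows. This uses the common linear-gain formulation (relevance divided by log2(rank + 1)). Whether the linked evaluation scripts use exactly this variant is an assumption, and the function names and the toy relevance list are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering.

    Assumes `relevances` contains every relevant document for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Toy ranking: relevance labels of the retrieved documents, in ranked order.
ranked_relevance = [0, 1, 0, 1, 0]
scores = {k: ndcg_at_k(ranked_relevance, k) for k in (1, 3, 5, 10)}
avg_ndcg = sum(scores.values()) / len(scores)
print({k: round(v, 4) for k, v in scores.items()}, round(avg_ndcg, 4))
```

The "Avg. NDCG" column appears consistent with the mean of the four cut-offs: for telepix/PIXIE-Spell-Preview-1.7B on the Korean datasets, (0.7149 + 0.7541 + 0.7696 + 0.7882) / 4 ≈ 0.7567.
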

---

## License

Apache 2.0, same as the original model.

## Citation

```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title={PIXIE-Rune-v1.0},
  author={TelePIX AI Research Team and Bongmin Kim},
  year={2025},
  url={https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```

## Contact

- Original model: bmkim@telepix.net
- ONNX quantization: [cstr](https://huggingface.co/cstr) (issues welcome)