# PIXIE-Rune-v1.0: ONNX Quantized Variants

ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. It is optimized for semantic
retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.

> **Original model:** [`telepix/PIXIE-Rune-v1.0`](https://huggingface.co/telepix/PIXIE-Rune-v1.0)
> ships safetensors weights plus an FP32 ONNX export (`onnx/model.onnx` + `onnx/model.onnx_data`).
> This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.

---

## Model Description

| Property | Value |
|---|---|
| Base model | `telepix/PIXIE-Rune-v1.0` (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalization |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |

---

## ONNX Variants

| File | Quantization | Size (MB) | Avg. cosine vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| `onnx/model_quantized.onnx` | INT8 dynamic | 542 | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
| `onnx/model_int4.onnx` | INT4 + INT8 embeddings | 434 | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |

**Metrics** were measured on 8 semantically diverse English sentences against the FP32 reference model:

- **Avg. cosine vs FP32**: mean cosine similarity between each sentence's quantized embedding and its FP32 embedding.
- **Pearson r**: correlation between the pairwise cosine-similarity matrices of the quantized and FP32 embeddings. This measures how well the similarity structure is preserved.
- **MRR**: Mean Reciprocal Rank on a retrieval probe. A score of 1.00 means the correct item was ranked first for every probe query.

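The three metrics can be sketched in NumPy as follows. This is an assumed reconstruction, not the script that produced the table. Several choices are hypothetical:

- the helper name `compare_embeddings`;
- computing Pearson r over the upper triangle of each similarity matrix;
- the retrieval probe, in which each quantized embedding queries the FP32 set and the correct hit is the same sentence.

```python
import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row."""
    return x / np.linalg.norm(x, axis=1, keepdims=True).clip(1e-12)


def compare_embeddings(q: np.ndarray, ref: np.ndarray) -> dict:
    """Compare quantized embeddings `q` with FP32 embeddings `ref`, both shaped [n, dim].

    Hypothetical helper: the exact probe behind the published numbers is not specified.
    """
    q, ref = _unit(q), _unit(ref)

    # 1) Avg. cosine: similarity between each quantized vector and its own FP32 vector.
    avg_cos = float(np.mean(np.sum(q * ref, axis=1)))

    # 2) Pearson r between the pairwise cosine matrices (upper triangle, diagonal excluded).
    iu = np.triu_indices(len(q), k=1)
    pearson_r = float(np.corrcoef((q @ q.T)[iu], (ref @ ref.T)[iu])[0, 1])

    # 3) MRR: quantized vector i queries all FP32 vectors; the correct hit is FP32 vector i.
    scores = q @ ref.T
    correct = np.diag(scores)
    # Rank of the correct item = 1 + number of items scoring strictly higher.
    ranks = 1 + np.sum(scores > correct[:, None], axis=1)
    mrr = float(np.mean(1.0 / ranks))

    return {"avg_cos": avg_cos, "pearson_r": pearson_r, "mrr": mrr}


if __name__ == "__main__":
    # Synthetic stand-ins: "fp32" is the reference, "quant" a slightly perturbed copy.
    rng = np.random.default_rng(0)
    fp32 = rng.normal(size=(8, 1024))
    quant = fp32 + 0.2 * rng.normal(size=fp32.shape)
    print(compare_embeddings(quant, fp32))
```
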
### Quantization methodology

- **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic` with
  `weight_type=QInt8`. This quantizes all weight tensors (MatMul and the embedding Gather) to INT8.
- **INT4 + INT8 embeddings** (`model_int4.onnx`): a two-pass approach.
  - Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` quantizes the transformer
    MatMul weights to 4 bits.
  - Pass 2: `quantize_dynamic` with `op_types_to_quantize=["Gather"]` compresses the 250K-token
    embedding table to INT8, shrinking it from 977 MB (FP32) to 244 MB.
- **INT4 full** (`model_int4_full.onnx`): the same MatMulNBits pass. The word-embedding table is then
  packed as INT4 nibbles (per-row symmetric, scale = max(|row|)/7), and a manually inserted
  `DequantizeLinear(axis=0)` node dequantizes it for the downstream Gather. INT4 `DequantizeLinear`
  requires opset 21. The 977 MB FP32 embedding table becomes 122 MB of packed INT4.

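To make the INT4 layout concrete, here is a NumPy sketch of per-row symmetric INT4 quantization with two values packed per byte. It also includes the size arithmetic behind the 977/244/122 MB figures, which reproduce exactly when "MB" means 2^20 bytes.

This is an illustration, not the script used to build `model_int4_full.onnx`. The helper names are hypothetical. The low-nibble-first packing order is an assumption, chosen to match the ONNX convention for 4-bit tensors.

```python
import numpy as np

VOCAB, DIM = 250_002, 1024  # embedding table shape from the model card
MiB = 1024 ** 2             # the card's "MB" figures match 2**20-byte units


def quantize_rows_int4(table: np.ndarray):
    """Per-row symmetric INT4: scale = max(|row|) / 7, codes in [-7, 7], zero point 0."""
    scale = np.abs(table).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows against division by 0
    codes = np.clip(np.rint(table / scale), -7, 7).astype(np.int8)
    return codes, scale.astype(np.float32).ravel()


def pack_nibbles(codes: np.ndarray) -> np.ndarray:
    """Pack signed 4-bit codes two per byte, first element in the low nibble (assumed order)."""
    flat = (codes.ravel() & 0x0F).astype(np.uint8)  # 4-bit two's-complement patterns
    if flat.size % 2:
        flat = np.append(flat, np.uint8(0))  # pad odd-length input with a zero nibble
    return flat[0::2] | (flat[1::2] << 4)


def unpack_nibbles(packed: np.ndarray, shape) -> np.ndarray:
    """Inverse of pack_nibbles: split bytes, sign-extend, and trim padding."""
    nibbles = np.empty(packed.size * 2, dtype=np.int8)
    nibbles[0::2] = packed & 0x0F
    nibbles[1::2] = packed >> 4
    nibbles = np.where(nibbles > 7, nibbles - 16, nibbles)  # 8..15 -> -8..-1
    return nibbles[: int(np.prod(shape))].reshape(shape)


def dequantize_rows(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Per-row dequantization, i.e. what DequantizeLinear(axis=0) computes with zero point 0."""
    return codes.astype(np.float32) * scale[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.normal(size=(4, 9)).astype(np.float32)  # toy stand-in for the real table
    codes, scale = quantize_rows_int4(table)
    packed = pack_nibbles(codes)
    assert np.array_equal(unpack_nibbles(packed, codes.shape), codes)
    print("max abs error:", np.abs(dequantize_rows(codes, scale) - table).max())

    # Size arithmetic for the full table (per-row FP32 scales add about 1 MB for INT4).
    print(f"FP32: {VOCAB * DIM * 4 / MiB:.0f} MB")  # 4 bytes per weight -> 977 MB
    print(f"INT8: {VOCAB * DIM / MiB:.0f} MB")      # 1 byte per weight  -> 244 MB
    print(f"INT4: {VOCAB * DIM / 2 / MiB:.0f} MB")  # 2 weights per byte -> 122 MB
```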
---

## Usage

### fastembed (Rust / Python)

This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

let mut model = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::PixieRuneV1Q)   // INT8
    // EmbeddingModel::PixieRuneV1Int4                // INT4 + INT8 embeddings
    // EmbeddingModel::PixieRuneV1Int4Full            // INT4 full
)?;
let embeddings = model.embed(vec!["Hello", "World"], None)?;
```

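```python
from fastembed import TextEmbedding
from fastembed.common.model_description import ModelSource, PoolingType
repo = "cstr/PIXIE-Rune-v1.0-ONNX"  # this repo; the original repo ships only the FP32 ONNX file
TextEmbedding.add_custom_model(model=repo, pooling=PoolingType.MEAN, normalization=True,
                               sources=ModelSource(hf=repo), dim=1024, model_file="onnx/model_quantized.onnx")
embeddings = list(TextEmbedding(model_name=repo).embed(["Hello", "World"]))
```
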
### ONNX Runtime (Python)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)  # model allows up to 6,000 tokens; 512 keeps CPU memory low
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)  # XLM-RoBERTa padding token

session = ort.InferenceSession("onnx/model_quantized.onnx", providers=["CPUExecutionProvider"])

texts = ["Hello, world!", "This is a test."]
enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

# First output: token embeddings, shape [batch, seq_len, 1024]
out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]

# Mean pooling over non-padding tokens, then L2 normalization
m = mask[..., None].astype(np.float32)
pooled = (out * m).sum(axis=1) / m.sum(axis=1).clip(1e-9)
embeddings = pooled / np.linalg.norm(pooled, axis=-1, keepdims=True).clip(1e-12)
```

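Because the pooled vectors are L2-normalized, cosine similarity reduces to a plain dot product. Ranking documents for a query is therefore a single matrix-vector product.

The minimal NumPy sketch below uses a hypothetical helper, `rank_documents`. Its toy vectors stand in for the `embeddings` array produced by the ONNX Runtime example.

```python
import numpy as np


def rank_documents(query_emb: np.ndarray, doc_embs: np.ndarray, top_k: int = 3):
    """Return (index, score) pairs for the top_k documents, best first.

    Assumes all inputs are already L2-normalized, so the dot product equals cosine similarity.
    """
    scores = doc_embs @ query_emb        # shape [num_docs]
    order = np.argsort(-scores)[:top_k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy unit vectors; in practice, use rows of `embeddings` from the example above.
    docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
    query = np.array([0.8, 0.6])
    print(rank_documents(query, docs, top_k=2))
```
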
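### sentence-transformers (original weights)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

# "In which industries does TelePIX use satellite data?"
queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?"]
# "TelePIX analyzes satellite data to provide services in various fields such as maritime, resources, and agriculture."
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]

q_emb = model.encode(queries, prompt_name="query")  # queries use the "query" prompt; documents use none
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
```

---
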
## Quality Benchmarks (original model)

Results as reported for [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0) (original weights, not the quantized variants).

### 6 Datasets of MTEB (Korean)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |

### 7 Datasets of BEIR (English)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |

Avg. NDCG is the mean of NDCG@1, @3, @5, and @10. Evaluation code: [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).

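The Avg. NDCG column matches the unweighted mean of the four NDCG@k columns. This short check recomputes it for two rows copied from the tables above. It allows a small tolerance because every input is rounded to four decimals. The `avg_ndcg` helper is illustrative only.

```python
# (NDCG@1, @3, @5, @10) and the reported Avg. NDCG, copied from the tables above.
ROWS = {
    "PIXIE-Rune-v1.0 (Korean MTEB)": ((0.6936, 0.7356, 0.7545, 0.7698), 0.7383),
    "PIXIE-Rune-v1.0 (English BEIR)": ((0.5691, 0.5663, 0.5791, 0.5979), 0.5781),
}


def avg_ndcg(cutoffs):
    """Unweighted mean of the per-cutoff NDCG scores."""
    return sum(cutoffs) / len(cutoffs)


for name, (cutoffs, reported) in ROWS.items():
    recomputed = avg_ndcg(cutoffs)
    print(f"{name}: recomputed {recomputed:.5f}, reported {reported:.4f}")
    assert abs(recomputed - reported) < 1e-4  # inputs are rounded to 4 decimals
```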
---

## License

Apache 2.0, the same as the original model.

## Citation

```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title={PIXIE-Rune-v1.0},
  author={TelePIX AI Research Team and Bongmin Kim},
  year={2025},
  url={https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```

## Contact

- Original model: bmkim@telepix.net
- ONNX quantization: [cstr](https://huggingface.co/cstr) (issues welcome)