cstr committed on
Commit
5c7c3d6
·
verified ·
1 Parent(s): 7497ec9

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +171 -0
README.md ADDED
@@ -0,0 +1,171 @@
1
+ # PIXIE-Rune-v1.0: ONNX Quantized Variants
2
+
3
+ ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
4
+ an encoder-based multilingual embedding model developed by TelePIX Co., Ltd. It is optimized for semantic
5
+ retrieval across 74 languages, with a specialization in Korean/English aerospace-domain applications.
6
+
7
+ > **Original model:** [`telepix/PIXIE-Rune-v1.0`](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
8
+ > which ships safetensors weights and an FP32 ONNX export (`onnx/model.onnx` + `onnx/model.onnx_data`).
9
+ > This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.
10
+
11
+ ---
12
+
13
+ ## Model Description
14
+
15
+ | Property | Value |
16
+ |---|---|
17
+ | Base model | `telepix/PIXIE-Rune-v1.0` (XLM-RoBERTa-large) |
18
+ | Architecture | Transformer encoder |
19
+ | Output dimensionality | 1024 |
20
+ | Pooling | Mean pooling + L2 normalize |
21
+ | Max sequence length | 6,000 tokens |
22
+ | Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
23
+ | Domain | General multilingual + aerospace specialization |
24
+ | License | Apache 2.0 |
25
+
26
+ ---
27
+
28
+ ## ONNX Variants
29
+
30
+ | File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
31
+ |---|---|---|---|---|---|---|
32
+ | `onnx/model_quantized.onnx` | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
33
+ | `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
34
+ | `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
35
+
36
+ **Metrics** were measured on 8 semantically diverse English sentences against the FP32 reference.
37
+ Pearson r is the correlation between the pairwise cosine-similarity matrices of the quantized and FP32 embeddings (structure preservation).
38
+ MRR is the Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.
39
+
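The three metrics in the table can be illustrated with a few lines of dependency-free Python. This is a sketch under stated assumptions, not the script that produced the numbers above: the function names and the toy 2-D vectors are hypothetical, and because the retrieval probe behind the MRR column is not described here, the `mrr` helper assumes exactly one known relevant document per query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def avg_cos_vs_reference(ref, quant):
    """Mean cosine similarity between each reference embedding and its quantized counterpart."""
    return sum(cosine(r, q) for r, q in zip(ref, quant)) / len(ref)

def pairwise_sims(embs):
    """Flattened upper triangle (i < j) of the pairwise cosine-similarity matrix."""
    n = len(embs)
    return [cosine(embs[i], embs[j]) for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mrr(query_embs, doc_embs, relevant):
    """Mean Reciprocal Rank; relevant[i] is the index of the one relevant doc for query i (assumption)."""
    total = 0.0
    for query, rel in zip(query_embs, relevant):
        scores = [cosine(query, doc) for doc in doc_embs]
        # Rank = 1 + number of documents scoring strictly higher than the relevant one.
        rank = 1 + sum(1 for s in scores if s > scores[rel])
        total += 1.0 / rank
    return total / len(query_embs)

# Toy stand-ins for FP32 and quantized embeddings (hypothetical values).
ref_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
quant_embs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9]]

avg_cos = avg_cos_vs_reference(ref_embs, quant_embs)
structure_r = pearson(pairwise_sims(ref_embs), pairwise_sims(quant_embs))
probe_mrr = mrr(quant_embs, ref_embs, relevant=[0, 1, 2])
```

With only 8 sentences the pairwise matrix has 28 distinct off-diagonal entries, so Pearson r is best read as a coarse sanity check rather than a benchmark.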
40
+ ### Quantization methodology
41
+
42
+ - **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic` with
43
+ `weight_type=QuantType.QInt8`, which quantizes all weight tensors (MatMul + embedding Gather) to INT8.
44
+ - **INT4+INT8 emb** (`model_int4.onnx`): Two-pass approach.
45
+ Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` quantizes transformer
46
+ MatMul weights to 4-bit. Pass 2: `quantize_dynamic` with `op_types_to_quantize=["Gather"]`
47
+ compresses the 250K-token embedding table to INT8. Net effect: the 977 MB FP32 embedding table shrinks to 244 MB in INT8.
48
+ - **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
49
+ `DequantizeLinear(axis=0)` node insertion packs the word embedding table as INT4 nibbles
50
+ (per-row symmetric, scale = max(|row|)/7). Requires opset 21 for INT4 DequantizeLinear.
51
+ The 977 MB FP32 embedding table becomes 122 MB packed INT4.
52
+
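To make the INT4 embedding scheme concrete, here is a minimal, dependency-free sketch of per-row symmetric 4-bit quantization using the scale rule quoted above (scale = max(|row|)/7), together with the size arithmetic for the 250,002 × 1024 embedding table. The helper names, the guard clamp to [-8, 7], and the low-nibble-first packing order (assumed here to mirror the ONNX INT4 layout) are illustrative assumptions; this is not the script that produced `model_int4_full.onnx`.

```python
def quantize_row_int4(row):
    """Per-row symmetric INT4: scale = max(|row|) / 7, values rounded to integers."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0  # avoid division by zero on all-zero rows
    # With this scale, values land in [-7, 7]; the clamp is only a safety guard.
    q = [max(-8, min(7, round(x / scale))) for x in row]
    return q, scale

def pack_nibbles(q):
    """Pack signed 4-bit integers two per byte, first element in the low nibble."""
    if len(q) % 2:
        q = q + [0]  # pad odd-length rows with a zero nibble
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)

def unpack_nibbles(data, n):
    """Inverse of pack_nibbles: recover the first n signed 4-bit integers."""
    values = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return values[:n]

def dequantize_row(q, scale):
    """Reconstruct approximate float values, as DequantizeLinear would per row."""
    return [v * scale for v in q]

def table_size_mib(rows, cols, bits):
    """Size of a dense rows x cols table at the given bit width, in MiB (scales excluded)."""
    return rows * cols * bits / 8 / 2**20

row = [0.7, -0.3, 0.1, 0.0]            # hypothetical embedding row
q, scale = quantize_row_int4(row)
packed = pack_nibbles(q)
restored = dequantize_row(unpack_nibbles(packed, len(row)), scale)

fp32_mib = table_size_mib(250_002, 1024, 32)
int8_mib = table_size_mib(250_002, 1024, 8)
int4_mib = table_size_mib(250_002, 1024, 4)
```

The size helper ignores the per-row scales, which add roughly one more MiB at FP32 (250,002 × 4 bytes), so the rounded figures match the 977 / 244 / 122 MB values quoted above.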
53
+ ---
54
+
55
+ ## Usage
56
+
57
+ ### fastembed (Rust / Python)
58
+
59
+ This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):
60
+
61
+ ```rust
62
+ use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
63
+
64
+ let model = TextEmbedding::try_new(
65
+ InitOptions::new(EmbeddingModel::PixieRuneV1Q) // INT8
66
+ // EmbeddingModel::PixieRuneV1Int4 // INT4+INT8 emb
67
+ // EmbeddingModel::PixieRuneV1Int4Full // INT4 full
68
+ )?;
69
+
70
+ let embeddings = model.embed(vec!["Hello", "World"], None)?;
71
+ ```
72
+
73
+ ```python
74
+ from fastembed import TextEmbedding
75
+
76
+ model = TextEmbedding("telepix/PIXIE-Rune-v1.0", model_file="onnx/model_quantized.onnx")
77
+ embeddings = list(model.embed(["Hello", "World"]))
78
+ ```
79
+
80
+ ### ONNX Runtime (Python)
81
+
82
+ ```python
83
+ import onnxruntime as ort
84
+ import numpy as np
85
+ from tokenizers import Tokenizer
86
+
87
+ tokenizer = Tokenizer.from_file("tokenizer.json")
88
+ tokenizer.enable_truncation(max_length=512)
89
+ tokenizer.enable_padding(pad_token="<pad>", pad_id=1)
90
+
91
+ session = ort.InferenceSession("onnx/model_quantized.onnx")
92
+
93
+ texts = ["Hello, world!", "This is a test."]
94
+ enc = tokenizer.encode_batch(texts)
95
+ ids = np.array([e.ids for e in enc], dtype=np.int64)
96
+ mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
97
+
98
+ out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
99
+
100
+ # Mean pooling + L2 normalize
101
+ pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
102
+ norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
103
+ embeddings = pooled / norms.clip(1e-12)
104
+ ```
105
+
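The attention mask matters in the pooling step because padded positions still produce hidden states, and averaging over them would shift the embedding. The self-contained sketch below uses random arrays in place of real model output (the shapes and the helper name `mean_pool_normalize` are illustrative assumptions) to show that masked mean pooling ignores padding and that, after L2 normalization, a plain dot product equals cosine similarity.

```python
import numpy as np

def mean_pool_normalize(hidden, mask):
    """Masked mean over the token axis, followed by L2 normalization.

    hidden: (batch, seq_len, dim) float array of token embeddings
    mask:   (batch, seq_len) array with 1 for real tokens and 0 for padding
    """
    m = mask[..., None].astype(hidden.dtype)
    summed = (hidden * m).sum(axis=1)
    counts = np.maximum(m.sum(axis=1), 1e-9)   # guard against empty sequences
    pooled = summed / counts
    norms = np.maximum(np.linalg.norm(pooled, axis=-1, keepdims=True), 1e-12)
    return pooled / norms

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8)).astype(np.float32)   # stand-in for model output
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.int64)

emb = mean_pool_normalize(hidden, mask)

# Perturb only the padded positions: the pooled embeddings should not change.
noisy = hidden.copy()
noisy[0, 3] += 100.0
noisy[1, 2:] += 100.0
emb_noisy = mean_pool_normalize(noisy, mask)

# Rows are unit length, so the dot product is the cosine similarity.
scores = emb @ emb.T
```

In a retrieval setting, the same `emb @ emb.T` pattern applied to query and document embeddings gives the ranking scores directly, with no extra normalization step.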
106
+ ### sentence-transformers (original weights)
107
+
108
+ ```python
109
+ from sentence_transformers import SentenceTransformer
110
+
111
+ model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")
112
+
113
+ queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?"]  # "In which industries does TelePIX use satellite data?"
114
+ documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]  # "TelePIX analyzes satellite data to provide services in fields such as marine, resources, and agriculture."
115
+
116
+ q_emb = model.encode(queries, prompt_name="query")
117
+ d_emb = model.encode(documents)
118
+ scores = model.similarity(q_emb, d_emb)
119
+ ```
120
+
121
+ ---
122
+
123
+ ## Quality Benchmarks (original model)
124
+
125
+ Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
126
+
127
+ ### 6 Datasets of MTEB (Korean)
128
+
129
+ | Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
130
+ |---|---|---|---|---|---|---|
131
+ | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
132
+ | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
133
+ | nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
134
+ | BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
135
+ | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
136
+ | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
137
+ | jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
138
+
139
+ ### 7 Datasets of BEIR (English)
140
+
141
+ | Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
142
+ |---|---|---|---|---|---|---|
143
+ | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
144
+ | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
145
+ | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
146
+ | BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
147
+ | jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
148
+
149
+ Benchmarks from [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).
150
+
151
+ ---
152
+
153
+ ## License
154
+
155
+ Apache 2.0, same as the original model.
156
+
157
+ ## Citation
158
+
159
+ ```bibtex
160
+ @software{TelePIX-PIXIE-Rune-v1,
161
+ title={PIXIE-Rune-v1.0},
162
+ author={TelePIX AI Research Team and Bongmin Kim},
163
+ year={2025},
164
+ url={https://huggingface.co/telepix/PIXIE-Rune-v1.0}
165
+ }
166
+ ```
167
+
168
+ ## Contact
169
+
170
+ Original model: bmkim@telepix.net
171
+ ONNX quantization: [cstr](https://huggingface.co/cstr) (issues welcome).