card: model.onnx + INT8 ONNX are bundled (fix not-bundled note)
README.md
CHANGED
|
@@ -68,10 +68,10 @@ This model trades long-context capability for raw throughput on short code units
|
|
| 68 |
passage embeddings, unlike the teacher whose prefix is query-only). Mean-pool → **L2-normalize**.
|
| 69 |
- For smaller indexes, truncate to **256** or **512** dims (MRL) before normalizing.
|
| 70 |
|
| 71 |
-
The daemon runs the bundled engines directly (this repo is its CDN)
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
|
| 76 |
```python
|
| 77 |
import onnxruntime as ort, sentencepiece as spm, numpy as np
|
|
@@ -102,8 +102,9 @@ hardware. No compilation on the user's machine.
|
|
| 102 |
- **OpenVINO** `*.xml` + `*.bin` - Intel **CPU / iGPU / NPU**, per bucket.
|
| 103 |
- **Metal** `*_tvm_metal.*` - Apple Silicon (macOS), per bucket.
|
| 104 |
- **Tokenizer** - `sentencepiece.bpe.model` (the model's SentencePiece; specials baked at
|
| 105 |
-
pad=0 / unk=1 / bos=2 / eos=3, byte-fallback) + `tokenizer_config.json`. The daemon loads the SP
|
| 106 |
-
|
|
|
|
| 107 |
|
| 108 |
## Evaluation - in-scope CoIR (sub-CoIR)
|
| 109 |
|
|
|
|
| 68 |
passage embeddings, unlike the teacher whose prefix is query-only). Mean-pool → **L2-normalize**.
|
| 69 |
- For smaller indexes, truncate to **256** or **512** dims (MRL) before normalizing.
|
| 70 |
|
| 71 |
+
The daemon runs the bundled engines directly (this repo is its CDN), but the FP32 `model.onnx` is
|
| 72 |
+
**also bundled** for standalone use. The recipe below runs it with `onnxruntime`: tokenize with the
|
| 73 |
+
bundled `sentencepiece.bpe.model`, run, and the pooled `[B,768]` is already produced - just
|
| 74 |
+
L2-normalize:
|
| 75 |
|
| 76 |
```python
|
| 77 |
import onnxruntime as ort, sentencepiece as spm, numpy as np
|
|
|
|
| 102 |
- **OpenVINO** `*.xml` + `*.bin` - Intel **CPU / iGPU / NPU**, per bucket.
|
| 103 |
- **Metal** `*_tvm_metal.*` - Apple Silicon (macOS), per bucket.
|
| 104 |
- **Tokenizer** - `sentencepiece.bpe.model` (the model's SentencePiece; specials baked at
|
| 105 |
+
pad=0 / unk=1 / bos=2 / eos=3, byte-fallback) + `tokenizer_config.json`. The daemon loads the SP directly.
|
| 106 |
+
- **ONNX source** - `model.onnx` (+ `model.onnx.data`) FP32 and `model_int8qdt.onnx` (INT8 W8A16) - for
|
| 107 |
+
standalone `onnxruntime` / optimum use, and the source the engines are compiled from.
|
| 108 |
|
| 109 |
## Evaluation - in-scope CoIR (sub-CoIR)
|
| 110 |
|
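The post-processing described in README lines 68-69 of this diff (optionally truncate to 256 or 512 dims, then L2-normalize) can be sketched in NumPy as follows. This is a minimal illustration, not the card's own recipe: the helper name `truncate_and_normalize` is hypothetical, the random array only stands in for the pooled `[B,768]` output of `model.onnx`, and keeping the *leading* dimensions is an assumption based on how MRL-trained embeddings are usually truncated.

```python
from typing import Optional

import numpy as np


def truncate_and_normalize(
    pooled: np.ndarray, dims: Optional[int] = None, eps: float = 1e-12
) -> np.ndarray:
    """Optionally keep the first `dims` columns, then L2-normalize each row.

    Hypothetical helper: truncation happens BEFORE normalization, as the
    card states. Using the leading columns is an assumption (usual MRL practice).
    """
    x = np.asarray(pooled, dtype=np.float32)
    if dims is not None:
        if not 0 < dims <= x.shape[-1]:
            raise ValueError(f"dims must be in 1..{x.shape[-1]}, got {dims}")
        x = x[..., :dims]
    # Guard against division by zero for all-zero rows.
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


# Stand-in for the model's pooled output; a real run would take this
# array from the onnxruntime session instead.
rng = np.random.default_rng(0)
pooled = rng.standard_normal((4, 768)).astype(np.float32)

emb_full = truncate_and_normalize(pooled)            # full 768-dim vectors
emb_256 = truncate_and_normalize(pooled, dims=256)   # smaller index variant
```

Because the rows are unit-length after this step, a plain dot product between two embeddings equals their cosine similarity. Note that normalizing first and truncating afterwards would leave vectors with norm below 1, which is why the order matters.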