faxenoff/code-daemon-embed-v1

@@ -68,8 +68,10 @@ This model trades long-context capability for raw throughput on short code units
   passage embeddings, unlike the teacher whose prefix is query-only). Mean-pool → **L2-normalize**.
 - For smaller indexes, truncate to **256** or **512** dims (MRL) before normalizing.
-Primarily consumed by the UltraCode daemon via the bundled engines. For standalone use, run
-`model.onnx` with `onnxruntime` + the bundled `sentencepiece.bpe.model`:
 ```python
 import onnxruntime as ort, sentencepiece as spm, numpy as np
@@ -99,8 +101,9 @@ hardware. No compilation on the user's machine.
 - **TVM** `*_tvm_vulkan.{dll,so}` — Vulkan fallback for non-TRT / older NVIDIA & other GPUs, per bucket.
 - **OpenVINO** `*.xml` + `*.bin` — Intel **CPU / iGPU / NPU**, per bucket.
 - **Metal** `*_tvm_metal.*` — Apple Silicon (macOS), per bucket.
-- **Source / tokenizer** — `model.onnx` (+ `model.onnx.data`) FP32 · `model_int8qdt.onnx` INT8 Q/DQ ·
-  `sentencepiece.bpe.model` · `tokenizer.json`.
 ## Evaluation — in-scope CoIR (sub-CoIR)

   passage embeddings, unlike the teacher whose prefix is query-only). Mean-pool → **L2-normalize**.
 - For smaller indexes, truncate to **256** or **512** dims (MRL) before normalizing.
+The daemon runs the bundled engines directly (this repo is its CDN). The embedding recipe below is
+illustrative only, since `model.onnx` is **not bundled** here. It shows how an engine maps text → vector
+(tokenize with the bundled `sentencepiece.bpe.model`, run the model, whose output is already the pooled
+`[B,768]` embedding, then L2-normalize):
 ```python
 import onnxruntime as ort, sentencepiece as spm, numpy as np
 - **TVM** `*_tvm_vulkan.{dll,so}` — Vulkan fallback for non-TRT / older NVIDIA & other GPUs, per bucket.
 - **OpenVINO** `*.xml` + `*.bin` — Intel **CPU / iGPU / NPU**, per bucket.
 - **Metal** `*_tvm_metal.*` — Apple Silicon (macOS), per bucket.
+- **Tokenizer** — `sentencepiece.bpe.model` (the model's SentencePiece model; special tokens fixed at
+  pad=0 / unk=1 / bos=2 / eos=3, with byte fallback) + `tokenizer_config.json`. The daemon loads the
+  SentencePiece model directly; the FP32 `model.onnx` source is not bundled here (this repo is the
+  daemon's engine CDN).
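
The post-processing step from the embedding recipe (truncate to 256 or 512 dims for MRL, then L2-normalize) can be sketched as below. This is a minimal sketch, not the daemon's implementation: `truncate_and_normalize` is a hypothetical helper name, and the random array only stands in for an engine's pooled `[B,768]` output.

```python
import numpy as np


def truncate_and_normalize(pooled: np.ndarray, dims: int = 768) -> np.ndarray:
    """Keep the first `dims` columns (MRL truncation), then L2-normalize each row.

    Hypothetical helper for illustration; assumes `pooled` has shape [B, D].
    """
    if not 0 < dims <= pooled.shape[1]:
        raise ValueError(f"dims must be in 1..{pooled.shape[1]}, got {dims}")
    cut = pooled[:, :dims].astype(np.float32)
    norms = np.linalg.norm(cut, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return cut / np.maximum(norms, 1e-12)


# Stand-in for an engine's pooled [B,768] output (random data, not real embeddings).
rng = np.random.default_rng(0)
pooled = rng.standard_normal((4, 768)).astype(np.float32)

small = truncate_and_normalize(pooled, dims=256)
# With unit-length rows, the dot product is the cosine similarity.
scores = small @ small.T
```

Truncation happens before normalization so that the shortened vectors are unit length; normalizing first and truncating afterwards would leave rows with norm below 1.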
 ## Evaluation — in-scope CoIR (sub-CoIR)