Tags: Feature Extraction, Transformers, ONNX, nomic_bert, quantized, int8, code-search, embedding, nomic-bert, text-embeddings-inference
fix(quantization): re-quantize with reduce_range=True
Original v1 quantization omitted reduce_range=True, producing
full-range INT8 weights (-128..127). On pre-VNNI x86 CPUs
(AMD Zen 3 / EPYC 7543, older Intel), ORT's AVX2-only INT8
kernel has an int16 accumulator that overflows on full-range
weights, producing degenerate embeddings (all inputs collapse
to a near-constant manifold).
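The overflow can be checked with a little arithmetic. The sketch below assumes the AVX2 kernel multiplies unsigned 8-bit activations by signed 8-bit weights and sums adjacent pairs of products into an int16 lane (the behaviour of the x86 `vpmaddubsw` instruction). The commit message states only that the accumulator is int16, so the pairing and the unsigned activation range are assumptions, and the function names are illustrative.

```python
INT16_MIN, INT16_MAX = -32768, 32767
U8_MAX = 255  # assumption: activations are unsigned 8-bit (0..255)


def worst_case_pair_sum(w_min: int, w_max: int) -> tuple:
    """Extreme values of a0*w0 + a1*w1 with a in [0, 255] and w in [w_min, w_max]."""
    # Both activations at 255 and both weights at the same extreme
    # give the most negative and most positive pair sums.
    return 2 * U8_MAX * w_min, 2 * U8_MAX * w_max


def fits_int16(w_min: int, w_max: int) -> bool:
    """True if every possible pair sum is representable in int16."""
    lo, hi = worst_case_pair_sum(w_min, w_max)
    return INT16_MIN <= lo and hi <= INT16_MAX


full_range = worst_case_pair_sum(-128, 127)  # (-65280, 64770): outside int16
half_range = worst_case_pair_sum(-64, 63)    # (-32640, 32130): inside int16

print("full range fits int16:", fits_int16(-128, 127))
print("half range fits int16:", fits_int16(-64, 63))
```

Under these assumptions the full-range worst case is roughly twice what int16 can hold, while halving the weight range brings it just inside the limit.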
reduce_range=True clamps weights to [-64, 63], leaving 1 bit
of accumulator headroom. VNNI CPUs (Intel Cascade Lake+, AMD
Zen 4+) and Apple Silicon are unaffected — they use different
kernel paths that handle full-range INT8 natively.
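The commit does not say which quantization entry point produced the file. As a minimal sketch, assuming ONNX Runtime's dynamic quantizer (`onnxruntime.quantization.quantize_dynamic`) with signed INT8 weights, a re-quantization with the reduced range could look like the following. The helper names and the file paths in the usage comment are hypothetical.

```python
def quantize_kwargs(src: str, dst: str) -> dict:
    """Keyword arguments for onnxruntime.quantization.quantize_dynamic."""
    return {
        "model_input": src,    # path to the unquantized ONNX model
        "model_output": dst,   # path for the re-quantized model
        "reduce_range": True,  # the fix named in the commit title
    }


def requantize(src: str, dst: str) -> None:
    """Re-quantize src into dst with reduced-range INT8 weights."""
    # Imported lazily so quantize_kwargs can be inspected without onnxruntime.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Assumption: signed INT8 weights, matching the repository's "int8" tag.
    quantize_dynamic(weight_type=QuantType.QInt8, **quantize_kwargs(src, dst))


# Hypothetical usage (paths are placeholders, not taken from the commit):
# requantize("onnx/model_fp32.onnx", "onnx/model.onnx")
```

If the original export used static quantization or other options such as `per_channel`, those would need to be carried over as well.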
New SHA256: 4eae31d09b1843103a1ebd5e2b2e24b5a5cad441a33906b35b12b1e2ed91d1db
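A downloaded copy can be checked against this digest. The sketch below uses only the Python standard library. The function names are illustrative, and the path in the usage comment assumes the file sits at onnx/model.onnx as in the repository.

```python
import hashlib

# Digest quoted in the commit message.
EXPECTED_SHA256 = "4eae31d09b1843103a1ebd5e2b2e24b5a5cad441a33906b35b12b1e2ed91d1db"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage (assumed local path):
# print(matches_expected("onnx/model.onnx"))
```

A mismatch suggests the local file is still the v1 quantization or the download is incomplete.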
onnx/model.onnx CHANGED +1 -1
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4eae31d09b1843103a1ebd5e2b2e24b5a5cad441a33906b35b12b1e2ed91d1db
 size 138619279
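The changed file is a Git LFS pointer: three `key value` lines giving the spec version, the object id and the size in bytes. A minimal sketch of reading such a pointer follows, using the new values from the diff. The function name and the returned dictionary layout are illustrative, not part of any Git LFS tooling.

```python
# New pointer contents, as shown in the diff.
POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4eae31d09b1843103a1ebd5e2b2e24b5a5cad441a33906b35b12b1e2ed91d1db\n"
    "size 138619279\n"
)


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields.get("oid", "").partition(":")
    return {
        "version": fields.get("version"),
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["oid_algo"], pointer["oid"], pointer["size"])
```

The size is the same before and after the change (138619279 bytes), so only the digest distinguishes the re-quantized model from the v1 file.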