electroglyph committed on
Commit 5cd7704 · verified · 1 Parent(s): 87023b6

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +3 -10
  2. quant_model.png +0 -0
  3. snowflake2_m_uint8.onnx +1 -1
README.md CHANGED
@@ -9,6 +9,7 @@ tags:
 - embedding
 - snowflake2_m_uint8
 - snowflake
+- transformers.js
 license: apache-2.0
 language:
 - af
@@ -86,12 +87,6 @@ language:
 - yo
 - zh
 ---
-# NOTICE
-
-Currently benchmarking this, not sure how accurate it is yet. I'll be updating this.
-
-Update: Still testing, but this seems to be pretty close to where it should be. I might be able to improve it by 1-2%.
-
 # snowflake2_m_uint8
 
 This is a slightly modified version of the uint8 quantized ONNX model from https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0
@@ -100,13 +95,11 @@ I have added a linear quantization node before the `sentence_embedding` output s
 
 This is compatible with the [qdrant](https://github.com/qdrant/qdrant) uint8 datatype for collections.
 
 No benchmarks, but in my limited testing it's exactly equivalent to the FP32 output of the uint8 quantized ONNX model.
 
 # Quantization method
 
-I ran every token through the unmodified uint8 ONNX model and logged the highest/lowest FP32 value seen in the output tensor.
-
-This approximate range is from -0.25 to 0.31. I adjusted the zero point and quantized according to that scale directly in this ONNX model.
+The determined value range for the FP32 tensor is -0.31 to 0.31. I quantized according to that scale directly in this ONNX model.
 
 Here's what the graph of the original output looks like:
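The linear (affine) uint8 quantization the README diff describes can be sketched as follows: a calibrated FP32 range is mapped onto [0, 255] via a scale and zero point. This is an illustrative sketch, not the model's actual ONNX graph; the function names are hypothetical, and it shows both the earlier [-0.25, 0.31] calibration and the symmetric [-0.31, 0.31] range this commit switches to.

```python
import numpy as np

def quant_params(fp_min: float, fp_max: float, bits: int = 8):
    """Compute scale and zero point mapping [fp_min, fp_max] onto [0, 2**bits - 1]."""
    qmax = 2**bits - 1
    scale = (fp_max - fp_min) / qmax
    zero_point = int(round(-fp_min / scale))
    return scale, zero_point

def quantize(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Linear quantization to uint8, clipping out-of-range values."""
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map uint8 codes back to approximate FP32 values."""
    return (q.astype(np.float32) - zero_point) * scale

# Old calibration: [-0.25, 0.31]; new calibration (this commit): [-0.31, 0.31].
for lo, hi in [(-0.25, 0.31), (-0.31, 0.31)]:
    scale, zp = quant_params(lo, hi)
    x = np.linspace(lo, hi, 7, dtype=np.float32)
    err = np.abs(dequantize(quantize(x, scale, zp), scale, zp) - x)
    print(f"range [{lo}, {hi}]: scale={scale:.6f}, zero_point={zp}, max err={err.max():.6f}")
```

Note that the symmetric [-0.31, 0.31] range puts the zero point at the midpoint of the uint8 range, and the round-trip error stays within one quantization step of the scale.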
quant_model.png CHANGED
snowflake2_m_uint8.onnx CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b6eaaab219a11856e35ef2185adc6e8cb283b8fa6dc30ad8c8a5792ecd967a9a
+oid sha256:0ef454690f03d5b63dc333b2cb41611fa26d64a3efe7e0b21947cb953cce2a43
 size 310916367