electroglyph committed on
Commit 5cd7704 · verified · 1 Parent(s): 87023b6

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +3 -10
  2. quant_model.png +0 -0
  3. snowflake2_m_uint8.onnx +1 -1
README.md CHANGED
@@ -9,6 +9,7 @@ tags:
 - embedding
 - snowflake2_m_uint8
 - snowflake
+- transformers.js
 license: apache-2.0
 language:
 - af
@@ -86,12 +87,6 @@ language:
 - yo
 - zh
 ---
-# NOTICE
-
-Currently benchmarking this, not sure how accurate it is yet. I'll be updating this.
-
-Update: Still testing, but this seems to be pretty close to where it should be. I might be able to improve it by 1-2%.
-
 # snowflake2_m_uint8
 
 This is a slightly modified version of the uint8 quantized ONNX model from https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0
@@ -100,13 +95,11 @@ I have added a linear quantization node before the `sentence_embedding` output s
 
 This is compatible with the [qdrant](https://github.com/qdrant/qdrant) uint8 datatype for collections.
 
 No benchmarks, but in my limited testing it's exactly equivalent to the FP32 output of the uint8 quantized ONNX model.
 
 # Quantization method
 
-I ran every token through the unmodified uint8 ONNX model and logged the highest/lowest FP32 value seen in the output tensor.
-
-This approximate range is from -0.25 to 0.31. I adjusted the zero point and quantized according to that scale directly in this ONNX model.
+The determined value range for the FP32 tensor is -0.31 to 0.31. I quantized according to that scale directly in this ONNX model.
 
 Here's what the graph of the original output looks like:
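The linear (affine) uint8 quantization the README diff describes can be sketched as follows: a calibrated FP32 range is mapped onto [0, 255] via a scale and zero point. This is an illustrative sketch, not the model's actual ONNX graph; the function names are hypothetical, and it shows both the earlier [-0.25, 0.31] calibration and the symmetric [-0.31, 0.31] range this commit switches to.

```python
import numpy as np

def quant_params(fp_min: float, fp_max: float, bits: int = 8):
    """Compute scale and zero point mapping [fp_min, fp_max] onto [0, 2**bits - 1]."""
    qmax = 2**bits - 1
    scale = (fp_max - fp_min) / qmax
    zero_point = int(round(-fp_min / scale))
    return scale, zero_point

def quantize(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Linear quantization to uint8, clipping out-of-range values."""
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map uint8 codes back to approximate FP32 values."""
    return (q.astype(np.float32) - zero_point) * scale

# Old calibration: [-0.25, 0.31]; new calibration (this commit): [-0.31, 0.31].
for lo, hi in [(-0.25, 0.31), (-0.31, 0.31)]:
    scale, zp = quant_params(lo, hi)
    x = np.linspace(lo, hi, 7, dtype=np.float32)
    err = np.abs(dequantize(quantize(x, scale, zp), scale, zp) - x)
    print(f"range [{lo}, {hi}]: scale={scale:.6f}, zero_point={zp}, max err={err.max():.6f}")
```

Note that the symmetric [-0.31, 0.31] range puts the zero point at the midpoint of the uint8 range, and the round-trip error stays within one quantization step of the scale.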
quant_model.png CHANGED
snowflake2_m_uint8.onnx CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b6eaaab219a11856e35ef2185adc6e8cb283b8fa6dc30ad8c8a5792ecd967a9a
+oid sha256:0ef454690f03d5b63dc333b2cb41611fa26d64a3efe7e0b21947cb953cce2a43
 size 310916367