nijaru committed
Commit b1f8f7c · verified · 1 Parent(s): 4315808

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +61 -0
  2. model_int8.onnx +3 -0
  3. tokenizer.json +0 -0
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ license: apache-2.0
+ base_model: jinaai/jina-embeddings-v2-base-code
+ tags:
+ - onnx
+ - int8
+ - quantized
+ - code-embeddings
+ - sentence-transformers
+ library_name: onnxruntime
+ pipeline_tag: feature-extraction
+ ---
+
+ # jina-embeddings-v2-base-code (INT8 Quantized)
+
+ INT8 dynamically quantized version of [jinaai/jina-embeddings-v2-base-code](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) for efficient CPU inference.
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Base Model | jinaai/jina-embeddings-v2-base-code |
+ | Quantization | INT8 (dynamic) |
+ | Size | 154 MB (vs 612 MB fp32) |
+ | Dimensions | 768 |
+ | Max Tokens | 8192 |
+ | Languages | English + 30 programming languages |
+
+ ## Usage
+
+ ```python
+ import onnxruntime as ort
+ from huggingface_hub import hf_hub_download
+ from tokenizers import Tokenizer
+ import numpy as np
+
+ # Load the tokenizer and the quantized ONNX model from the Hub
+ tokenizer = Tokenizer.from_file(hf_hub_download("nijaru/jina-code-int8", "tokenizer.json"))
+ tokenizer.enable_padding(pad_id=0, pad_token="[PAD]")
+ tokenizer.enable_truncation(max_length=512)
+ session = ort.InferenceSession(hf_hub_download("nijaru/jina-code-int8", "model_int8.onnx"))
+
+ def embed(texts):
+     encoded = tokenizer.encode_batch(texts)
+     input_ids = np.array([e.ids for e in encoded], dtype=np.int64)
+     attention_mask = np.array([e.attention_mask for e in encoded], dtype=np.int64)
+     outputs = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
+     embeddings = outputs[0]  # per-token embeddings
+     mask = attention_mask[:, :, np.newaxis]  # zero out padded positions
+     return (embeddings * mask).sum(axis=1) / mask.sum(axis=1)  # masked mean pooling
+
+ embeddings = embed(["def hello(): pass", "authentication flow"])
+ ```
+
+ ## License
+
+ Apache-2.0 (same as base model)
+
+ ## Attribution
+
+ Quantized from [jinaai/jina-embeddings-v2-base-code](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) by Jina AI.
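The README's `embed` function ends with masked mean pooling but does not show how the resulting vectors might be compared. The sketch below is not part of the commit. It reproduces the pooling step on toy data and adds a cosine-similarity helper. The names `mean_pool` and `cosine_similarity`, the clipping guard, and the toy arrays are assumptions made for illustration; the real model returns 768-dimensional vectors.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask[:, :, np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Assumed guard: avoid dividing by zero if a row were entirely padding.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy stand-in for the model output: 2 sequences, 3 tokens, 4 dimensions.
token_embeddings = np.array([
    [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],
    [[0.0, 2.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 6.0, 0.0, 0.0]],
], dtype=np.float32)
# The last token of the first sequence is padding and must not affect the mean.
attention_mask = np.array([[1, 1, 0], [1, 1, 1]], dtype=np.int64)

pooled = mean_pool(token_embeddings, attention_mask)
scores = cosine_similarity(pooled, pooled)
```

With real embeddings, `cosine_similarity(embed(queries), embed(snippets))` would give one score per query and snippet pair. Note that the usage snippet truncates inputs at 512 tokens even though the table lists 8192 as the maximum; whether a larger `max_length` works well with the quantized model is not stated in the commit.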
model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66bf87bf5d75595f8b7278be1ae9a770e69d58fd7e78a4661307a017f5c7b309
+ size 161297497
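The three lines above are a Git LFS pointer, not the model itself: they record the hash algorithm, the digest, and the byte size of the real file. As a rough sketch (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any library), a downloaded `model_int8.onnx` could be checked against the pointer like this:

```python
import hashlib
import os

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:66bf87bf5d75595f8b7278be1ae9a770e69d58fd7e78a4661307a017f5c7b309
size 161297497
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the recorded size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a large model file is never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20  # bytes to mebibytes
```

The recorded size works out to roughly 153.8 MiB, which suggests the README's "154 MB" figure is measured in mebibytes; in decimal megabytes the same file is about 161.3 MB.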
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff