sasha-smirnov committed on
Commit
ac61f3b
·
verified ·
1 Parent(s): 1d89473

Initial publish via td-embeddings

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,301 @@
1
+ ---
2
+ license: gemma
3
+ language:
4
+ - ar
5
+ - az
6
+ - bg
7
+ - bn
8
+ - ca
9
+ - cs
10
+ - da
11
+ - de
12
+ - el
13
+ - en
14
+ - es
15
+ - et
16
+ - fa
17
+ - fi
18
+ - fr
19
+ - he
20
+ - hi
21
+ - hr
22
+ - hu
23
+ - id
24
+ - is
25
+ - it
26
+ - ja
27
+ - ka
28
+ - kk
29
+ - km
30
+ - ko
31
+ - lt
32
+ - lv
33
+ - mr
34
+ - ms
35
+ - nl
36
+ - no
37
+ - pl
38
+ - pt
39
+ - ro
40
+ - ru
41
+ - sk
42
+ - sl
43
+ - sq
44
+ - sr
45
+ - sv
46
+ - sw
47
+ - te
48
+ - th
49
+ - tl
50
+ - tr
51
+ - uk
52
+ - ur
53
+ - uz
54
+ - vi
55
+ - zh
56
+ library_name: sentence-transformers
57
+ pipeline_tag: feature-extraction
58
+ base_model: google/embeddinggemma-300m
59
+ tags:
60
+ - onnx
61
+ - teradata
62
+ - byom
63
+ - embeddings
64
+ - feature-extraction
65
+ - gemma
66
+ - gemma3
67
+ extra_gated_heading: Access embeddinggemma-300m on Hugging Face
68
+ extra_gated_prompt: >-
69
+   To access this model on Hugging Face, you must review and agree to
70
+   Google's usage license. To do this, please ensure you're logged in
71
+   to Hugging Face and click below. Requests are processed immediately.
72
+ extra_gated_button_content: Acknowledge license
73
+ extra_gated_fields:
74
+   Name: text
75
+   Affiliation: text
76
+   Country: country
77
+   Use case: text
78
+   I agree to use this model in accordance with the Gemma terms of use: checkbox
79
+ ---
80
+
81
+
82
+
83
+ > Read the disclaimer below before using this model.
84
+
85
+ ----
86
+
87
+ # embeddinggemma-300m -- ONNX for Teradata BYOM
88
+
89
+ This repository hosts an **ONNX-converted** version of the upstream
90
+ model [`google/embeddinggemma-300m`](https://huggingface.co/google/embeddinggemma-300m),
91
+ packaged for the Teradata Vantage `mldb.ONNXEmbeddings` BYOM
92
+ function. It is **not** the original PyTorch model -- only the
93
+ inference graph and tokenizer needed for in-database embedding
94
+ generation.
95
+
96
+ What's different from upstream:
97
+
98
+ - **Format**: ONNX (opset 14, IR version 8 -- BYOM 6+ compatible),
99
+ produced from the upstream weights with architecture-aware
100
+ post-processing baked in.
101
+ - **Precision**: dynamic int8 quantization. See the variants table
102
+ below for what is shipped for this model.
103
+ - **Pooling and post-processing**: the graph emits the
104
+ `sentence_embedding` tensor with **mean** pooling already
105
+ applied inside the graph.
106
+ - **Verification**: every variant's cosine fidelity vs. the
107
+ upstream PyTorch reference is recorded on a fixed FLORES-200
108
+ sample. Numbers may not generalize to your data.
109
+
110
+ ## Model details
111
+
112
+ | | |
113
+ |---|---|
114
+ | Upstream repo | [`google/embeddinggemma-300m`](https://huggingface.co/google/embeddinggemma-300m) |
115
+ | Architecture | `Gemma3TextModel` (encoder) |
116
+ | Parameters | 302,863,104 |
117
+ | Output dimensions | 768 |
118
+ | Pooling | `mean` |
119
+ | Instruction prefix | no |
120
+ | Max input tokens (advertised) | 2048 |
121
+ | Languages | 100 |
122
+ | License | gemma |
123
+ | ONNX opset | 14 |
124
+ | ONNX IR version | 8 (BYOM 6+ compatible) |
125
+
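The parameter count in the table can be cross-checked against the `config.json` added in this commit. The sketch below is a back-of-the-envelope tally, not the converter's own accounting. It assumes the usual Gemma 3 text layout: bias-free projections, a gated FFN with three weight matrices, four hidden-size RMSNorm weights per layer, per-head query/key norms of size `head_dim`, one final norm, and no separate output head. The helper name `count_parameters` is hypothetical.

```python
# Values copied from the config.json added in this commit.
cfg = {
    "vocab_size": 262144,
    "hidden_size": 768,
    "intermediate_size": 1152,
    "num_hidden_layers": 24,
    "num_attention_heads": 3,
    "num_key_value_heads": 1,
    "head_dim": 256,
}


def count_parameters(cfg):
    """Tally weights under the layout assumptions stated in the lead-in."""
    h, d = cfg["hidden_size"], cfg["head_dim"]
    q_width = cfg["num_attention_heads"] * d    # query/output projection width
    kv_width = cfg["num_key_value_heads"] * d   # shared key/value width
    embed = cfg["vocab_size"] * h               # token embedding table
    # q_proj + k_proj + v_proj + o_proj, all without bias
    attention = h * q_width + 2 * h * kv_width + q_width * h
    # gate, up and down projections of the gated FFN
    ffn = 3 * h * cfg["intermediate_size"]
    # Assumed: four RMSNorm weights of size h, plus q/k norms of size head_dim
    norms = 4 * h + 2 * d
    per_layer = attention + ffn + norms
    final_norm = h
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm


total = count_parameters(cfg)
print(f"{total:,}")
```

Under these assumptions the tally comes to 302,863,104, which agrees with the **Parameters** row. About two thirds of that is the embedding table (262144 x 768).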
126
+ <details>
127
+ <summary>Language codes tagged in this card's metadata (52 of the 100 advertised)</summary>
128
+
129
+ - `ar`
130
+ - `az`
131
+ - `bg`
132
+ - `bn`
133
+ - `ca`
134
+ - `cs`
135
+ - `da`
136
+ - `de`
137
+ - `el`
138
+ - `en`
139
+ - `es`
140
+ - `et`
141
+ - `fa`
142
+ - `fi`
143
+ - `fr`
144
+ - `he`
145
+ - `hi`
146
+ - `hr`
147
+ - `hu`
148
+ - `id`
149
+ - `is`
150
+ - `it`
151
+ - `ja`
152
+ - `ka`
153
+ - `kk`
154
+ - `km`
155
+ - `ko`
156
+ - `lt`
157
+ - `lv`
158
+ - `mr`
159
+ - `ms`
160
+ - `nl`
161
+ - `no`
162
+ - `pl`
163
+ - `pt`
164
+ - `ro`
165
+ - `ru`
166
+ - `sk`
167
+ - `sl`
168
+ - `sq`
169
+ - `sr`
170
+ - `sv`
171
+ - `sw`
172
+ - `te`
173
+ - `th`
174
+ - `tl`
175
+ - `tr`
176
+ - `uk`
177
+ - `ur`
178
+ - `uz`
179
+ - `vi`
180
+ - `zh`
181
+
182
+ </details>
183
+
184
+ ## Quantization variants
185
+
186
+ This repository ships the following variants. Quality numbers are
187
+ measured against the upstream PyTorch reference on a fixed
188
+ FLORES-200 sample. The **Size** column is the on-disk size of the
189
+ ONNX weight file in megabytes (MB, 10^6 bytes).
190
+
191
+ | Variant | Size (MB) | p50 cosine | R@1 |
192
+ |---|---|---|---|
193
+ | `fp32` | 1231.6 | 1.000000 | — |
194
+ | `ffn_skip` | 508.2 | 0.991893 | 0.853 |
195
+
196
+
197
+ How to read the quality columns:
198
+
199
+ - **p50 cosine** is the median cosine similarity between this
200
+ variant's embeddings and the fp32 ONNX reference, computed over
201
+ a fixed evaluation set. Higher means closer to the unquantized
202
+ model; **1.0** is identical.
203
+ - **R@1** is top-1 retrieval consistency: if you use this variant
204
+ as a search index, R@1 is the fraction of queries that get the
205
+ same nearest neighbor as the fp32 reference would. Higher is
206
+ better.
207
+
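The two quality columns can be reproduced on your own data along the following lines. This is a minimal sketch of one plausible reading of the definitions above, not the evaluation script used for the table. The function names are hypothetical, and the random arrays only stand in for real reference and variant embeddings.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosines."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def p50_cosine(ref, var):
    """Median row-wise cosine between reference and variant embeddings."""
    r, v = l2_normalize(ref), l2_normalize(var)
    return float(np.median(np.sum(r * v, axis=1)))


def top1_neighbors(queries, corpus):
    """Index of the most similar corpus row for every query row."""
    sims = l2_normalize(queries) @ l2_normalize(corpus).T
    return np.argmax(sims, axis=1)


def recall_at_1(ref_q, ref_c, var_q, var_c):
    """Fraction of queries whose top-1 hit is the same in both models."""
    same = top1_neighbors(ref_q, ref_c) == top1_neighbors(var_q, var_c)
    return float(np.mean(same))


# Synthetic stand-ins: the "variant" is the reference plus small noise.
rng = np.random.default_rng(0)
ref_corpus = rng.normal(size=(50, 768))
ref_queries = ref_corpus[:10] + 0.05 * rng.normal(size=(10, 768))
var_corpus = ref_corpus + 0.01 * rng.normal(size=ref_corpus.shape)
var_queries = ref_queries + 0.01 * rng.normal(size=ref_queries.shape)

print(p50_cosine(ref_corpus, var_corpus))
print(recall_at_1(ref_queries, ref_corpus, var_queries, var_corpus))
```

In practice `ref_*` would come from the fp32 model and `var_*` from the quantized variant, both run over the same texts.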
208
+ Notes:
209
+ - **fp32**: full-precision reference. Useful as an accuracy ceiling,
210
+ but BYOM users almost always want an int8 variant for in-database
211
+ scoring -- the one shipped here is about 2.4x smaller and loads faster.
212
+ - **ffn_skip**: dynamic int8 with the feed-forward (FFN) MatMul
213
+ layers kept in **fp32**, while attention and projection MatMuls
214
+ stay quantized. The FFN layers are where most of the quantization
215
+ error in transformer blocks concentrates; leaving them in fp32
216
+ recovers most of the quality loss for a modest size increase.
217
+ The artifact is roughly **2.4x smaller than fp32** (508.2 MB vs. 1231.6 MB),
218
+ but larger than a per_channel int8 sibling, which is not shipped here.
219
+
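An `ffn_skip`-style artifact could be approximated with ONNX Runtime's dynamic quantizer by excluding the FFN MatMul nodes. The sketch below is an illustration only, not the pipeline that produced the shipped file. It assumes that FFN nodes can be recognized by a `/mlp/` fragment in their exported names, which depends on the exporter. The helper names and the marker are hypothetical.

```python
def select_ffn_nodes(matmul_names, marker="/mlp/"):
    """Pick the MatMul node names that look like FFN projections.

    The "/mlp/" marker is an assumption about exported node names.
    """
    return [name for name in matmul_names if marker in name]


def quantize_skipping_ffn(fp32_path, out_path, marker="/mlp/"):
    """Dynamic int8 quantization that leaves FFN MatMuls in fp32."""
    # Imported lazily so the name-selection helper works without these packages.
    import onnx
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Only the graph structure is needed to list node names.
    model = onnx.load(fp32_path, load_external_data=False)
    matmuls = [n.name for n in model.graph.node if n.op_type == "MatMul"]
    quantize_dynamic(
        fp32_path,
        out_path,
        weight_type=QuantType.QInt8,
        nodes_to_exclude=select_ffn_nodes(matmuls, marker),
    )


# Example node names, made up for illustration.
example = [
    "/layers.0/self_attn/q_proj/MatMul",
    "/layers.0/mlp/gate_proj/MatMul",
    "/layers.0/mlp/down_proj/MatMul",
]
print(select_ffn_nodes(example))
```

Before relying on the marker, list the MatMul node names of the fp32 graph and confirm that the filter selects exactly the feed-forward projections.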
220
+ ## Quickstart: using this model with Teradata BYOM
221
+
222
+ Requires Teradata Vantage with **BYOM 6+** (`mldb.ONNXEmbeddings`).
223
+
224
+ ```python
225
+ import getpass
226
+ import teradataml as tdml
227
+ from huggingface_hub import hf_hub_download
228
+
229
+ repo_id = "Teradata/embeddinggemma-300m"
230
+ model_id = "embeddinggemma-300m" # arbitrary, used as the BYOM model_id
231
+ onnx_file = "onnx/model-ffn_skip.onnx"
232
+
233
+ # 1. Download the ONNX + tokenizer for the chosen variant.
234
+ hf_hub_download(repo_id=repo_id, filename=onnx_file, local_dir="./")
235
+ hf_hub_download(repo_id=repo_id, filename="tokenizer.json", local_dir="./")
236
+
237
+ # 2. Connect to Vantage.
238
+ tdml.create_context(
239
+ host=input("host: "),
240
+ username=input("user: "),
241
+ password=getpass.getpass("password: "),
242
+ )
243
+
244
+ # 3. Load model + tokenizer into BYOM tables (one-time per model_id).
245
+ tdml.save_byom(model_id=model_id, model_file=onnx_file,
246
+ table_name="embeddings_models")
247
+ tdml.save_byom(model_id=model_id, model_file="tokenizer.json",
248
+ table_name="embeddings_tokenizers")
249
+ ```
250
+
251
+ Then call `mldb.ONNXEmbeddings` against an input table whose
252
+ `txt` column carries the strings to embed:
253
+
254
+ ```sql
255
+ SELECT *
256
+ FROM mldb.ONNXEmbeddings(
257
+ ON (SELECT id, txt FROM your_input_table) AS InputTable
258
+ ON (SELECT model_id, model FROM embeddings_models
259
+ WHERE model_id = 'embeddinggemma-300m') AS ModelTable DIMENSION
260
+ ON (SELECT model_id, tokenizer FROM embeddings_tokenizers
261
+ WHERE model_id = 'embeddinggemma-300m') AS TokenizerTable DIMENSION
262
+ USING
263
+ Accumulate('id')
264
+ ModelOutputTensor('sentence_embedding')
265
+ OutputFormat('FLOAT32(768)')
266
+ OverwriteCachedModel('*')
267
+ ) AS t
268
+ ORDER BY id;
269
+ ```
270
+
271
+ Pooling rule **`mean`** is applied **inside** the converted
272
+ ONNX graph -- the output tensor named above already contains the
273
+ pooled, post-processed embedding vector.
274
+
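For readers unfamiliar with the pooling rule, the following sketch shows what masked mean pooling computes over per-token vectors. It is illustrative only. The converted graph performs this step internally and may apply further post-processing that is not reproduced here. The function name and the toy inputs are made up.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# Two toy sequences of three tokens; the first has one padded position.
tokens = np.array(
    [
        [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
        [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    ]
)
mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padded token in the first sequence does not affect its pooled vector, which is why the mask matters for batches of unequal length.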
275
+ ## Original model attribution
276
+
277
+ The original weights and training methodology belong to
278
+ **Google**. Please cite their work, not this
279
+ repository, in academic contexts. The canonical upstream model card
280
+ is at
281
+ [`google/embeddinggemma-300m`](https://huggingface.co/google/embeddinggemma-300m);
282
+ refer to it for benchmarks, training details, intended use, and
283
+ citation information.
284
+
285
+ ## Reporting issues
286
+
287
+ For ONNX-conversion or BYOM-compatibility issues specific to this
288
+ Teradata-converted artifact, please open a **Discussion** on this
289
+ model's Hugging Face page. Questions about the underlying model
290
+ quality, training, or intended use should go to the upstream
291
+ maintainer's model card.
292
+
293
+ ----
294
+
295
+ DISCLAIMER: The content herein ("Content") is provided "AS IS" and is not covered by any Teradata Operations, Inc. and its affiliates ("Teradata") agreements. Its listing here does not constitute certification or endorsement by Teradata.
296
+
297
+ To the extent any of the Content contains or is related to any artificial intelligence ("AI") or other language learning models ("Models") that interoperate with the products and services of Teradata, by accessing, bringing, deploying or using such Models, you acknowledge and agree that you are solely responsible for ensuring compliance with all applicable laws, regulations, and restrictions governing the use, deployment, and distribution of AI technologies. This includes, but is not limited to, AI Diffusion Rules, European Union AI Act, AI-related laws and regulations, privacy laws, export controls, and financial or sector-specific regulations.
298
+
299
+ While Teradata may provide support, guidance, or assistance in the deployment or implementation of Models to interoperate with Teradata's products and/or services, you remain fully responsible for ensuring that your Models, data, and applications comply with all relevant legal and regulatory obligations. Our assistance does not constitute legal or regulatory approval, and Teradata disclaims any liability arising from non-compliance with applicable laws.
300
+
301
+ You must determine the suitability of the Models for any purpose. Given the probabilistic nature of machine learning and modeling, the use of the Models may in some situations result in incorrect output that does not accurately reflect the action generated. You should evaluate the accuracy of any output as appropriate for your use case, including by using human review of the output.
config.json ADDED
@@ -0,0 +1,60 @@
1
+ {
2
+ "_sliding_window_pattern": 6,
3
+ "architectures": [
4
+ "Gemma3TextModel"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "attn_logit_softcapping": null,
9
+ "bos_token_id": 2,
10
+ "dtype": "float32",
11
+ "eos_token_id": 1,
12
+ "final_logit_softcapping": null,
13
+ "head_dim": 256,
14
+ "hidden_activation": "gelu_pytorch_tanh",
15
+ "hidden_size": 768,
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 1152,
18
+ "layer_types": [
19
+ "sliding_attention",
20
+ "sliding_attention",
21
+ "sliding_attention",
22
+ "sliding_attention",
23
+ "sliding_attention",
24
+ "full_attention",
25
+ "sliding_attention",
26
+ "sliding_attention",
27
+ "sliding_attention",
28
+ "sliding_attention",
29
+ "sliding_attention",
30
+ "full_attention",
31
+ "sliding_attention",
32
+ "sliding_attention",
33
+ "sliding_attention",
34
+ "sliding_attention",
35
+ "sliding_attention",
36
+ "full_attention",
37
+ "sliding_attention",
38
+ "sliding_attention",
39
+ "sliding_attention",
40
+ "sliding_attention",
41
+ "sliding_attention",
42
+ "full_attention"
43
+ ],
44
+ "max_position_embeddings": 2048,
45
+ "model_type": "gemma3_text",
46
+ "num_attention_heads": 3,
47
+ "num_hidden_layers": 24,
48
+ "num_key_value_heads": 1,
49
+ "pad_token_id": 0,
50
+ "query_pre_attn_scalar": 256,
51
+ "rms_norm_eps": 1e-06,
52
+ "rope_local_base_freq": 10000.0,
53
+ "rope_scaling": null,
54
+ "rope_theta": 1000000.0,
55
+ "sliding_window": 512,
56
+ "transformers_version": "4.57.0.dev0",
57
+ "use_bidirectional_attention": true,
58
+ "use_cache": true,
59
+ "vocab_size": 262144
60
+ }
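The `layer_types` list in this config follows a regular pattern: with `_sliding_window_pattern` set to 6, every sixth layer uses full attention and the rest use sliding-window attention. The sketch below regenerates the list from that rule as a sanity check. The helper name is hypothetical, and the rule is inferred from the listed values rather than taken from the library's source.

```python
def layer_types(num_hidden_layers, pattern):
    """Every `pattern`-th layer (1-based) is full attention, the rest sliding."""
    return [
        "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


# num_hidden_layers and _sliding_window_pattern from the config above.
types = layer_types(24, 6)
full_layers = [i for i, t in enumerate(types) if t == "full_attention"]
print(full_layers)
```

The resulting list has 20 sliding-window layers and 4 full-attention layers, in the same order as the config.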
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cache_implementation": "hybrid",
3
+ "do_sample": true,
4
+ "top_k": 64,
5
+ "top_p": 0.95,
6
+ "transformers_version": "4.57.0.dev0"
7
+ }
onnx/model-ffn_skip.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:58bbd79162842fc17d9449b09a6cf2c3dc118c20f26ebf7a1039d21be5dea756
3
+ size 508217309
onnx/model-fp32.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:75acaf07c82e0de0d80826e93fb66863e1d425dcd8452fa44383ee50054f6ef5
3
+ size 1231605612
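The ONNX files are stored as Git LFS pointers: three lines giving the spec version, a SHA-256 object id and the size in bytes. A downloaded file can be checked against its pointer roughly as follows. This is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes the pointer uses a hash algorithm that `hashlib` knows by name, as `sha256` is here.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a >1 GB model is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from onnx/model-fp32.onnx in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:75acaf07c82e0de0d80826e93fb66863e1d425dcd8452fa44383ee50054f6ef5\n"
    "size 1231605612\n"
)
print(pointer["size"] / 1e6)  # size in MB (10^6 bytes), as used in the README table
```

After downloading, `matches_pointer("onnx/model-fp32.onnx", pointer)` should return `True` for an intact file. The local path is whatever location you downloaded to.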
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "boi_token": "<start_of_image>",
3
+ "bos_token": {
4
+ "content": "<bos>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ "eoi_token": "<end_of_image>",
11
+ "eos_token": {
12
+ "content": "<eos>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "image_token": "<image_soft_token>",
19
+ "pad_token": {
20
+ "content": "<pad>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false
25
+ },
26
+ "unk_token": {
27
+ "content": "<unk>",
28
+ "lstrip": false,
29
+ "normalized": false,
30
+ "rstrip": false,
31
+ "single_word": false
32
+ }
33
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6852f8d561078cc0cebe70ca03c5bfdd0d60a45f9d2e0e1e4cc05b68e9ec329e
3
+ size 33385008
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff