Neelkumar committed on
Commit 181a99e · verified · 1 Parent(s): 589f582

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
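An aside on this config: with `pooling_mode_mean_tokens` enabled, the Pooling module averages the token embeddings of each sequence while masking out padding positions. A minimal NumPy sketch of that operation, with toy values (not real embeddings):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean pooling as selected by pooling_mode_mean_tokens: average the
    token embeddings, ignoring padding positions via the attention mask."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid div-by-zero
    return summed / counts

# Two real tokens plus one padded position: the padded row must not
# affect the mean.
emb = np.array([[[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(emb, mask))  # [[2. 4.]]
```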
2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 768,
+   "out_features": 3072,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80963c980f2e2b9f49cf453c69746e9c55ecf45801a2efd02fb7ed9109b373fe
+ size 9437272
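Note that the `.safetensors` entries in this commit are Git LFS pointer files, not the weights themselves: three `key value` lines per the LFS spec. A small sketch of reading that format (the helper name is illustrative):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:80963c980f2e2b9f49cf453c69746e9c55ecf45801a2efd02fb7ed9109b373fe
size 9437272
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 9437272
```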
3_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 3072,
+   "out_features": 768,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
3_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bd3167984b0133d1b7387604e923dd9cceaeab71485363ab395bde5300f6495
+ size 9437272
README.md ADDED
@@ -0,0 +1,389 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:1000
+ - loss:MultipleNegativesRankingLoss
+ base_model: google/embeddinggemma-300m
+ widget:
+ - source_sentence: Qu'est-ce qui a motivé le retour de Claude LeBouthilier au Nouveau-Brunswick?
+   sentences:
+   - The driver of a vehicle that is approaching a railway crossing at which a stop
+     sign has been erected shall stop the vehicle within fifteen metres, but not less
+     than five metres, from the nearest rail of the railway.
+   - Je suis revenu vivre au Nouveau-Brunswick parce que je ne pouvais plus dissocier
+     mon écriture de mon lieu d'origine et de mon existence quotidienne.
+   - Quelles sont les procédures pour obtenir un passeport canadien?
+ - source_sentence: Quels sont les moyens de dépistage du cancer du col de l'utérus?
+   sentences:
+   - Comprendre les différences entre le test Pap et le test VPH.
+   - Employed and self-employed Nova Scotians who are not receiving Employment Insurance
+     (EI) and those who had or are in an EI waiting period may qualify for this relief
+     grant.
+   - Quelles sont les conditions pour obtenir une allocation familiale?
+ - source_sentence: What are the responsibilities of crew members regarding surface
+     contamination?
+   sentences:
+   - What are the requirements for obtaining a Canadian passport?
+   - Crew members are responsible to report suspected surface contamination to the
+     pilot-in-command as soon as it is discovered.
+   - Plant breeders receive legal protection for up to 25 years for trees and vines,
+     and 20 years for other plant varieties.
+ - source_sentence: Do oil and gas field workers have the same rights to consecutive
+     hours off as other employees in BC?
+   sentences:
+   - The provision of the Act which provides for 32 consecutive hours free from work
+     each week does not apply to employees referred to in section 37.6 of this regulation.
+   - Les nouveaux bureaux internationaux offriront des services pour faciliter l'investissement
+     dans la Saskatchewan et améliorer les exportations vers l'Asie.
+   - What are the requirements for registering a new business in British Columbia?
+ - source_sentence: What is the purpose of the funding provided by the Government of
+     Canada to the Federation of Black Canadians?
+   sentences:
+   - Ghana is an attractive market for industries such as Agriculture, Professional
+     Training, Technical and vocational education and training (TVET), Clean technologies,
+     Infrastructure, Mining, and Oil and gas.
+   - What are the eligibility requirements for the Canada Pension Plan?
+   - This investment through the Black Entrepreneurship Program (BEP) Ecosystem Fund
+     will allow the FBC to provide tools and resources to 170 Black youth entrepreneurs
+     across multiple regions, supporting them to successfully launch and grow their
+     businesses.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+ 
+ # SentenceTransformer based on google/embeddinggemma-300m
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision 671e8c118e27f9061355bce059ee2d1d86d048df -->
+ - **Maximum Sequence Length:** 2048 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+   (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+   (4): Normalize()
+ )
+ ```
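As an aside on the stack above: after pooling, embeddings pass through two linear bottleneck projections (768 → 3072 → 768, Identity activation, no bias) and are then L2-normalised. A shape-level NumPy sketch, with hypothetical random weights standing in for the trained `2_Dense` and `3_Dense` matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random weights standing in for the trained projections.
w2 = rng.normal(size=(768, 3072)).astype(np.float32)
w3 = rng.normal(size=(3072, 768)).astype(np.float32)

pooled = rng.normal(size=(2, 768)).astype(np.float32)  # stand-in for Pooling output
x = pooled @ w2                                   # (2): Dense 768->3072, no bias, Identity
x = x @ w3                                        # (3): Dense 3072->768, no bias, Identity
x = x / np.linalg.norm(x, axis=1, keepdims=True)  # (4): Normalize -> unit-length rows

print(x.shape)  # (2, 768)
```

Because the final module normalises every vector, the dot product of two outputs equals their cosine similarity.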
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Neelkumar/my-embedding-gemma-1000")
+ # Run inference
+ queries = [
+     "What is the purpose of the funding provided by the Government of Canada to the Federation of Black Canadians?",
+ ]
+ documents = [
+     'This investment through the Black Entrepreneurship Program (BEP) Ecosystem Fund will allow the FBC to provide tools and resources to 170 Black youth entrepreneurs across multiple regions, supporting them to successfully launch and grow their businesses.',
+     'What are the eligibility requirements for the Canada Pension Plan?',
+     'Ghana is an attractive market for industries such as Agriculture, Professional Training, Technical and vocational education and training (TVET), Clean technologies, Infrastructure, Mining, and Oil and gas.',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 768] [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[ 0.9830, -0.5013,  0.8960]])
+ ```
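For reference, with `similarity_fn_name` set to `cosine`, the `model.similarity` call above reduces to row-normalised dot products. A standalone NumPy sketch on toy 2-d vectors (not real embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine-similarity matrix between two sets of row vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

q = np.array([[1.0, 0.0]])
d = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
# Identical, orthogonal, and opposite documents score 1, 0, and -1.
print(cosine_similarity(q, d))
```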
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 1,000 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 15.8 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 32.04 tokens</li><li>max: 130 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 15.01 tokens</li><li>max: 42 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>Quelles mesures les propriétaires peuvent-ils prendre pour éliminer les punaises de lit?</code> | <code>Les propriétaires peuvent instaurer différentes mesures pour prévenir et éliminer les punaises des lits.</code> | <code>Quelles sont les conditions pour obtenir une assurance automobile?</code> |
+   | <code>Comment les pages web du gouvernement de la Saskatchewan sont-elles traduites en français?</code> | <code>Un certain nombre de pages sur le site web du gouvernement de la Saskatchewan ont été traduites professionnellement en français.</code> | <code>Quelles sont les exigences pour obtenir un permis de conduire?</code> |
+   | <code>How long do plant breeders' rights last in Canada?</code> | <code>Plant breeders receive legal protection for up to 25 years for trees and vines, and 20 years for other plant varieties.</code> | <code>What are the requirements for importing a pet into Canada?</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim",
+       "gather_across_devices": false
+   }
+   ```
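To make the loss concrete: MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as the target class of a softmax over scaled similarities, with every other positive serving as an in-batch negative. A minimal NumPy sketch of that objective (illustrative only; the real implementation lives in `sentence_transformers.losses`):

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Sketch of MultipleNegativesRankingLoss: for anchor i, positive i is
    the target and every other positive in the batch is a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # scaled cosine similarities, shape (batch, batch)
    # Softmax cross-entropy with the diagonal as the target class.
    # (A production implementation would use a numerically stable logsumexp.)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Perfectly aligned pairs on the diagonal give a near-zero loss.
anchors = np.eye(4)
print(mnrl_loss(anchors, anchors))
```

With `scale` at 20.0 and `similarity_fct` as cosine, this matches the parameters listed above.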
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `per_device_train_batch_size`: 1
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
+ - `warmup_ratio`: 0.1
+ - `prompts`: task: sentence similarity | query: 
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 1
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: task: sentence similarity | query: 
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:-----:|:----:|:-------------:|
+ | 1.0   | 1000 | 0.1065        |
+ | 2.0   | 2000 | 0.368         |
+ | 3.0   | 3000 | 0.2343        |
+ | 4.0   | 4000 | 0.1016        |
+ | 5.0   | 5000 | 0.0154        |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.11.13
+ - Sentence Transformers: 5.1.1
+ - Transformers: 4.57.0.dev0
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.8.1
+ - Datasets: 3.6.0
+ - Tokenizers: 0.22.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<image_soft_token>": 262144
+ }
config.json ADDED
@@ -0,0 +1,60 @@
+ {
+   "_sliding_window_pattern": 6,
+   "architectures": [
+     "Gemma3TextModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_logit_softcapping": null,
+   "bos_token_id": 2,
+   "dtype": "float32",
+   "eos_token_id": 1,
+   "final_logit_softcapping": null,
+   "head_dim": 256,
+   "hidden_activation": "gelu_pytorch_tanh",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_types": [
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 2048,
+   "model_type": "gemma3_text",
+   "num_attention_heads": 3,
+   "num_hidden_layers": 24,
+   "num_key_value_heads": 1,
+   "pad_token_id": 0,
+   "query_pre_attn_scalar": 256,
+   "rms_norm_eps": 1e-06,
+   "rope_local_base_freq": 10000.0,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 257,
+   "transformers_version": "4.57.0.dev0",
+   "use_bidirectional_attention": true,
+   "use_cache": true,
+   "vocab_size": 262144
+ }
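The `layer_types` list above follows a regular pattern: with `_sliding_window_pattern` set to 6 and 24 hidden layers, every sixth layer uses full attention and the rest use a 257-token sliding window. A short sketch that reproduces the list (the helper name is illustrative):

```python
def layer_types(num_layers: int, pattern: int = 6) -> list:
    """Every pattern-th layer uses full attention; the rest use
    sliding-window attention, matching the config's layer_types list."""
    return [
        "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
        for i in range(num_layers)
    ]

types = layer_types(24)
print(types.count("full_attention"))  # 4
```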
config_sentence_transformers.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.1",
+     "transformers": "4.57.0.dev0",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {
+     "query": "task: search result | query: ",
+     "document": "title: none | text: ",
+     "BitextMining": "task: search result | query: ",
+     "Clustering": "task: clustering | query: ",
+     "Classification": "task: classification | query: ",
+     "InstructionRetrieval": "task: code retrieval | query: ",
+     "MultilabelClassification": "task: classification | query: ",
+     "PairClassification": "task: sentence similarity | query: ",
+     "Reranking": "task: search result | query: ",
+     "Retrieval": "task: search result | query: ",
+     "Retrieval-query": "task: search result | query: ",
+     "Retrieval-document": "title: none | text: ",
+     "STS": "task: sentence similarity | query: ",
+     "Summarization": "task: summarization | query: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
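Conceptually, each entry in the `prompts` map above is just a string that gets prepended to the raw input text before tokenization; this is how `encode_query` and `encode_document` select task-specific behaviour. A minimal sketch (the helper is hypothetical, prompt strings taken from the config):

```python
# Hypothetical helper mirroring what Sentence Transformers does internally:
# the configured prompt string is prepended to the text before tokenizing.
def apply_prompt(text: str, prompts: dict, prompt_name: str) -> str:
    return prompts[prompt_name] + text

prompts = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
}
print(apply_prompt("How do I renew my passport?", prompts, "query"))
# task: search result | query: How do I renew my passport?
```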
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc0c6ac64d9eaaedd9a802a14e67ef2d54599dd639d456b8b21ea9bfcc5c4814
+ size 1211486072
modules.json ADDED
@@ -0,0 +1,32 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 3,
+     "name": "3",
+     "path": "3_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 4,
+     "name": "4",
+     "path": "4_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 2048,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "boi_token": "<start_of_image>",
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eoi_token": "<end_of_image>",
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<image_soft_token>",
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:216e2a79606fe879c9f17c529c71cd241338407fd5646b595ffd3c4b9ea1d503
+ size 33385262
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+ size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff