pavanmantha committed · commit c2c473a (verified) · parent: bc873db

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
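The pooling config above enables only mean-token pooling (`pooling_mode_mean_tokens: true`): token embeddings are averaged over non-padding positions. A minimal NumPy sketch of that computation, using made-up token embeddings and an attention mask:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions (pooling_mode_mean_tokens)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against all-padding rows
    return summed / counts

# Toy batch: 1 sequence of 3 tokens (the last is padding), embedding dim 4
tokens = np.array([[[1.0, 2.0, 3.0, 4.0],
                    [3.0, 2.0, 1.0, 0.0],
                    [9.0, 9.0, 9.0, 9.0]]])  # padding row is ignored by the mask
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 2. 2. 2.]]
```

The padded position contributes nothing: only the two real tokens are averaged.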
2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+{
+  "in_features": 768,
+  "out_features": 3072,
+  "bias": false,
+  "activation_function": "torch.nn.modules.linear.Identity"
+}
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9370ed9c21e3c082b44da5120dcd580c8fe52e1328af48af0a90147854a7878a
+size 9437272
3_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+{
+  "in_features": 3072,
+  "out_features": 768,
+  "bias": false,
+  "activation_function": "torch.nn.modules.linear.Identity"
+}
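Together, `2_Dense` and `3_Dense` form a bias-free linear 768 → 3072 → 768 bottleneck; with the identity activation each module is a plain matrix multiply. A shape-level NumPy sketch, where random matrices stand in for the trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W_up = rng.standard_normal((768, 3072)) * 0.02    # 2_Dense: in_features 768, out_features 3072, bias=False
W_down = rng.standard_normal((3072, 768)) * 0.02  # 3_Dense: in_features 3072, out_features 768, bias=False

pooled = rng.standard_normal((2, 768))  # pooled sentence embeddings for a batch of 2

# Identity activation: the two modules reduce to consecutive matrix multiplies
out = (pooled @ W_up) @ W_down
print(out.shape)  # (2, 768)
```

The final embedding dimensionality is back to 768, matching the model card's stated output dimensionality.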
3_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:730f6f645e31c2cf8e6983169a508efe5c5c68ddb71a75be9e66d2ccb74d91e2
+size 9437272
README.md ADDED
@@ -0,0 +1,423 @@
+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:33200
+- loss:MultipleNegativesRankingLoss
+base_model: google/embeddinggemma-300m
+widget:
+- source_sentence: Are stroke patients' reports of home blood pressure readings reliable?
+  sentences:
+  - The first Whitehall study.
+  - A total of 1027 monitor and 716 booklet readings were recorded. Ninety per cent
+    of booklet recordings were exactly the same as the BP monitor readings. Average
+    booklet readings were 0.6 mmHg systolic [95% confidence interval (95% CI) -0.6
+    to 1.8] and 0.3 mmHg diastolic (95% CI -0.3 to 0.8) lower than those on the monitor.
+  - 'Protocol 1: a) office blood pressure measurement and Home1 were significantly
+    higher than ambulatory blood pressure monitoring, except for systolic and diastolic
+    office blood pressure measurement taken by the patient or a family member, systolic
+    blood pressure taken by a nurse, and diastolic blood pressure taken by a physician.
+    b) ambulatory blood pressure monitoring and HBPM1 were similar. Protocol 2: a)
+    HBPM2 and Home2 were similar. b) Home2 was significantly lower than Home1, except
+    for diastolic blood pressure taken by a nurse or the patient. There were significant
+    relationships between: a) diastolic blood pressure measured by the patient and
+    the thickness of the interventricular septum, posterior wall, and left ventricular
+    mass; and b) ambulatory and HBPM2 diastolic and systolic blood pressure taken
+    by a physician (home2) and left ventricular mass. Therefore, the data indicate
+    that home blood pressure measurement and ambulatory blood pressure monitoring
+    had good prognostic values relative to "office measurement."'
+- source_sentence: Do socioeconomic differences in mortality persist after retirement?
+  sentences:
+  - to compare the mortality rates of elderly demented and nondemented subjects and
+    the differential association of midlife risk factors with mortality according
+    to dementia status.
+  - Death.
+  - To investigate polysomnographic and anthropomorphic factors predicting need of
+    high optimal continuous positive airway pressure (CPAP).
+- source_sentence: Does a history of unintended pregnancy lessen the likelihood of
+    desire for sterilization reversal?
+  sentences:
+  - Evolutionary life history theory predicts that, in the absence of contraception,
+    any enhancement of maternal condition can increase human fertility. Energetic
+    trade-offs are likely to be resolved in favour of maximizing reproductive success
+    rather than health or longevity. Here we find support for the hypothesis that
+    development initiatives designed to improve maternal and child welfare may also
+    incur costs associated with increased family sizes if they do not include a family
+    planning component.
+  - This study used national, cross-sectional data collected by the 2006-2010 National
+    Survey of Family Growth. The study sample included women ages 15-44 who were surgically
+    sterile from a tubal sterilization at the time of interview. Multivariable logistic
+    regression was used to examine the relationship between a history of unintended
+    pregnancy and desire for sterilization reversal while controlling for potential
+    confounders.
+  - Anti-HTLV-I antibodies were positive in both the serum and the CSF in all of the
+    patients. Biopsied sample from spinal cord lesions showed inflammatory changes
+    in Patient 1. Patient 2 had a demyelinating type of sensorimotor polyneuropathy.
+    Two of the three patients examined showed high risk of developing HAM/TSP in virologic
+    and immunologic aspects.
+- source_sentence: Are behavioural risk factors to be blamed for the conversion from
+    optimal blood pressure to hypertensive status in Black South Africans?
+  sentences:
+  - Longitudinal cohort studies in sub-Saharan Africa are urgently needed to understand
+    cardiovascular disease development. We, therefore, explored health behaviours
+    and conventional risk factors of African individuals with optimal blood pressure
+    (BP) (≤ 120/80 mm Hg), and their 5-year prediction for the development of hypertension.
+  - The primary aim was to assess long-term blood pressure in 110 patients with Type
+    2 diabetes who had achieved optimal blood pressure control during attendance at
+    a protocol-based nurse-led hypertension intensive intervention clinic 7 years
+    previously. The secondary aim was to assess modifiable cardiovascular risk factor
+    status.
+  - The Prospective Urban Rural Epidemiology study in the North West Province, South
+    Africa, started in 2005 and included African volunteers (n = 1994; aged>30 years)
+    from a sample of 6000 randomly selected households in rural and urban areas.
+- source_sentence: Can you deliver accurate tidal volume by manual resuscitator?
+  sentences:
+  - One of the problems with manual resuscitators is the difficulty in achieving accurate
+    volume delivery. The volume delivered to the patient varies by the physical characteristics
+    of the person and method. This study was designed to compare tidal volumes delivered
+    by the squeezing method, physical characteristics and education and practice levels.
+  - Sections from paraffin-embedded blocks of surgically resected specimens of GBC
+    (69 cases), XGC (65), chronic cholecystitis (18) and control gallbladder (10)
+    were stained with the monoclonal antibodies to p53 and PCNA, and a polyclonal
+    antibody to beta-catenin. p53 expression was scored as the percentage of nuclei
+    stained. PCNA expression was scored as the product of the percentage of nuclei
+    stained and the intensity of the staining (1-3). A cut-off value of 80 for this
+    score was taken as a positive result. Beta-catenin expression was scored as type
+    of expression-membranous, cytoplasmic or nuclear staining.
+  - Although current resuscitation guidelines are rescuer focused, the opportunity
+    exists to develop patient-centered resuscitation strategies that optimize the
+    hemodynamic response of the individual in the hopes to improve survival.
+datasets:
+- pavanmantha/pubmed-30k
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+
+# SentenceTransformer based on google/embeddinggemma-300m
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) on the [pubmed-30k](https://huggingface.co/datasets/pavanmantha/pubmed-30k) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision 57c266a740f537b4dc058e1b0cda161fd15afa75 -->
+- **Maximum Sequence Length:** 2048 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Training Dataset:**
+    - [pubmed-30k](https://huggingface.co/datasets/pavanmantha/pubmed-30k)
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+  (4): Normalize()
+)
+```
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("pavanmantha/embeddinggemma-pubmed")
+# Run inference
+queries = [
+    "Can you deliver accurate tidal volume by manual resuscitator?",
+]
+documents = [
+    'One of the problems with manual resuscitators is the difficulty in achieving accurate volume delivery. The volume delivered to the patient varies by the physical characteristics of the person and method. This study was designed to compare tidal volumes delivered by the squeezing method, physical characteristics and education and practice levels.',
+    'Although current resuscitation guidelines are rescuer focused, the opportunity exists to develop patient-centered resuscitation strategies that optimize the hemodynamic response of the individual in the hopes to improve survival.',
+    'Sections from paraffin-embedded blocks of surgically resected specimens of GBC (69 cases), XGC (65), chronic cholecystitis (18) and control gallbladder (10) were stained with the monoclonal antibodies to p53 and PCNA, and a polyclonal antibody to beta-catenin. p53 expression was scored as the percentage of nuclei stained. PCNA expression was scored as the product of the percentage of nuclei stained and the intensity of the staining (1-3). A cut-off value of 80 for this score was taken as a positive result. Beta-catenin expression was scored as type of expression-membranous, cytoplasmic or nuclear staining.',
+]
+query_embeddings = model.encode_query(queries)
+document_embeddings = model.encode_document(documents)
+print(query_embeddings.shape, document_embeddings.shape)
+# [1, 768] [3, 768]
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(query_embeddings, document_embeddings)
+print(similarities)
+# tensor([[ 0.9156,  0.2237, -0.1894]])
+```
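Because the pipeline ends in a Normalize module and `similarity_fn_name` is cosine, `model.similarity` amounts to pairwise cosine similarity, i.e. a dot product between unit-length rows. A standalone NumPy sketch with made-up embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # unit-length rows: dot product == cosine

query = np.array([[1.0, 0.0, 0.0]])
docs = np.array([[2.0, 0.0, 0.0],   # same direction as the query -> similarity 1.0
                 [0.0, 3.0, 0.0]])  # orthogonal to the query     -> similarity 0.0
print(cosine_similarity(query, docs))  # [[1. 0.]]
```

Note the scale of the vectors does not matter, only their direction, which is why the model can normalize embeddings up front.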
+
+<!--
+### Direct Usage (Transformers)
+
+<details><summary>Click to see the direct usage in Transformers</summary>
+
+</details>
+-->
+
+<!--
+### Downstream Usage (Sentence Transformers)
+
+You can finetune this model on your own dataset.
+
+<details><summary>Click to expand</summary>
+
+</details>
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+<!--
+## Bias, Risks and Limitations
+
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+
+<!--
+### Recommendations
+
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+
+## Training Details
+
+### Training Dataset
+
+#### pubmed-30k
+
+* Dataset: [pubmed-30k](https://huggingface.co/datasets/pavanmantha/pubmed-30k) at [6a7c15c](https://huggingface.co/datasets/pavanmantha/pubmed-30k/tree/6a7c15c83164ef44a767f4da72b5e71bd920104f)
+* Size: 33,200 training samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | anchor | positive | negative |
+  |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+  | type    | string | string | string |
+  | details | <ul><li>min: 11 tokens</li><li>mean: 17.74 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 85.76 tokens</li><li>max: 301 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 82.14 tokens</li><li>max: 409 tokens</li></ul> |
+* Samples:
+  | anchor | positive | negative |
+  |:----------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>Does a history of unintended pregnancy lessen the likelihood of desire for sterilization reversal?</code> | <code>Unintended pregnancy has been significantly associated with subsequent female sterilization. Whether women who are sterilized after experiencing an unintended pregnancy are less likely to express desire for sterilization reversal is unknown.</code> | <code>Changes in serum hormone levels induced by combined contraceptives.</code> |
+  | <code>Does a history of unintended pregnancy lessen the likelihood of desire for sterilization reversal?</code> | <code>Unintended pregnancy has been significantly associated with subsequent female sterilization. Whether women who are sterilized after experiencing an unintended pregnancy are less likely to express desire for sterilization reversal is unknown.</code> | <code>Evolutionary life history theory predicts that, in the absence of contraception, any enhancement of maternal condition can increase human fertility. Energetic trade-offs are likely to be resolved in favour of maximizing reproductive success rather than health or longevity. Here we find support for the hypothesis that development initiatives designed to improve maternal and child welfare may also incur costs associated with increased family sizes if they do not include a family planning component.</code> |
+  | <code>Does a history of unintended pregnancy lessen the likelihood of desire for sterilization reversal?</code> | <code>Unintended pregnancy has been significantly associated with subsequent female sterilization. Whether women who are sterilized after experiencing an unintended pregnancy are less likely to express desire for sterilization reversal is unknown.</code> | <code>Out of 663 cycles resulting in oocyte retrieval, 299 produced a clinical pregnancy (45.1%). Women who achieved a clinical pregnancy had a significantly shorter stimulation length (11.9 vs. 12.1 days, p = 0.047). Polycystic ovary syndrome (PCOS) was the only etiology of infertility that was significantly associated with a higher chance for clinical pregnancy and was a significant confounder for the association of duration and success of treatment. Women with 13 days or longer of stimulation had a 34 % lower chance of clinical pregnancy as compared to those who had a shorter cycle (OR 0.66, 95% CI:0.46-0.95) after adjustment for age, ovarian reserve, number of oocytes retrieved, embryos transferred and PCOS diagnosis.</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim",
+      "gather_across_devices": false
+  }
+  ```
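MultipleNegativesRankingLoss treats every other positive in the batch as an in-batch negative: scaled cosine similarities between anchors and positives form a logit matrix, and a cross-entropy loss pushes each anchor toward its own positive on the diagonal. A NumPy sketch of that computation under the parameters above (scale 20, cosine similarity), with toy two-dimensional embeddings:

```python
import numpy as np

def mnrl(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch); diagonal = true pairs
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # target class i for anchor i

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9]])
# Each anchor already ranks its own positive first, so the loss is near zero
print(mnrl(anchors, positives))
```

Swapping the rows of `positives` mismatches every pair and the loss becomes large, which is exactly the gradient signal the training run exploits.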
+
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `per_device_train_batch_size`: 16
+- `learning_rate`: 2e-05
+- `warmup_steps`: 0.1
+- `gradient_accumulation_steps`: 4
+- `fp16`: True
+- `warmup_ratio`: 0.1
+- `prompts`: task: sentence similarity | query:
+
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+
+- `per_device_train_batch_size`: 16
+- `num_train_epochs`: 3
+- `max_steps`: -1
+- `learning_rate`: 2e-05
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: None
+- `warmup_steps`: 0.1
+- `optim`: adamw_torch_fused
+- `optim_args`: None
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `optim_target_modules`: None
+- `gradient_accumulation_steps`: 4
+- `average_tokens_across_devices`: True
+- `max_grad_norm`: 1.0
+- `label_smoothing_factor`: 0.0
+- `bf16`: False
+- `fp16`: True
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `use_cache`: False
+- `neftune_noise_alpha`: None
+- `torch_empty_cache_steps`: None
+- `auto_find_batch_size`: False
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `include_num_input_tokens_seen`: no
+- `log_level`: passive
+- `log_level_replica`: warning
+- `disable_tqdm`: False
+- `project`: huggingface
+- `trackio_space_id`: trackio
+- `eval_strategy`: no
+- `per_device_eval_batch_size`: 8
+- `prediction_loss_only`: True
+- `eval_on_start`: False
+- `eval_do_concat_batches`: True
+- `eval_use_gather_object`: False
+- `eval_accumulation_steps`: None
+- `include_for_metrics`: []
+- `batch_eval_metrics`: False
+- `save_only_model`: False
+- `save_on_each_node`: False
+- `enable_jit_checkpoint`: False
+- `push_to_hub`: False
+- `hub_private_repo`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_always_push`: False
+- `hub_revision`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `restore_callback_states_from_checkpoint`: False
+- `full_determinism`: False
+- `seed`: 42
+- `data_seed`: None
+- `use_cpu`: False
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `parallelism_config`: None
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `dataloader_prefetch_factor`: None
+- `remove_unused_columns`: True
+- `label_names`: None
+- `train_sampling_strategy`: random
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `ddp_backend`: None
+- `ddp_timeout`: 1800
+- `fsdp`: []
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `deepspeed`: None
+- `debug`: []
+- `skip_memory_metrics`: True
+- `do_predict`: False
+- `resume_from_checkpoint`: None
+- `warmup_ratio`: 0.1
+- `local_rank`: -1
+- `prompts`: task: sentence similarity | query:
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+
+</details>
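The step counts in the training logs can be recovered from these hyperparameters: a per-device batch of 16 with 4 gradient-accumulation steps gives an effective batch of 64 over the 33,200 samples. A small arithmetic sketch (the warmup-step estimate assumes `warmup_ratio` is applied to the total optimizer steps):

```python
import math

dataset_size = 33_200
per_device_batch = 16
grad_accum = 4
epochs = 3
warmup_ratio = 0.1

effective_batch = per_device_batch * grad_accum              # 64
steps_per_epoch = math.ceil(dataset_size / effective_batch)  # 519 optimizer steps per epoch
total_steps = steps_per_epoch * epochs                       # 1557
warmup_steps = int(warmup_ratio * total_steps)               # 155

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 64 519 1557 155
# Sanity check against the logs: step 100 should land near epoch 100/519
print(round(100 / steps_per_epoch, 4))  # 0.1927, matching the logged epoch 0.1928
```

This also explains why logging stops around step 1500 just before epoch 2.89: the run finishes at roughly 1557 optimizer steps.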
+
+### Training Logs
+| Epoch  | Step | Training Loss |
+|:------:|:----:|:-------------:|
+| 0.1928 | 100  | 0.2086        |
+| 0.3855 | 200  | 0.0872        |
+| 0.5783 | 300  | 0.0623        |
+| 0.7711 | 400  | 0.0569        |
+| 0.9639 | 500  | 0.0487        |
+| 1.1561 | 600  | 0.0423        |
+| 1.3489 | 700  | 0.0412        |
+| 1.5417 | 800  | 0.0407        |
+| 1.7345 | 900  | 0.0341        |
+| 1.9272 | 1000 | 0.0384        |
+| 2.1195 | 1100 | 0.0316        |
+| 2.3123 | 1200 | 0.0290        |
+| 2.5051 | 1300 | 0.0314        |
+| 2.6978 | 1400 | 0.0303        |
+| 2.8906 | 1500 | 0.0245        |
+
+
+### Framework Versions
+- Python: 3.11.11
+- Sentence Transformers: 5.2.3
+- Transformers: 5.2.0
+- PyTorch: 2.8.0.dev20250319+cu128
+- Accelerate: 1.12.0
+- Datasets: 4.5.0
+- Tokenizers: 0.22.2
+
+## Citation
+
+### BibTeX
+
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+
+<!--
+## Glossary
+
+*Clearly define terms in order to be accessible across audiences.*
+-->
+
+<!--
+## Model Card Authors
+
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+
+<!--
+## Model Card Contact
+
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
config.json ADDED
@@ -0,0 +1,68 @@
+{
+  "_sliding_window_pattern": 6,
+  "architectures": [
+    "Gemma3TextModel"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "attn_logit_softcapping": null,
+  "bos_token_id": 2,
+  "dtype": "float32",
+  "eos_token_id": 1,
+  "final_logit_softcapping": null,
+  "head_dim": 256,
+  "hidden_activation": "gelu_pytorch_tanh",
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 1152,
+  "layer_types": [
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 2048,
+  "model_type": "gemma3_text",
+  "num_attention_heads": 3,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 1,
+  "pad_token_id": 0,
+  "query_pre_attn_scalar": 256,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "full_attention": {
+      "rope_theta": 1000000.0,
+      "rope_type": "default"
+    },
+    "sliding_attention": {
+      "rope_theta": 10000.0,
+      "rope_type": "default"
+    }
+  },
+  "sliding_window": 257,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.2.0",
+  "use_bidirectional_attention": true,
+  "use_cache": true,
+  "vocab_size": 262144
+}
config_sentence_transformers.json ADDED
@@ -0,0 +1,26 @@
+{
+  "model_type": "SentenceTransformer",
+  "__version__": {
+    "sentence_transformers": "5.2.3",
+    "transformers": "5.2.0",
+    "pytorch": "2.8.0.dev20250319+cu128"
+  },
+  "prompts": {
+    "query": "task: search result | query: ",
+    "document": "title: none | text: ",
+    "BitextMining": "task: search result | query: ",
+    "Clustering": "task: clustering | query: ",
+    "Classification": "task: classification | query: ",
+    "InstructionRetrieval": "task: code retrieval | query: ",
+    "MultilabelClassification": "task: classification | query: ",
+    "PairClassification": "task: sentence similarity | query: ",
+    "Reranking": "task: search result | query: ",
+    "Retrieval": "task: search result | query: ",
+    "Retrieval-query": "task: search result | query: ",
+    "Retrieval-document": "title: none | text: ",
+    "STS": "task: sentence similarity | query: ",
+    "Summarization": "task: summarization | query: "
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
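The practical effect of this prompts table is that `encode_query` and `encode_document` prepend different instruction strings to the raw text before tokenization. A plain-string sketch of that behavior (the prompt values are copied from the config above; the helper name is illustrative, not a library API):

```python
# Prompt strings taken from config_sentence_transformers.json
PROMPTS = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Prepend the configured instruction prompt to the input text."""
    return PROMPTS[prompt_name] + text

print(apply_prompt("Can you deliver accurate tidal volume by manual resuscitator?", "query"))
# task: search result | query: Can you deliver accurate tidal volume by manual resuscitator?
```

Since `include_prompt` is true in the pooling config, the prompt tokens also participate in mean pooling rather than being masked out.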
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:961a9f9f067dd4e40ee90579f9bd7543b62cf8df7ea8e6ce162ed4db852ff8dc
+size 1211486072
modules.json ADDED
@@ -0,0 +1,32 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Dense",
+    "type": "sentence_transformers.models.Dense"
+  },
+  {
+    "idx": 3,
+    "name": "3",
+    "path": "3_Dense",
+    "type": "sentence_transformers.models.Dense"
+  },
+  {
+    "idx": 4,
+    "name": "4",
+    "path": "4_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 2048,
+  "do_lower_case": false
+}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:37a36b975fbb51fe36f93e6d156cc4eefbce6d4209aee46c4575cbe9a6a1542e
+size 33385137
tokenizer_config.json ADDED
@@ -0,0 +1,24 @@
+{
+  "backend": "tokenizers",
+  "boi_token": "<start_of_image>",
+  "bos_token": "<bos>",
+  "clean_up_tokenization_spaces": false,
+  "eoi_token": "<end_of_image>",
+  "eos_token": "<eos>",
+  "image_token": "<image_soft_token>",
+  "is_local": false,
+  "mask_token": "<mask>",
+  "model_max_length": 2048,
+  "model_specific_special_tokens": {
+    "boi_token": "<start_of_image>",
+    "eoi_token": "<end_of_image>",
+    "image_token": "<image_soft_token>"
+  },
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": null,
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "GemmaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}