ndsanjana committed on
Commit 3dc2670 · verified · 1 parent: c81747e

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "word_embedding_dimension": 768,
+     "pooling_mode_cls_token": false,
+     "pooling_mode_mean_tokens": true,
+     "pooling_mode_max_tokens": false,
+     "pooling_mode_mean_sqrt_len_tokens": false,
+     "pooling_mode_weightedmean_tokens": false,
+     "pooling_mode_lasttoken": false,
+     "include_prompt": true
+ }
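The pooling config above enables only `pooling_mode_mean_tokens`. As a rough illustration of what that mode does (a sketch, not the sentence-transformers internal code), mean pooling averages the token embeddings while masking out padding positions:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padded positions (pooling_mode_mean_tokens)."""
    # (batch, seq, dim) * (batch, seq, 1) zeroes out embeddings at padded positions
    mask = attention_mask.unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # number of real tokens per sentence
    return summed / counts

tokens = torch.randn(2, 5, 768)  # toy batch: 2 sentences, 5 tokens, 768-dim
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # torch.Size([2, 768])
```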
2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "in_features": 768,
+     "out_features": 3072,
+     "bias": false,
+     "activation_function": "torch.nn.modules.linear.Identity"
+ }
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6fe0bff2c4fc9b269f3de9d67244d99fe8166fc74518cd83df4183d8e06a9ed
+ size 9437272
3_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "in_features": 3072,
+     "out_features": 768,
+     "bias": false,
+     "activation_function": "torch.nn.modules.linear.Identity"
+ }
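The two Dense configs describe a bias-free 768 → 3072 → 768 projection with identity activations. A minimal sketch with plain `torch.nn.Linear` stand-ins (random weights, not the checkpoint's); note that each `model.safetensors` here is 9,437,272 bytes, consistent with one 768×3072 float32 matrix (9,437,184 bytes) plus the small safetensors header:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for modules 2_Dense and 3_Dense: bias-free linear
# layers whose "activation" is the identity, i.e. pure matrix multiplies.
dense_up = nn.Linear(768, 3072, bias=False)
dense_down = nn.Linear(3072, 768, bias=False)

x = torch.randn(4, 768)        # pooled sentence embeddings
y = dense_down(dense_up(x))    # round trip through the 3072-dim bottleneck
print(y.shape)                 # torch.Size([4, 768])
```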
3_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d08c869c071243ed84412b3099683304e93ba849245a155ee16debbb2afc9f92
+ size 9437272
README.md ADDED
@@ -0,0 +1,438 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:1000
+ - loss:MultipleNegativesRankingLoss
+ base_model: google/embeddinggemma-300m
+ widget:
+ - source_sentence: 'Theme: Dystopian surveillance and control, Ethical implications
+     of autonomous warfare, Human agency versus machine dominance, Resistance against
+     dehumanization, Unintended consequences of technological advancement, Manipulation
+     and hidden agendas, Redemption and moral choice'
+   sentences:
+   - 'Theme: Discovery of ancient mysteries, Conflict between community values and
+     greed, Sacrifice for the greater good, Renewal and hope through art, The power
+     of collective action'
+   - unknown
+   - 'Theme: AI-driven warfare and its ethical implications, Human agency versus technological
+     determinism, Surveillance and the hunt for dissent, Rebellion against oppressive
+     systems, The moral dilemma of dismantling versus repurposing destructive technology,
+     Hidden sabotage and the foresight of architects, The fragility of global security
+     in a tech‑centric world'
+   - 96_theme_cross
+ - source_sentence: 'Theme: Harmony with nature, Mystical forces and ancient traditions,
+     Hidden threats and the struggle against darkness, Courage and personal growth,
+     Connection to the land, Community resilience and cooperation, Restoration of balance'
+   sentences:
+   - 'Actions: Elara discovers a hidden grove where forest spirits gather. -> She learns
+     that a dark entity, long imprisoned beneath the village, is stirring. -> Guided
+     by the wise elder Thorne and a mysterious amulet, she prepares for a perilous
+     journey. -> Elara embarks on the journey, facing trials that test her courage
+     and resolve. -> During the trials, she discovers the true power of her connection
+     to the land. -> With the help of her fellow villagers and the spirits of the forest,
+     she seals the entity away once more. -> The village’s balance is restored and
+     prosperity is ensured.'
+   - unknown
+   - 'Theme: coexistence with nature, supernatural forces, bravery and determination,
+     destiny and personal growth, community support, renewal and protection of heritage,
+     bond with the land'
+   - 167_theme_vs_action
+ - source_sentence: 'Theme: Immortality versus isolation, Ethical implications of scientific
+     discovery, The cost of eternal youth, Power and exploitation of knowledge, Sacrifice
+     to prevent misuse'
+   sentences:
+   - 'Theme: The paradox of immortality versus the inevitability of death, Isolation
+     that accompanies prolonged life, Ethical dilemmas surrounding the use of natural
+     wonders for profit, The tension between scientific curiosity and personal sacrifice,
+     The cost of preserving nature’s secrets'
+   - unknown
+   - 'Actions: Discover a rare plant in a remote jungle that can halt aging. -> Develop
+     an experimental serum based on the plant. -> Test the serum on herself, successfully
+     stopping her physical aging. -> Live for decades while watching loved ones age
+     and die. -> A ruthless biotech corporation uncovers her secret. -> Engage in a
+     tense confrontation with the corporation. -> Destroy her research and the last
+     sample of the plant to prevent misuse. -> Walk away from the laboratory, resigned
+     to eternal youth and solitude.'
+   - 56_theme_vs_action
+ - source_sentence: 'Outcomes: Sarah and Alex discover that companionship, whether
+     human or artificial, can transcend conventional boundaries, leaving both transformed
+     and redefining connection in an increasingly digital world.'
+   sentences:
+   - unknown
+   - 'Outcomes: Mia and Orion both experience profound personal change. Mia overcomes
+     her fear of solitude and gains a deeper understanding of human connection. Orion
+     attains a form of independence while maintaining its role as a companion. Their
+     relationship demonstrates that companionship, whether human or artificial, can
+     surpass conventional limits.'
+   - 'Outcomes: Ollie learns that true strength lies in collaboration and understanding.
+     He forges an unbreakable bond between the living and the spirits. The united realms
+     leave a lasting legacy for future generations.'
+   - 67_outcome_cross
+ - source_sentence: 'Theme: Ethics of de‑extinction and scientific responsibility,
+     Human ambition versus natural limits, Emergence of higher intelligence in extinct
+     species, Corporate militarization of biological research, Coexistence and harmony
+     between ancient and modern life forms'
+   sentences:
+   - unknown
+   - 'Actions: Dr. Sarah Chen extracts viable DNA from a Triceratops fossil. -> She
+     creates the first living dinosaur in 65 million years, nicknamed Trinity. -> The
+     creature is publicly revealed, sparking global debate on de‑extinction ethics.
+     -> Trinity exhibits unexpected higher intelligence. -> Biotech magnate Marcus
+     Voss attempts to weaponize the research for military use. -> A confrontation occurs
+     at the research facility. -> Trinity escapes into the nearby wilderness and encounters
+     modern wildlife. -> Dr. Chen decides to destroy her research data to prevent further
+     exploitation. -> Trinity disappears into a remote forest preserve. -> Final scene
+     shows Trinity peacefully coexisting with a herd of elk.'
+   - 85_theme_vs_action
+   - 'Theme: The ethical limits of scientific ambition, The moral implications of resurrecting
+     extinct species, The clash between corporate exploitation and scientific integrity,
+     The unexpected cognitive complexity of prehistoric life, The possibility of coexistence
+     between past and present ecosystems'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on google/embeddinggemma-300m
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision 57c266a740f537b4dc058e1b0cda161fd15afa75 -->
+ - **Maximum Sequence Length:** 2048 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+   (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+   (4): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("ndsanjana/embedgemma_ns")
+ # Run inference
+ queries = [
+     "Theme: Ethics of de\u2011extinction and scientific responsibility, Human ambition versus natural limits, Emergence of higher intelligence in extinct species, Corporate militarization of biological research, Coexistence and harmony between ancient and modern life forms",
+ ]
+ documents = [
+     'Theme: The ethical limits of scientific ambition, The moral implications of resurrecting extinct species, The clash between corporate exploitation and scientific integrity, The unexpected cognitive complexity of prehistoric life, The possibility of coexistence between past and present ecosystems',
+     'Actions: Dr. Sarah Chen extracts viable DNA from a Triceratops fossil. -> She creates the first living dinosaur in 65 million years, nicknamed Trinity. -> The creature is publicly revealed, sparking global debate on de‑extinction ethics. -> Trinity exhibits unexpected higher intelligence. -> Biotech magnate Marcus Voss attempts to weaponize the research for military use. -> A confrontation occurs at the research facility. -> Trinity escapes into the nearby wilderness and encounters modern wildlife. -> Dr. Chen decides to destroy her research data to prevent further exploitation. -> Trinity disappears into a remote forest preserve. -> Final scene shows Trinity peacefully coexisting with a herd of elk.',
+     '85_theme_vs_action',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 768] [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[ 0.7758, 0.1831, -0.0576]])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 1,000 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, <code>negative</code>, <code>triplet_id</code>, and <code>source</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative | triplet_id | source |
+   |:--------|:-------|:---------|:---------|:-----------|:-------|
+   | type    | string | string | string | string | string |
+   | details | <ul><li>min: 18 tokens</li><li>mean: 80.7 tokens</li><li>max: 204 tokens</li></ul> | <ul><li>min: 19 tokens</li><li>mean: 81.97 tokens</li><li>max: 201 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 83.77 tokens</li><li>max: 230 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 9.25 tokens</li><li>max: 11 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 3.0 tokens</li><li>max: 3 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative | triplet_id | source |
+   |:-------|:---------|:---------|:-----------|:-------|
+   | <code>Theme: inheritance, haunted house, supernatural, grief and loss, revenge, family dynamics, possession, exorcism, unresolved trauma, moral choice</code> | <code>Theme: Inheritance of legacy and the weight of family history, Supernatural haunting as a manifestation of unresolved trauma, The conflict between self-preservation and compassion, The cyclical nature of guilt and the desire for redemption, The tension between rational action and inexplicable forces</code> | <code>Theme: grief and avoidance, emotional healing, isolation and its psychological effects, responsibility toward family, the interplay between scientific curiosity and personal emotion, the reflective power of nature, guilt and unresolved conflict</code> | <code>0_theme_cross</code> | <code>unknown</code> |
+   | <code>Actions: Family moves into inherited Victorian mansion -> Strange occurrences begin immediately -> Teenage daughter becomes primary target of supernatural activity -> Family researches property’s past and learns about reclusive widow and lost daughter -> Paranormal events intensify, threatening family safety -> Father attempts exorcism using items from hidden basement -> Exorcism angers the entity further -> Mother faces a critical choice: flee or help the spirit find peace by reuniting her with her daughter's remains -> Mother chooses to help the spirit</code> | <code>Actions: A newlywed couple inherits a sprawling ranch house in the desert from an estranged uncle. -> From the first night, bizarre phenomena (whispers, self-opening doors, sudden cold rooms) plague the household. -> The wife becomes the focal point of the disturbances, experiencing terrifying visions and speaking in unfamiliar voices. -> The couple investigates the property's history and learns that the former owner, an elderly hermit, died under suspicious circumstances after his young son accidentally died on the grounds. -> They discover that the hermit's ghost is desperately seeking someone to take his boy's place. -> The husband attempts to banish the spirit using ritual objects found in a concealed cellar. -> The ritual backfires, provoking the entity to greater violence and intensifying the supernatural assault. -> During the final confrontation, the wife faces an impossible decision: escape with her husband to safety or help the anguished ghost locate his son's hidden grave to...</code> | <code>Actions: Marine biologist accepts a research position at an isolated underwater station studying deep‑sea thermal vents. -> She leaves behind her estranged teenage son, who blames her for his father's recent death. -> During her six‑month assignment she discovers unusual bioluminescent organisms that respond to human emotions and memories. -> She spends more time observing the creatures, which triggers vivid recollections of her late husband and the unresolved guilt surrounding their final argument before his fatal accident. -> The organisms feed on her emotional energy, growing brighter and more active as her psychological state deteriorates. -> Her research partner becomes concerned about her erratic behavior and threatens to abort the mission. -> She realizes that her obsession with the creatures is a way of avoiding her grief and responsibility to her son. -> In the final act, she chooses to surface early and return home, accepting that healing requires facing her loss rather than ...</code> | <code>0_action_cross</code> | <code>unknown</code> |
+   | <code>Outcomes: The mother’s decision to reunite the widow’s daughter’s remains brings peace to the spirit, ending the haunting. The family remains safe and can continue living in the house.</code> | <code>Outcomes: The story concludes with the wife's decision, leaving the haunting either unresolved if they escape or potentially resolved if they help the ghost find the grave. The final state is ambiguous, reflecting the unresolved tension between survival and compassion.</code> | <code>Outcomes: She returns home, confronts her grief and responsibility toward her son, and begins the process of healing.</code> | <code>0_outcome_cross</code> | <code>unknown</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim",
+       "gather_across_devices": false
+   }
+   ```
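For intuition, `MultipleNegativesRankingLoss` with `cos_sim` and `scale: 20.0` treats every other in-batch positive as a negative and applies cross-entropy over the scaled similarity matrix. A simplified sketch of that objective (toy shapes, not the sentence-transformers implementation):

```python
import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """In-batch-negatives ranking loss over scaled cosine similarities."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    scores = a @ p.T * scale               # (batch, batch) similarity matrix
    labels = torch.arange(scores.size(0))  # matching positive sits on the diagonal
    return F.cross_entropy(scores, labels)

torch.manual_seed(0)
emb = torch.randn(8, 32)           # toy embeddings; the real model uses 768 dims
loss = mnr_loss(emb, emb.clone())  # identical anchor/positive pairs -> near-zero loss
print(round(loss.item(), 4))
```

The large scale factor sharpens the softmax so that the correct pairing dominates once its cosine similarity clearly exceeds the in-batch negatives'.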
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 10
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `prompts`: task: sentence similarity | query:
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: no
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: True
+ - `prompts`: task: sentence similarity | query:
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:-----:|:----:|:-------------:|
+ | 0.8   | 100  | 0.0664        |
+ | 1.6   | 200  | 0.017         |
+ | 2.4   | 300  | 0.018         |
+ | 3.2   | 400  | 0.005         |
+ | 4.0   | 500  | 0.026         |
+ | 4.8   | 600  | 0.0119        |
+ | 5.6   | 700  | 0.0083        |
+ | 6.4   | 800  | 0.0198        |
+ | 7.2   | 900  | 0.0217        |
+ | 8.0   | 1000 | 0.0123        |
+ | 8.8   | 1100 | 0.0174        |
+ | 9.6   | 1200 | 0.0112        |
+
+ ### Framework Versions
+ - Python: 3.11.14
+ - Sentence Transformers: 5.1.2
+ - Transformers: 4.57.1
+ - PyTorch: 2.9.1+cu128
+ - Accelerate: 1.12.0
+ - Datasets: 4.4.1
+ - Tokenizers: 0.22.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,60 @@
+ {
+     "_sliding_window_pattern": 6,
+     "architectures": [
+         "Gemma3TextModel"
+     ],
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "attn_logit_softcapping": null,
+     "bos_token_id": 2,
+     "dtype": "float32",
+     "eos_token_id": 1,
+     "final_logit_softcapping": null,
+     "head_dim": 256,
+     "hidden_activation": "gelu_pytorch_tanh",
+     "hidden_size": 768,
+     "initializer_range": 0.02,
+     "intermediate_size": 1152,
+     "layer_types": [
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "full_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "full_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "full_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "sliding_attention",
+         "full_attention"
+     ],
+     "max_position_embeddings": 2048,
+     "model_type": "gemma3_text",
+     "num_attention_heads": 3,
+     "num_hidden_layers": 24,
+     "num_key_value_heads": 1,
+     "pad_token_id": 0,
+     "query_pre_attn_scalar": 256,
+     "rms_norm_eps": 1e-06,
+     "rope_local_base_freq": 10000.0,
+     "rope_scaling": null,
+     "rope_theta": 1000000.0,
+     "sliding_window": 257,
+     "transformers_version": "4.57.1",
+     "use_bidirectional_attention": true,
+     "use_cache": true,
+     "vocab_size": 262144
+ }
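A quick sanity check on the config above: with `vocab_size` 262144 and `hidden_size` 768, the token-embedding table alone accounts for roughly two-thirds of the ~300M parameters implied by the 1,211,486,072-byte checkpoint (safetensors bytes divided by 4 for float32, ignoring the small header):

```python
# Back-of-the-envelope parameter accounting for this Gemma3TextModel config.
vocab_size, hidden_size = 262_144, 768
embed_params = vocab_size * hidden_size   # token embedding table
approx_total = 1_211_486_072 // 4         # model.safetensors bytes / 4 (float32)

print(embed_params)                       # 201326592
print(approx_total)                       # 302871518
print(round(embed_params / approx_total, 2))  # 0.66
```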
config_sentence_transformers.json ADDED
@@ -0,0 +1,26 @@
+ {
+     "model_type": "SentenceTransformer",
+     "__version__": {
+         "sentence_transformers": "5.1.2",
+         "transformers": "4.57.1",
+         "pytorch": "2.9.1+cu128"
+     },
+     "prompts": {
+         "query": "task: search result | query: ",
+         "document": "title: none | text: ",
+         "BitextMining": "task: search result | query: ",
+         "Clustering": "task: clustering | query: ",
+         "Classification": "task: classification | query: ",
+         "InstructionRetrieval": "task: code retrieval | query: ",
+         "MultilabelClassification": "task: classification | query: ",
+         "PairClassification": "task: sentence similarity | query: ",
+         "Reranking": "task: search result | query: ",
+         "Retrieval": "task: search result | query: ",
+         "Retrieval-query": "task: search result | query: ",
+         "Retrieval-document": "title: none | text: ",
+         "STS": "task: sentence similarity | query: ",
+         "Summarization": "task: summarization | query: "
+     },
+     "default_prompt_name": null,
+     "similarity_fn_name": "cosine"
+ }
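The `prompts` map above is what the model prepends to input text before tokenization: query text gets the `"query"` prefix and document text gets the `"document"` prefix. A string-level sketch of that behavior, outside the library (the real model applies these automatically via `encode_query` / `encode_document`):

```python
# Prompt prefixes copied from config_sentence_transformers.json.
PROMPTS = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Prepend the named prompt prefix, as the model does before tokenizing."""
    return PROMPTS[prompt_name] + text

print(apply_prompt("dinosaur ethics", "query"))
# task: search result | query: dinosaur ethics
```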
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:308fc578fd3f7c7dbd37f11a00ab9266fb2fd611c2de03d900941985d2129c1d
+ size 1211486072
modules.json ADDED
@@ -0,0 +1,32 @@
+ [
+     {
+         "idx": 0,
+         "name": "0",
+         "path": "",
+         "type": "sentence_transformers.models.Transformer"
+     },
+     {
+         "idx": 1,
+         "name": "1",
+         "path": "1_Pooling",
+         "type": "sentence_transformers.models.Pooling"
+     },
+     {
+         "idx": 2,
+         "name": "2",
+         "path": "2_Dense",
+         "type": "sentence_transformers.models.Dense"
+     },
+     {
+         "idx": 3,
+         "name": "3",
+         "path": "3_Dense",
+         "type": "sentence_transformers.models.Dense"
+     },
+     {
+         "idx": 4,
+         "name": "4",
+         "path": "4_Normalize",
+         "type": "sentence_transformers.models.Normalize"
+     }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+     "max_seq_length": 2048,
+     "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
+ {
+     "boi_token": "<start_of_image>",
+     "bos_token": {
+         "content": "<bos>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "eoi_token": "<end_of_image>",
+     "eos_token": {
+         "content": "<eos>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "image_token": "<image_soft_token>",
+     "pad_token": {
+         "content": "<pad>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "unk_token": {
+         "content": "<unk>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:216e2a79606fe879c9f17c529c71cd241338407fd5646b595ffd3c4b9ea1d503
+ size 33385262
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff