blachang28 committed
Commit 99fe6cb · verified · 1 Parent(s): 65a1f90

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
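The pooling configuration above enables mean pooling only: the sentence embedding is the average of the token embeddings, with padding positions excluded via the attention mask. A minimal pure-Python sketch of masked mean pooling, using toy 4-dimensional token vectors in place of the model's real 768-dimensional ones:

```python
# Masked mean pooling: average the token embeddings, skipping padding tokens.
# Toy 4-dim vectors stand in for the model's 768-dim hidden states.

def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # only real tokens contribute
            for i in range(dim):
                total[i] += vec[i]
            count += 1
    return [t / count for t in total]

tokens = [[1.0, 2.0, 3.0, 4.0],
          [3.0, 2.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]  # padding position
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 2.0, 2.0, 2.0]
```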
2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 768,
+   "out_features": 3072,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91b0c2e8b2bd4fc6fb110f6947d967d2354bb5ed838b23fdda0049478891f28c
+ size 9437272
3_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 3072,
+   "out_features": 768,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
3_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d448b240b1bfad709856dde32df95c21522474f56f1514e7b7e40458b6f73280
+ size 9437272
README.md ADDED
@@ -0,0 +1,433 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:2680
+ - loss:MultipleNegativesRankingLoss
+ base_model: google/embeddinggemma-300m
+ widget:
+ - source_sentence: Let $A, M,$ and $C$ be nonnegative integers such that $A + M +
+     C = 12$. What is the maximum value of $A \cdot M \cdot C + A \cdot M + M \cdot
+     C + A \cdot C$?
+   sentences:
+   - Given that $2^{2004}$ is a $604$-digit number whose first digit is $1$, how many
+     elements of the set $S = \{2^0,2^1,2^2,\ldots ,2^{2003}\}$ have a first digit
+     of $4$?
+   - To complete the grid below, each of the digits 1 through 4 must occur once in
+     each row and once in each column. What number will occupy the lower right-hand
+     square? \[\begin{tabular}{|c|c|c|c|}\hline 1 & & 2 &\\ \hline 2 & 3 & &\\ \hline
+     & &&4\\ \hline & &&\\ \hline\end{tabular}\]
+   - Two non-zero real numbers, $a$ and $b,$ satisfy $ab = a - b$. Which of the following
+     is a possible value of $\frac {a}{b} + \frac {b}{a} - ab$?
+ - source_sentence: What is the sum of the prime factors of $2010$?
+   sentences:
+   - The lengths of the sides of a triangle in inches are three consecutive integers.
+     The length of the shortest side is $30\%$ of the perimeter. What is the length
+     of the longest side?
+   - On a map, a $12$-centimeter length represents $72$ kilometers. How many kilometers
+     does a $17$-centimeter length represent?
+   - The five pieces shown below can be arranged to form four of the five figures shown
+     in the choices. Which figure cannot be formed? [asy] defaultpen(linewidth(0.6));
+     size(80); real r=0.5, s=1.5; path p=origin--(1,0)--(1,1)--(0,1)--cycle; draw(p);
+     draw(shift(s,r)*p); draw(shift(s,-r)*p); draw(shift(2s,2r)*p); draw(shift(2s,0)*p);
+     draw(shift(2s,-2r)*p); draw(shift(3s,3r)*p); draw(shift(3s,-3r)*p); draw(shift(3s,r)*p);
+     draw(shift(3s,-r)*p); draw(shift(4s,-4r)*p); draw(shift(4s,-2r)*p); draw(shift(4s,0)*p);
+     draw(shift(4s,2r)*p); draw(shift(4s,4r)*p); [/asy] [asy] size(350); defaultpen(linewidth(0.6));
+     path p=origin--(1,0)--(1,1)--(0,1)--cycle; pair[] a={(0,0), (0,1), (0,2), (0,3),
+     (0,4), (1,0), (1,1), (1,2), (2,0), (2,1), (3,0), (3,1), (3,2), (3,3), (3,4)};
+     pair[] b={(5,3), (5,4), (6,2), (6,3), (6,4), (7,1), (7,2), (7,3), (7,4), (8,0),
+     (8,1), (8,2), (9,0), (9,1), (9,2)}; pair[] c={(11,0), (11,1), (11,2), (11,3),
+     (11,4), (12,1), (12,2), (12,3), (12,4), (13,2), (13,3), (13,4), (14,3), (14,4),
+     (15,4)}; pair[] d={(17,0), (17,1), (17,2), (17,3), (17,4), (18,0), (18,1), (18,2),
+     (18,3), (18,4), (19,0), (19,1), (19,2), (19,3), (19,4)}; pair[] e={(21,4), (22,1),
+     (22,2), (22,3), (22,4), (23,0), (23,1), (23,2), (23,3), (23,4), (24,1), (24,2),
+     (24,3), (24,4), (25,4)}; int i; for(int i=0; i<15; i=i+1) { draw(shift(a[i])*p);
+     draw(shift(b[i])*p); draw(shift(c[i])*p); draw(shift(d[i])*p); draw(shift(e[i])*p);
+     } [/asy] \[
+ - source_sentence: A circle and two distinct lines are drawn on a sheet of paper.
+     What is the largest possible number of points of intersection of these figures?
+   sentences:
+   - Three fair six-sided dice are rolled. What is the probability that the values
+     shown on two of the dice sum to the value shown on the remaining die?
+   - In the small country of Mathland, all automobile license plates have four symbols.
+     The first must be a vowel (A, E, I, O, or U), the second and third must be two
+     different letters among the 21 non-vowels, and the fourth must be a digit (0 through
+     9). If the symbols are chosen at random subject to these conditions, what is the
+     probability that the plate will read "AMC8"?
+   - How many different combinations of \$5 bills and \$2 bills can be used to make
+     a total of \$17? Order does not matter in this problem.
+ - source_sentence: Points $K, L, M,$ and $N$ lie in the plane of the square $ABCD$
+     such that $AKB$, $BLC$, $CMD$, and $DNA$ are equilateral triangles. If $ABCD$
+     has an area of 16, find the area of $KLMN$. [asy] unitsize(2cm); defaultpen(fontsize(8)+linewidth(0.8));
+     pair A=(-0.5,0.5), B=(0.5,0.5), C=(0.5,-0.5), D=(-0.5,-0.5); pair K=(0,1.366),
+     L=(1.366,0), M=(0,-1.366), N=(-1.366,0); draw(A--N--K--A--B--K--L--B--C--L--M--C--D--M--N--D--A);
+     label("$A$",A,SE); label("$B$",B,SW); label("$C$",C,NW); label("$D$",D,NE); label("$K$",K,NNW);
+     label("$L$",L,E); label("$M$",M,S); label("$N$",N,W); [/asy]
+   sentences:
+   - A semicircle of diameter $1$ sits at the top of a semicircle of diameter $2$,
+     as shown. The shaded area inside the smaller semicircle and outside the larger
+     semicircle is called a lune. Determine the area of this lune. [asy] import graph;
+     size(150); defaultpen(fontsize(8)); pair A=(-2,0), B=(2,0); filldraw(Arc((0,sqrt(3)),1,0,180)--cycle,mediumgray);
+     filldraw(Arc((0,0),2,0,180)--cycle,white); draw(2*expi(2*pi/6)--2*expi(4*pi/6));
+     label("1",(0,sqrt(3)),(0,-1)); label("2",(0,0),(0,-1)); [/asy]
+   - The average age of $5$ people in a room is $30$ years. An $18$-year-old person
+     leaves the room. What is the average age of the four remaining people?
+   - Which of the following numbers is a perfect square?
+ - source_sentence: The harmonic mean of a set of non-zero numbers is the reciprocal
+     of the average of the reciprocals of the numbers. What is the harmonic mean of
+     1, 2, and 4?
+   sentences:
+   - Spinners $A$ and $B$ are spun. On each spinner, the arrow is equally likely to
+     land on each number. What is the probability that the product of the two spinners'
+     numbers is even?
+   - Abby, Bridget, and four of their classmates will be seated in two rows of three
+     for a group picture, as shown. \begin{eqnarray*} \text{X}&\quad\text{X}\quad&\text{X}
+     \\ \text{X}&\quad\text{X}\quad&\text{X} \end{eqnarray*} If the seating positions
+     are assigned randomly, what is the probability that Abby and Bridget are adjacent
+     to each other in the same row or the same column?
+   - Semicircle $\Gamma$ has diameter $\overline{AB}$ of length $14$. Circle $\Omega$
+     lies tangent to $\overline{AB}$ at a point $P$ and intersects $\Gamma$ at points
+     $Q$ and $R$. If $QR=3\sqrt3$ and $\angle QPR=60^\circ$, then the area of $\triangle
+     PQR$ equals $\tfrac{a\sqrt{b}}{c}$, where $a$ and $c$ are relatively prime positive
+     integers, and $b$ is a positive integer not divisible by the square of any prime.
+     What is $a+b+c$?
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on google/embeddinggemma-300m
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision 57c266a740f537b4dc058e1b0cda161fd15afa75 -->
+ - **Maximum Sequence Length:** 2048 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+   (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+   (4): Normalize()
+ )
+ ```
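Modules (2) and (3) are an up-projection to 3072 dimensions followed by a down-projection back to 768; with `bias: false` and an identity activation, each is just a matrix multiplication, and module (4) then L2-normalizes the result. A pure-Python sketch of this tail of the pipeline, with hypothetical tiny 2x4 / 4x2 weight matrices standing in for the real 3072x768 / 768x3072 ones:

```python
import math

def matvec(W, x):
    # y = W @ x, with W stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

# Hypothetical tiny weights; the real Dense modules are 3072x768 and 768x3072.
W_up   = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # up-projection
W_down = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]        # down-projection

x = [3.0, 4.0]            # pooled "sentence embedding"
h = matvec(W_up, x)       # Dense (2): no bias, identity activation
y = matvec(W_down, h)     # Dense (3): no bias, identity activation
emb = l2_normalize(y)     # Normalize (4): unit-length output
print(emb)
```

The final normalization is what lets cosine similarity reduce to a plain dot product on the output embeddings.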
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("blachang28/my-embedding-gemma")
+ # Run inference
+ queries = [
+     "The harmonic mean of a set of non-zero numbers is the reciprocal of the average of the reciprocals of the numbers. What is the harmonic mean of 1, 2, and 4?",
+ ]
+ documents = [
+     'Abby, Bridget, and four of their classmates will be seated in two rows of three for a group picture, as shown. \\begin{eqnarray*} \\text{X}&\\quad\\text{X}\\quad&\\text{X} \\\\ \\text{X}&\\quad\\text{X}\\quad&\\text{X} \\end{eqnarray*} If the seating positions are assigned randomly, what is the probability that Abby and Bridget are adjacent to each other in the same row or the same column?',
+     'Semicircle $\\Gamma$ has diameter $\\overline{AB}$ of length $14$. Circle $\\Omega$ lies tangent to $\\overline{AB}$ at a point $P$ and intersects $\\Gamma$ at points $Q$ and $R$. If $QR=3\\sqrt3$ and $\\angle QPR=60^\\circ$, then the area of $\\triangle PQR$ equals $\\tfrac{a\\sqrt{b}}{c}$, where $a$ and $c$ are relatively prime positive integers, and $b$ is a positive integer not divisible by the square of any prime. What is $a+b+c$?',
+     "Spinners $A$ and $B$ are spun. On each spinner, the arrow is equally likely to land on each number. What is the probability that the product of the two spinners' numbers is even?",
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 768] [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[ 0.9314, -0.3410, 0.9672]])
+ ```
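Because the model's similarity function is cosine similarity, `model.similarity` above computes the cosine between each query and document embedding (and, since the last module L2-normalizes, this equals a dot product of unit vectors). A pure-Python sketch of the same computation on hypothetical 3-dimensional embeddings, not real model output:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings; the real ones are 768-dimensional.
query = [1.0, 2.0, 2.0]
docs = [[1.0, 2.0, 2.0],   # identical direction -> similarity 1.0
        [2.0, -2.0, 1.0]]  # orthogonal -> similarity 0.0
scores = [cosine(query, d) for d in docs]
print(scores)
```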
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 2,680 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string | string |
+   | details | <ul><li>min: 10 tokens</li><li>mean: 82.06 tokens</li><li>max: 1260 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 80.7 tokens</li><li>max: 1260 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 92.86 tokens</li><li>max: 2048 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>$(6?3) + 4 - (2 - 1) = 5.$ To make this statement true, the question mark between the 6 and the 3 should be replaced by</code> | <code>What is the degree measure of the smaller angle formed by the hands of a clock at 10 o'clock?</code> | <code>An insect lives on the surface of a regular tetrahedron with edges of length 1. It wishes to travel on the surface of the tetrahedron from the midpoint of one edge to the midpoint of the opposite edge. What is the length of the shortest such trip? (Note: Two edges of a tetrahedron are opposite if they have no common endpoint.)</code> |
+   | <code>What is the degree measure of the smaller angle formed by the hands of a clock at 10 o'clock?</code> | <code>Which triplet of numbers has a sum NOT equal to 1?</code> | <code>Corners are sliced off a unit cube so that the six faces each become regular octagons. What is the total volume of the removed tetrahedra?</code> |
+   | <code>Which triplet of numbers has a sum NOT equal to 1?</code> | <code>What is the degree measure of the smaller angle formed by the hands of a clock at 10 o'clock?</code> | <code>How many pairs of positive integers $(a,b)$ are there such that $\text{gcd}(a,b)=1$ and $\frac{a}{b} + \frac{14b}{9a}$ is an integer? $\mathrm {(A)}\ 4\quad\mathrm {(B)}\ 6\quad\mathrm {(C)}\ 9\quad\mathrm {(D)}\ 12\quad\mathrm {(E)}\ \text{infinitely many}$</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim",
+       "gather_across_devices": false
+   }
+   ```
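MultipleNegativesRankingLoss treats each anchor's paired positive as the correct "class" among the positives of the other in-batch examples (plus any explicit negatives), and applies cross-entropy over the similarity scores multiplied by `scale` (20.0 here). A pure-Python sketch of the loss for a single anchor, with hypothetical cosine-similarity scores:

```python
import math

def mnrl_one_anchor(sims, positive_idx, scale=20.0):
    # Cross-entropy over scaled similarities:
    # loss = -log softmax(scale * sims)[positive_idx]
    logits = [scale * s for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_idx]

# Hypothetical similarities of one anchor to [positive, negative, in-batch negative]
sims = [0.9, 0.2, 0.1]
loss = mnrl_one_anchor(sims, positive_idx=0)
print(loss)  # small: the positive already dominates at scale 20
```

The large scale sharpens the softmax, so the loss is near zero once the positive clearly outscores the negatives, and large when it does not.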
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 1
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
+ - `warmup_ratio`: 0.1
+ - `prompts`: task: classification | query:
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 1
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: no
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: True
+ - `prompts`: task: classification | query:
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step  | Training Loss |
+ |:-----:|:-----:|:-------------:|
+ | 1.0   | 2680  | 1.5631        |
+ | 2.0   | 5360  | 1.2027        |
+ | 3.0   | 8040  | 0.8526        |
+ | 4.0   | 10720 | 0.6227        |
+ | 5.0   | 13400 | 0.3352        |
+
+
+ ### Framework Versions
+ - Python: 3.12.12
+ - Sentence Transformers: 5.1.2
+ - Transformers: 4.57.2
+ - PyTorch: 2.9.0+cu126
+ - Accelerate: 1.12.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<image_soft_token>": 262144
+ }
config.json ADDED
@@ -0,0 +1,60 @@
+ {
+   "_sliding_window_pattern": 6,
+   "architectures": [
+     "Gemma3TextModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_logit_softcapping": null,
+   "bos_token_id": 2,
+   "dtype": "float32",
+   "eos_token_id": 1,
+   "final_logit_softcapping": null,
+   "head_dim": 256,
+   "hidden_activation": "gelu_pytorch_tanh",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_types": [
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 2048,
+   "model_type": "gemma3_text",
+   "num_attention_heads": 3,
+   "num_hidden_layers": 24,
+   "num_key_value_heads": 1,
+   "pad_token_id": 0,
+   "query_pre_attn_scalar": 256,
+   "rms_norm_eps": 1e-06,
+   "rope_local_base_freq": 10000.0,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 257,
+   "transformers_version": "4.57.2",
+   "use_bidirectional_attention": true,
+   "use_cache": true,
+   "vocab_size": 262144
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.2",
+     "transformers": "4.57.2",
+     "pytorch": "2.9.0+cu126"
+   },
+   "prompts": {
+     "query": "task: search result | query: ",
+     "document": "title: none | text: ",
+     "BitextMining": "task: search result | query: ",
+     "Clustering": "task: clustering | query: ",
+     "Classification": "task: classification | query: ",
+     "InstructionRetrieval": "task: code retrieval | query: ",
+     "MultilabelClassification": "task: classification | query: ",
+     "PairClassification": "task: sentence similarity | query: ",
+     "Reranking": "task: search result | query: ",
+     "Retrieval": "task: search result | query: ",
+     "Retrieval-query": "task: search result | query: ",
+     "Retrieval-document": "title: none | text: ",
+     "STS": "task: sentence similarity | query: ",
+     "Summarization": "task: summarization | query: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
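In Sentence Transformers, each prompt in this table is a plain string that gets prepended to the input text before tokenization; `encode_query` uses the `"query"` prompt and `encode_document` the `"document"` prompt. A minimal sketch of that mechanism (the dictionary values are copied from the config above; the helper name is ours):

```python
# Prompts from config_sentence_transformers.json; each is prepended verbatim.
PROMPTS = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
}

def apply_prompt(text, kind):
    # The prompt string is simply concatenated in front of the raw text.
    return PROMPTS[kind] + text

print(apply_prompt("What is the harmonic mean of 1, 2, and 4?", "query"))
```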
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10efe981d6e6cd3d93236bccf9f526a65bcb08b49ed31dba3ea5a950b9c5d0c6
+ size 1211486072
modules.json ADDED
@@ -0,0 +1,32 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 3,
+     "name": "3",
+     "path": "3_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 4,
+     "name": "4",
+     "path": "4_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 2048,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "boi_token": "<start_of_image>",
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eoi_token": "<end_of_image>",
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<image_soft_token>",
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:216e2a79606fe879c9f17c529c71cd241338407fd5646b595ffd3c4b9ea1d503
+ size 33385262
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+ size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff