Stergios-Konstantinidis committed
Commit 785e680 · verified · 1 Parent(s): cccb546

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
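
The only pooling flag set to `true` in this config is `pooling_mode_mean_tokens`, so the sentence vector is the average of the token vectors. The sketch below illustrates that idea in plain Python on a toy 2-dimensional input (the real model uses 768 dimensions). `mean_pool` and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions made for illustration, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # hypothetical values; last row is padding
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padded row is ignored, so only the first two token vectors contribute to the average.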
README.md ADDED
@@ -0,0 +1,503 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:99980
+ - loss:ContrastiveTensionLoss
+ base_model: sucharush/e5_stem_finetuned
+ widget:
11
+ - source_sentence: 'Question: who sang on the street where you live in my fair lady,
12
+ Answer: On the Street Where You Live "On the Street Where You Live" is a song
13
+ with music by Frederick Loewe and lyrics by Alan Jay Lerner, from the 1956 Broadway
14
+ musical, My Fair Lady.[1] It is sung in the musical by the character Freddy Eynsford-Hill,
15
+ who was portrayed by John Michael King in the original production. In the 1964
16
+ film version, it was sung by Bill Shirley, dubbing for actor Jeremy Brett.'
17
+ sentences:
18
+ - 'Question: who sang on the street where you live in my fair lady, Answer: On the
19
+ Street Where You Live "On the Street Where You Live" is a song with music by Frederick
20
+ Loewe and lyrics by Alan Jay Lerner, from the 1956 Broadway musical, My Fair Lady.[1]
21
+ It is sung in the musical by the character Freddy Eynsford-Hill, who was portrayed
22
+ by John Michael King in the original production. In the 1964 film version, it
23
+ was sung by Bill Shirley, dubbing for actor Jeremy Brett.'
24
+ - 'Question: who sings the song middle finger in the air, Answer: Middle Finger
25
+ (song) "Middle Finger" is a song by American synthpop band Cobra Starship. The
26
+ track was written and produced by Norwegian production team Stargate and features
27
+ Pittsburgh rapper Mac Miller. It was released as the third promotional single
28
+ from their fourth studio album, "Night Shades" and later as the second official
29
+ single.'
30
+ - '[{''from'': ''human'', ''value'': ''How can you use the x-intercepts and vertex
31
+ of a quadratic equation to sketch its graph accurately?''}, {''from'': ''gpt'',
32
+ ''value'': ''To sketch the graph of a quadratic equation using the x-intercepts
33
+ and vertex:\n\n1. Find the x-intercepts: Set y = 0 in the equation and solve for
34
+ x. The x-intercepts are the points where the graph crosses the x-axis.\n2. Find
35
+ the vertex: Convert the equation into vertex form: y = a(x-h)^2 + k, where (h,k)
36
+ is the vertex.\n3. Plot the x-intercepts and vertex: Mark these points on the
37
+ graph.\n4. Determine the shape of the parabola: If a > 0, the parabola opens upwards.
38
+ If a < 0, it opens downwards.\n5. Sketch the parabola: Draw a smooth curve through
39
+ the x-intercepts and vertex, following the shape determined in step 4.\n\nBy following
40
+ these steps, you can accurately sketch the graph of a quadratic equation using
41
+ its x-intercepts and vertex.''}]'
42
+ - source_sentence: '[{''from'': ''human'', ''value'': ''Find the number of moles of
43
+ Chloromethane formed on combining 3 moles of Methane and 3 moles of Chlorine''},
44
+ {''from'': ''gpt'', ''value'': ''The reaction between methane (CH₄) and chlorine
45
+ (Cl₂) to form chloromethane (CH₃Cl) and hydrogen chloride (HCl) is typically a
46
+ photochemical reaction that occurs in the presence of light. The balanced chemical
47
+ equation for this reaction is:\n\nCH₄ + Cl₂ → CH₃Cl + HCl\n\nFrom the balanced
48
+ equation, we can see that 1 mole of methane reacts with 1 mole of chlorine to
49
+ produce 1 mole of chloromethane and 1 mole of hydrogen chloride.\n\nGiven that
50
+ you have 3 moles of methane and 3 moles of chlorine, the reaction would proceed
51
+ as follows:\n\n3 moles CH₄ + 3 moles Cl₂ → 3 moles CH₃Cl + 3 moles HCl\n\nTherefore,
52
+ 3 moles of chloromethane would be formed from the reaction of 3 moles of methane
53
+ with 3 moles of chlorine, assuming the reaction goes to completion and there are
54
+ no side reactions.''}]'
55
+ sentences:
56
+ - '[{''from'': ''human'', ''value'': ''Find the number of moles of Chloromethane
57
+ formed on combining 3 moles of Methane and 3 moles of Chlorine''}, {''from'':
58
+ ''gpt'', ''value'': ''The reaction between methane (CH₄) and chlorine (Cl₂) to
59
+ form chloromethane (CH₃Cl) and hydrogen chloride (HCl) is typically a photochemical
60
+ reaction that occurs in the presence of light. The balanced chemical equation
61
+ for this reaction is:\n\nCH₄ + Cl₂ → CH₃Cl + HCl\n\nFrom the balanced equation,
62
+ we can see that 1 mole of methane reacts with 1 mole of chlorine to produce 1
63
+ mole of chloromethane and 1 mole of hydrogen chloride.\n\nGiven that you have
64
+ 3 moles of methane and 3 moles of chlorine, the reaction would proceed as follows:\n\n3
65
+ moles CH₄ + 3 moles Cl₂ → 3 moles CH₃Cl + 3 moles HCl\n\nTherefore, 3 moles of
66
+ chloromethane would be formed from the reaction of 3 moles of methane with 3 moles
67
+ of chlorine, assuming the reaction goes to completion and there are no side reactions.''}]'
68
+ - 'Question: where was the remake of dirty dancing filmed, Answer: western North
69
+ Carolina'
70
+ - '[{''from'': ''human'', ''value'': "How can I sort the given characters ''S'',
71
+ ''O'', ''R'', ''T'' in alphabetical order using a Java program?"}, {''from'':
72
+ ''gpt'', ''value'': ''To sort the given characters in alphabetical order using
73
+ a Java program, you can use the following code:\n\n```java\nimport java.util.Arrays;
74
+ \n\npublic class Main {\n public static void main(String[] args)\n {\n //
75
+ Given characters\n char[] charArray = { \''S\'', \''O\'', \''R\'', \''T\''
76
+ };\n\n // Sort the array\n Arrays.sort(charArray);\n\n //
77
+ Print out the sorted characters\n System.out.print("Characters arranged
78
+ in alphabetical order: ");\n for (char c : charArray) {\n System.out.print(c
79
+ + " ");\n }\n }\n}\n```\n\nWhen you run this program, it will output:
80
+ "Characters arranged in alphabetical order: O R S T". \n\nIn this code, we first
81
+ declare an array called `charArray` which contains the given characters \''S\'',
82
+ \''O\'', \''R\'', \''T\''. Then, we use the `Arrays.sort()` function from Java\''s
83
+ `Arrays` class to sort the array in ascending order, which is the same as alphabetical
84
+ order for characters. Finally, we iterate over the sorted array and print each
85
+ character separated by a space.''}]'
86
+ - source_sentence: 'Question: what production model was the tiger i tank of world
87
+ war ii, Answer: Tiger I The Tiger I  listen (help·info) is a German heavy tank
88
+ of World War II deployed from 1942 in Africa and Europe, usually in independent
89
+ heavy tank battalions. Its final designation was Panzerkampfwagen VI Tiger Ausf.
90
+ E often shortened to Tiger. The Tiger I gave the Wehrmacht its first armoured
91
+ fighting vehicle that mounted the 8.8 cm KwK 36 gun (not to be confused with the
92
+ 8.8 cm Flak 36). 1,347 were built between August 1942 and August 1944.[9] After
93
+ August 1944, production of the Tiger I was phased out in favour of the Tiger II.'
94
+ sentences:
95
+ - 'Question: who sang theme song for star trek enterprise, Answer: Faith of the
96
+ Heart It was also recorded by English tenor Russell Watson as "Where My Heart
97
+ Will Take Me" in order to be used as a theme to the 2001 television series Star
98
+ Trek: Enterprise. This version of the single was used on four occasions as wake-up
99
+ calls onboard Space Shuttle missions, and performed by Watson at the 2002 Commonwealth
100
+ Games. Watson also recorded a special version of the song to be played for the
101
+ final wake up of the New Horizons exploration spacecraft on December 6, 2014.[1]'
102
+ - 'Question: what production model was the tiger i tank of world war ii, Answer:
103
+ Tiger I The Tiger I  listen (help·info) is a German heavy tank of World War II
104
+ deployed from 1942 in Africa and Europe, usually in independent heavy tank battalions.
105
+ Its final designation was Panzerkampfwagen VI Tiger Ausf. E often shortened to
106
+ Tiger. The Tiger I gave the Wehrmacht its first armoured fighting vehicle that
107
+ mounted the 8.8 cm KwK 36 gun (not to be confused with the 8.8 cm Flak 36). 1,347
108
+ were built between August 1942 and August 1944.[9] After August 1944, production
109
+ of the Tiger I was phased out in favour of the Tiger II.'
110
+ - 'Question: (Neural networks) Training only the first layer of a deep neural network
111
+ using the logistic loss is equivalent to training a logistic regression over a
112
+ transformed feature space., Answer: False'
113
+ - source_sentence: 'Question: who sings prop me up by the jukebox, Answer: Prop Me
114
+ Up Beside the Jukebox (If I Die) "Prop Me Up Beside the Jukebox (If I Die)" is
115
+ a song written by Kerry Kurt Phillips, Howard Perdew and Rick Blaylock, and recorded
116
+ by American country music singer Joe Diffie. It was released in July 1993 as the
117
+ second single from his CD Honky Tonk Attitude. It peaked at number 3 on the Billboard
118
+ Hot Country Singles & Tracks (now Hot Country Songs) chart.'
119
+ sentences:
120
+ - '[{''from'': ''human'', ''value'': ''Explain how the weaker molecular forces in
121
+ steam affect its physical properties compared to liquid water.''}, {''from'':
122
+ ''gpt'', ''value'': ''The weaker molecular forces in steam result in a lower density,
123
+ higher compressibility, and lower boiling point compared to liquid water. The
124
+ molecules are farther apart and less attracted to each other, making steam easier
125
+ to compress and expand, and requiring less energy to separate into a gas phase.''}]'
126
+ - 'Question: what episode does justin die in chicago pd, Answer: Hank Voight In
127
+ "Start Digging", Hank is offered a meritorious promotion to Lieutenant by Commander
128
+ Emma Crowley (Barbara Eve Harris) and the CPD brass as part of the new Chicago
129
+ Police Superintendent''s reorganization of the department that includes getting
130
+ officers like him off the streets. However, before Hank could either accept or
131
+ decline the promotion, his son Justin was assaulted and later died of his injuries,
132
+ causing Hank to go on a revenge mission using whatever means necessary to find
133
+ Justin''s murderer. He takes the murderer to the silos, he makes him dig a hole
134
+ and then kills him.'
135
+ - 'Question: who sings prop me up by the jukebox, Answer: Prop Me Up Beside the
136
+ Jukebox (If I Die) "Prop Me Up Beside the Jukebox (If I Die)" is a song written
137
+ by Kerry Kurt Phillips, Howard Perdew and Rick Blaylock, and recorded by American
138
+ country music singer Joe Diffie. It was released in July 1993 as the second single
139
+ from his CD Honky Tonk Attitude. It peaked at number 3 on the Billboard Hot Country
140
+ Singles & Tracks (now Hot Country Songs) chart.'
141
+ - source_sentence: The text presents a novel pilot-free multi-user uplink framework
142
+ for integrated sensing and communication (ISAC) in mm-wave networks. It enables
143
+ single-antenna users to transmit without dedicated pilots, allowing the base station
144
+ to decode messages and detect targets by utilizing users' spatial and velocity
145
+ diversities. The system employs a 3D super-resolution approach to jointly estimate
146
+ multi-path parameters like delays, Doppler shifts, and angles-of-arrival, solved
147
+ through semidefinite programming. A key innovation is multi-user fusion, enhancing
148
+ sensing and decoding by combining diverse user observations. This approach improves
149
+ robustness and integrates multi-user perspectives for high-resolution sensing
150
+ and communication. Numerical results demonstrate substantial enhancements in target
151
+ estimation and communication performance. The methodology transforms user transmissions
152
+ into valuable sensing opportunities, crucial for the next generation of wireless
153
+ networks focused on environmental awareness and reliable data transmission. Future
154
+ work includes addressing user mobility, real-time challenges, and optimizing fusion
155
+ techniques in noisy conditions.
156
+ sentences:
157
+ - 'Question: why did green day write when september ends, Answer: Wake Me Up When
158
+ September Ends "Wake Me Up When September Ends" is a song by American rock band
159
+ Green Day, released on June 13, 2005, as the fourth single from the group''s seventh
160
+ studio album, American Idiot (2004). The song was written by frontman Billie Joe
161
+ Armstrong regarding the death of his father.'
162
+ - 'Question: what is charlotte''s real name from henry danger, Answer: List of Henry
163
+ Danger characters Charlotte[5] (Riele Downs) is one of Henry''s best friends.
164
+ She is sarcastic, clever, and smart. She is the "sass master" of the bunch, always
165
+ there to snap everyone back to reality. She and Henry have been best friends for
166
+ a long time and therefore she is close enough to him to tell it like it is. She
167
+ is a big fan of Captain Man. In the fourth episode, Charlotte figures out Henry''s
168
+ secret and gets a job as Henry and Ray''s manager.'
169
+ - The text presents a novel pilot-free multi-user uplink framework for integrated
170
+ sensing and communication (ISAC) in mm-wave networks. It enables single-antenna
171
+ users to transmit without dedicated pilots, allowing the base station to decode
172
+ messages and detect targets by utilizing users' spatial and velocity diversities.
173
+ The system employs a 3D super-resolution approach to jointly estimate multi-path
174
+ parameters like delays, Doppler shifts, and angles-of-arrival, solved through
175
+ semidefinite programming. A key innovation is multi-user fusion, enhancing sensing
176
+ and decoding by combining diverse user observations. This approach improves robustness
177
+ and integrates multi-user perspectives for high-resolution sensing and communication.
178
+ Numerical results demonstrate substantial enhancements in target estimation and
179
+ communication performance. The methodology transforms user transmissions into
180
+ valuable sensing opportunities, crucial for the next generation of wireless networks
181
+ focused on environmental awareness and reliable data transmission. Future work
182
+ includes addressing user mobility, real-time challenges, and optimizing fusion
183
+ techniques in noisy conditions.
184
+ pipeline_tag: sentence-similarity
185
+ library_name: sentence-transformers
186
+ ---
187
+
188
+ # SentenceTransformer based on sucharush/e5_stem_finetuned
189
+
190
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sucharush/e5_stem_finetuned](https://huggingface.co/sucharush/e5_stem_finetuned). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
191
+
192
+ ## Model Details
193
+
194
+ ### Model Description
195
+ - **Model Type:** Sentence Transformer
196
+ - **Base model:** [sucharush/e5_stem_finetuned](https://huggingface.co/sucharush/e5_stem_finetuned) <!-- at revision 3013cc95b53a83082746afd556743e81f1da5dff -->
197
+ - **Maximum Sequence Length:** 512 tokens
198
+ - **Output Dimensionality:** 768 dimensions
199
+ - **Similarity Function:** Cosine Similarity
200
+ <!-- - **Training Dataset:** Unknown -->
201
+ <!-- - **Language:** Unknown -->
202
+ <!-- - **License:** Unknown -->
203
+
204
+ ### Model Sources
205
+
206
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
207
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
208
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
209
+
210
+ ### Full Model Architecture
211
+
212
+ ```
213
+ SentenceTransformer(
214
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
215
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
216
+ (2): Normalize()
217
+ )
218
+ ```
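
The architecture ends with a `Normalize()` module, and the card lists cosine similarity as the similarity function. One consequence is that, for unit-length vectors, cosine similarity reduces to a plain dot product. The sketch below checks this on toy 2-dimensional vectors; the helper names `l2_normalize`, `dot`, and `cosine` are hypothetical and written only for this illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


a = [3.0, 4.0]  # toy vectors, not model output
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)
print(dot(a_unit, b_unit), cosine(a, b))
```

Both printed values agree up to floating-point rounding, which is why a dot product over normalized embeddings can stand in for cosine similarity.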
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M3_tokenizer_tuned_pos_neg_1")
+ # Run inference
+ sentences = [
+ "The text presents a novel pilot-free multi-user uplink framework for integrated sensing and communication (ISAC) in mm-wave networks. It enables single-antenna users to transmit without dedicated pilots, allowing the base station to decode messages and detect targets by utilizing users' spatial and velocity diversities. The system employs a 3D super-resolution approach to jointly estimate multi-path parameters like delays, Doppler shifts, and angles-of-arrival, solved through semidefinite programming. A key innovation is multi-user fusion, enhancing sensing and decoding by combining diverse user observations. This approach improves robustness and integrates multi-user perspectives for high-resolution sensing and communication. Numerical results demonstrate substantial enhancements in target estimation and communication performance. The methodology transforms user transmissions into valuable sensing opportunities, crucial for the next generation of wireless networks focused on environmental awareness and reliable data transmission. Future work includes addressing user mobility, real-time challenges, and optimizing fusion techniques in noisy conditions.",
+ "The text presents a novel pilot-free multi-user uplink framework for integrated sensing and communication (ISAC) in mm-wave networks. It enables single-antenna users to transmit without dedicated pilots, allowing the base station to decode messages and detect targets by utilizing users' spatial and velocity diversities. The system employs a 3D super-resolution approach to jointly estimate multi-path parameters like delays, Doppler shifts, and angles-of-arrival, solved through semidefinite programming. A key innovation is multi-user fusion, enhancing sensing and decoding by combining diverse user observations. This approach improves robustness and integrates multi-user perspectives for high-resolution sensing and communication. Numerical results demonstrate substantial enhancements in target estimation and communication performance. The methodology transforms user transmissions into valuable sensing opportunities, crucial for the next generation of wireless networks focused on environmental awareness and reliable data transmission. Future work includes addressing user mobility, real-time challenges, and optimizing fusion techniques in noisy conditions.",
+ 'Question: what is charlotte\'s real name from henry danger, Answer: List of Henry Danger characters Charlotte[5] (Riele Downs) is one of Henry\'s best friends. She is sarcastic, clever, and smart. She is the "sass master" of the bunch, always there to snap everyone back to reality. She and Henry have been best friends for a long time and therefore she is close enough to him to tell it like it is. She is a big fan of Captain Man. In the fourth episode, Charlotte figures out Henry\'s secret and gets a job as Henry and Ray\'s manager.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
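
The 3×3 result above holds one score per sentence pair, with each sentence's similarity to itself on the diagonal. A small sketch of how such a matrix might be read: for each sentence, find the other sentence with the highest score. It assumes the matrix has been converted to nested Python lists (for a tensor, `similarities.tolist()` would do this). `most_similar_other` is a hypothetical helper, and the numbers are made up for illustration rather than produced by this model.

```python
def most_similar_other(similarities):
    """For each row, return the index of the highest score off the diagonal."""
    best = []
    for i, row in enumerate(similarities):
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        best.append(max(candidates)[1])
    return best


# Made-up scores for three sentences; not output from this model.
example = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
matches = most_similar_other(example)
print(matches)
```

Skipping the diagonal matters because every sentence is trivially most similar to itself.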
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 99,980 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence_0 | sentence_1 | label |
+ |:--------|:-----------|:-----------|:------|
+ | type | string | string | int |
+ | details | <ul><li>min: 16 tokens</li><li>mean: 186.84 tokens</li><li>max: 510 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 186.84 tokens</li><li>max: 510 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 | label |
+ |:-----------|:-----------|:------|
+ | <code>Question: who was the leader of the texas rangers, Answer: History of the Texas Ranger Division However, the Texas Rangers were not formally constituted until 1835. Austin returned to Texas after having been imprisoned in Mexico City and helped organize a council to govern the group. On October 17, at a consultation of the Provisional Government of Texas, Daniel Parker proposed a resolution to establish the Texas Rangers. He proposed creating three companies that would total some 60 men and would be known by "uniforms" consisting of a light duster (clothing) and an identification badge made from a Mexican Peso. They were instituted by Texan lawmakers on November 24. On November 28, 1835 Robert McAlpin Williamson was chosen to be the first Major of the Texas Rangers. Within two years the Rangers grew to more than 300 men.</code> | <code>Question: who was the leader of the texas rangers, Answer: History of the Texas Ranger Division However, the Texas Rangers were not formally constituted until 1835. Austin returned to Texas after having been imprisoned in Mexico City and helped organize a council to govern the group. On October 17, at a consultation of the Provisional Government of Texas, Daniel Parker proposed a resolution to establish the Texas Rangers. He proposed creating three companies that would total some 60 men and would be known by "uniforms" consisting of a light duster (clothing) and an identification badge made from a Mexican Peso. They were instituted by Texan lawmakers on November 24. On November 28, 1835 Robert McAlpin Williamson was chosen to be the first Major of the Texas Rangers. Within two years the Rangers grew to more than 300 men.</code> | <code>1</code> |
+ | <code>[{'from': 'human', 'value': 'Working together, Printer A and Printer B would finish a task in 40 minutes. Printer A alone would finish the task in 60 minutes. The task contains 480 pages. How many more pages per minute does Printer B print compared to Printer A?'}, {'from': 'gpt', 'value': "Let's denote the rate at which Printer A works as A pages per minute and the rate at which Printer B works as B pages per minute.\n\nWe know that Printer A alone can finish the task in 60 minutes, so:\nA * 60 = 480 pages\nA = 480 / 60\nA = 8 pages per minute\n\nWhen Printer A and Printer B work together, they can finish the task in 40 minutes, so:\n(A + B) * 40 = 480 pages\n8 + B = 480 / 40\n8 + B = 12\nB = 12 - 8\nB = 4 pages per minute\n\nNow we can find out how many more pages per minute Printer B prints compared to Printer A:\nB - A = 4 - 8\nB - A = -4\n\nThis result indicates that Printer A actually prints 4 pages per minute more than Printer B, not the other way around. If you're looking for h...</code> | <code>[{'from': 'human', 'value': 'Working together, Printer A and Printer B would finish a task in 40 minutes. Printer A alone would finish the task in 60 minutes. The task contains 480 pages. How many more pages per minute does Printer B print compared to Printer A?'}, {'from': 'gpt', 'value': "Let's denote the rate at which Printer A works as A pages per minute and the rate at which Printer B works as B pages per minute.\n\nWe know that Printer A alone can finish the task in 60 minutes, so:\nA * 60 = 480 pages\nA = 480 / 60\nA = 8 pages per minute\n\nWhen Printer A and Printer B work together, they can finish the task in 40 minutes, so:\n(A + B) * 40 = 480 pages\n8 + B = 480 / 40\n8 + B = 12\nB = 12 - 8\nB = 4 pages per minute\n\nNow we can find out how many more pages per minute Printer B prints compared to Printer A:\nB - A = 4 - 8\nB - A = -4\n\nThis result indicates that Printer A actually prints 4 pages per minute more than Printer B, not the other way around. If you're looking for h...</code> | <code>1</code> |
+ | <code>Question: where does the time zone change from central to eastern, Answer: Eastern Time Zone The boundary between time zones is set forth in the Code of Federal Regulations, with the boundary between the Eastern and Central Time Zones being specifically detailed at 49 CFR 71.[4]</code> | <code>Question: where does the time zone change from central to eastern, Answer: Eastern Time Zone The boundary between time zones is set forth in the Code of Federal Regulations, with the boundary between the Eastern and Central Time Zones being specifically detailed at 49 CFR 71.[4]</code> | <code>1</code> |
+ * Loss: [<code>ContrastiveTensionLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastivetensionloss)
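
The sample rows above pair each text with an identical copy and label it 1, which matches the general shape of Contrastive Tension training: the dot product of two embeddings is treated as a logit and scored with binary cross-entropy against the label. The sketch below is a simplified, pure-Python illustration of that idea, not the library's implementation. In the actual method the two embeddings come from two separate copies of the encoder; here they are simply passed in as lists, and `bce_with_logits` and `contrastive_tension_loss` are hypothetical names.

```python
import math


def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw score, in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def contrastive_tension_loss(pairs):
    """Sum the loss over (embedding_a, embedding_b, label) triples."""
    total = 0.0
    for emb_a, emb_b, label in pairs:
        # The dot product of the two embeddings acts as the logit.
        score = sum(x * y for x, y in zip(emb_a, emb_b))
        total += bce_with_logits(score, label)
    return total


# Toy embeddings (assumed values): one matching pair, one non-matching pair.
batch = [
    ([1.0, 1.0], [1.0, 1.0], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
]
loss = contrastive_tension_loss(batch)
print(loss)
```

A matching pair with a large positive dot product and a non-matching pair with a small one both contribute little to the sum, which is the behaviour the loss rewards.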
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 20
+ - `per_device_eval_batch_size`: 20
+ - `num_train_epochs`: 1
+ - `multi_dataset_batch_sampler`: round_robin
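
These settings, together with the dataset size and the `learning_rate`, `lr_scheduler_type`, and `warmup_steps` values in the full hyperparameter list, determine the length of the run and the shape of the learning-rate curve. The sketch below works through that arithmetic under stated assumptions: a single device, `gradient_accumulation_steps` of 1, and a linear schedule that decays to zero with no warmup. `linear_lr` is a hypothetical helper that mirrors the usual linear schedule, not a call into the training library.

```python
import math

num_samples = 99_980   # training set size from the card
batch_size = 20        # per_device_train_batch_size (single device assumed)
num_epochs = 1         # num_train_epochs
peak_lr = 5e-5         # learning_rate from the full hyperparameter list

# dataloader_drop_last is False, so a partial final batch would still count.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step):
    """Linear decay from peak_lr to zero with no warmup."""
    return peak_lr * max(0.0, (total_steps - step) / total_steps)


print(total_steps, linear_lr(0), linear_lr(total_steps))
```

Under these assumptions the run takes 4,999 optimizer steps, starting at the peak learning rate and reaching zero at the final step.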
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 20
+ - `per_device_eval_batch_size`: 20
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
426
+ - `eval_on_start`: False
427
+ - `use_liger_kernel`: False
428
+ - `eval_use_gather_object`: False
429
+ - `average_tokens_across_devices`: False
430
+ - `prompts`: None
431
+ - `batch_sampler`: batch_sampler
432
+ - `multi_dataset_batch_sampler`: round_robin
433
+
434
+ </details>
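The values `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` listed above suggest a learning rate that decays linearly from 5e-05 to 0 over the run. The sketch below illustrates that rule only; it is not the Trainer's own code. The helper name `linear_lr` is hypothetical, and the total of 4,999 steps is an assumption derived from the `dataset_size:99980` tag and a batch size of 20 on a single device.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed: 99,980 training samples / batch size 20 on one device = 4,999 steps.
TOTAL_STEPS = 99_980 // 20

for step in (0, 2500, 4999):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With no warmup, the rate starts at its full value and reaches zero at the final step.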
435
+
436
+ ### Training Logs
437
+ | Epoch | Step | Training Loss |
438
+ |:------:|:----:|:-------------:|
439
+ | 0.1000 | 500 | 6.2692 |
440
+ | 0.2000 | 1000 | 6.2665 |
441
+ | 0.3001 | 1500 | 6.2663 |
442
+ | 0.4001 | 2000 | 6.2662 |
443
+ | 0.5001 | 2500 | 6.2660 |
444
+ | 0.6001 | 3000 | 6.2658 |
445
+ | 0.7001 | 3500 | 6.2657 |
446
+ | 0.8002 | 4000 | 6.2656 |
447
+ | 0.9002 | 4500 | 6.2656 |
448
+
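The fractional values in the Epoch column can be reproduced from the step count, as a quick consistency check. This is a minimal sketch that assumes a single training device, so that one step consumes `per_device_train_batch_size` = 20 samples; the dataset size comes from the `dataset_size:99980` tag in the card's front matter. The helper `epoch_at` is a hypothetical name.

```python
DATASET_SIZE = 99_980   # from the `dataset_size:99980` tag in the model card
BATCH_SIZE = 20         # per_device_train_batch_size; one device is assumed

steps_per_epoch = DATASET_SIZE // BATCH_SIZE


def epoch_at(step: int) -> str:
    """Fraction of the epoch completed at `step`, formatted to four decimals."""
    return f"{step / steps_per_epoch:.4f}"


for step in range(500, 5000, 500):
    print(epoch_at(step), step)
```

Under these assumptions one epoch is 4,999 steps, which is why step 1500 is logged as epoch 0.3001 rather than 0.3000.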
449
+
450
+ ### Framework Versions
451
+ - Python: 3.12.8
452
+ - Sentence Transformers: 3.4.1
453
+ - Transformers: 4.52.4
454
+ - PyTorch: 2.6.0+cu126
455
+ - Accelerate: 1.3.0
456
+ - Datasets: 3.2.0
457
+ - Tokenizers: 0.21.0
458
+
459
+ ## Citation
460
+
461
+ ### BibTeX
462
+
463
+ #### Sentence Transformers
464
+ ```bibtex
465
+ @inproceedings{reimers-2019-sentence-bert,
466
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
467
+ author = "Reimers, Nils and Gurevych, Iryna",
468
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
469
+ month = "11",
470
+ year = "2019",
471
+ publisher = "Association for Computational Linguistics",
472
+ url = "https://arxiv.org/abs/1908.10084",
473
+ }
474
+ ```
475
+
476
+ #### ContrastiveTensionLoss
477
+ ```bibtex
478
+ @inproceedings{carlsson2021semantic,
479
+ title={Semantic Re-tuning with Contrastive Tension},
480
+ author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
481
+ booktitle={International Conference on Learning Representations},
482
+ year={2021},
483
+ url={https://openreview.net/forum?id=Ov_sMNau-PF}
484
+ }
485
+ ```
486
+
487
+ <!--
488
+ ## Glossary
489
+
490
+ *Clearly define terms in order to be accessible across audiences.*
491
+ -->
492
+
493
+ <!--
494
+ ## Model Card Authors
495
+
496
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
497
+ -->
498
+
499
+ <!--
500
+ ## Model Card Contact
501
+
502
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
503
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "architectures": [
3
+ "BertModel"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "classifier_dropout": null,
7
+ "gradient_checkpointing": false,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 768,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 3072,
13
+ "layer_norm_eps": 1e-12,
14
+ "max_position_embeddings": 512,
15
+ "model_type": "bert",
16
+ "num_attention_heads": 12,
17
+ "num_hidden_layers": 12,
18
+ "pad_token_id": 0,
19
+ "position_embedding_type": "absolute",
20
+ "torch_dtype": "float32",
21
+ "transformers_version": "4.52.4",
22
+ "type_vocab_size": 2,
23
+ "use_cache": true,
24
+ "vocab_size": 30522
25
+ }
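The dimensions in `config.json` are enough to estimate the parameter count and the float32 weight size. The sketch below assumes the standard BERT layer layout (biases on every dense layer, two LayerNorms per encoder layer) and assumes the pooler layer is stored in the checkpoint; neither is stated in the config itself.

```python
# Values copied from config.json above.
VOCAB, HIDDEN, LAYERS = 30522, 768, 12
INTERMEDIATE, MAX_POS, TYPE_VOCAB = 3072, 512, 2


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm
encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)      # query, key, value, attention output
    + linear(HIDDEN, INTERMEDIATE)  # feed-forward up-projection
    + linear(INTERMEDIATE, HIDDEN)  # feed-forward down-projection
    + 2 * layer_norm                # one after attention, one after feed-forward
)
pooler = linear(HIDDEN, HIDDEN)     # assumed to be saved with the checkpoint

total_params = embeddings + LAYERS * encoder_layer + pooler
total_bytes = total_params * 4      # float32 = 4 bytes per parameter

print(total_params, total_bytes)
```

The estimate comes to 109,482,240 parameters, or 437,928,960 bytes. That is 22,368 bytes below the `size 437951328` recorded for `model.safetensors` in this commit, a gap that would be consistent with file header metadata.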
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.4.1",
4
+ "transformers": "4.52.4",
5
+ "pytorch": "2.6.0+cu126"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7a3e9324eb4b19eb42f9eebcb8da68ff6b6b5dab6d24a389ee4836c92ebc141d
3
+ size 437951328
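The three lines above are a Git LFS pointer: they record the SHA-256 digest and byte size of the real weights file rather than its contents. A downloaded copy can be checked against the pointer with the standard library, as in this minimal sketch. The helper name `matches_pointer` and the local path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path

# Values copied from the Git LFS pointer above.
EXPECTED_SHA256 = "7a3e9324eb4b19eb42f9eebcb8da68ff6b6b5dab6d24a389ee4836c92ebc141d"
EXPECTED_SIZE = 437951328


def matches_pointer(path: Path, sha256: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the given byte size and SHA-256 digest."""
    if path.stat().st_size != size:
        return False  # cheap check first; avoids hashing a truncated download
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so a ~438 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == sha256
```

For example, `matches_pointer(Path("model.safetensors"), EXPECTED_SHA256, EXPECTED_SIZE)` should return `True` for an intact download, assuming the file sits in the current directory.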
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
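`modules.json` chains three stages: a Transformer, a Pooling layer (mean pooling, per `pooling_mode_mean_tokens: true` in `1_Pooling/config.json`), and a Normalize layer. The sketch below illustrates the last two stages in plain Python on toy 2-dimensional vectors. It is not the library's implementation, and the function names are hypothetical.

```python
import math


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(column) / len(kept) for column in zip(*kept)]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-dimensional "token embeddings"; the real model uses 768 dimensions.
tokens = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]  # the last position is padding

embedding = l2_normalize(mean_pool(tokens, mask))
print(embedding)
```

Because the output vectors have unit length, the dot product of two embeddings equals their cosine similarity, which matches the `similarity_fn_name: cosine` setting in `config_sentence_transformers.json`.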
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "max_length": 512,
50
+ "model_max_length": 512,
51
+ "pad_to_multiple_of": null,
52
+ "pad_token": "[PAD]",
53
+ "pad_token_type_id": 0,
54
+ "padding_side": "right",
55
+ "sep_token": "[SEP]",
56
+ "stride": 0,
57
+ "strip_accents": null,
58
+ "tokenize_chinese_chars": true,
59
+ "tokenizer_class": "BertTokenizer",
60
+ "truncation_side": "right",
61
+ "truncation_strategy": "longest_first",
62
+ "unk_token": "[UNK]"
63
+ }
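The tokenizer settings above imply how a single input is framed before it reaches the model: `[CLS]` (id 101) in front, `[SEP]` (id 102) at the end, truncation and `[PAD]` (id 0) padding on the right, and at most 512 positions. The sketch below shows only that framing, under the assumption that word-piece ids have already been produced. It does not reproduce `BertTokenizer`, and `frame` is a hypothetical helper.

```python
# Special-token ids copied from `added_tokens_decoder` above.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102
MODEL_MAX_LENGTH = 512


def frame(token_ids: list[int], pad_to: int) -> tuple[list[int], list[int]]:
    """Wrap word-piece ids in [CLS]/[SEP], truncate on the right, pad on the right."""
    target = min(pad_to, MODEL_MAX_LENGTH)
    body = token_ids[: max(0, target - 2)]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID, *body, SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = target - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


# 7592 and 2088 are placeholder ids for illustration, not looked up in vocab.txt.
ids, mask = frame([7592, 2088], pad_to=6)
print(ids, mask)
```

The attention mask marks padding positions with 0, so that mean pooling can skip them.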
vocab.txt ADDED
The diff for this file is too large to render. See raw diff