marroyo777 committed on
Commit cf6ec05 · verified · 1 Parent(s): f3e4b63

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
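This configuration switches on exactly one pooling mode, `pooling_mode_cls_token`, so each sentence embedding is the vector the encoder produces for the first ([CLS]) token; the README's architecture listing then applies a `Normalize()` step. The plain-Python sketch below illustrates those two steps under stated assumptions. It is not the sentence-transformers implementation: the token vectors are hypothetical 4-dimensional stand-ins for real 384-dimensional outputs, and `active_pooling_modes`, `cls_pool`, and `l2_normalize` are helper names invented for this example.

```python
import json
import math

# Pooling settings copied from 1_Pooling/config.json above.
POOLING_CONFIG = json.loads("""
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
""")


def active_pooling_modes(config):
    """Hypothetical helper: names of the pooling_mode_* flags set to true."""
    return [key for key, value in config.items()
            if key.startswith("pooling_mode_") and value is True]


def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector ([CLS] for BERT)."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (the effect of Normalize())."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Hypothetical encoder output: 3 tokens x 4 dims (the real model uses 384 dims).
tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

print(active_pooling_modes(POOLING_CONFIG))  # ['pooling_mode_cls_token']
print(l2_normalize(cls_pool(tokens)))        # [0.6, 0.0, 0.8, 0.0]
```

The non-CLS tokens are ignored entirely by this pooling mode; turning on a different `pooling_mode_*` flag would only change the `cls_pool` step.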
README.md ADDED
---
base_model: marroyo777/bge-99GPT-v1
library_name: sentence-transformers
metrics:
- cosine_accuracy
- dot_accuracy
- manhattan_accuracy
- euclidean_accuracy
- max_accuracy
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1416
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: 'Who wrote the blog Sprint 7: Iterating upon iterations?'
  sentences:
  - 'Title: From Insights to Impact: 99P Labs Collaborates with BDAA to Foster Data
    Visualization Talent

    Published: March, 2023

    Author(s): Ryan Lingo

    Claps: 0

    Comments: 0

    Word Count: 1537

    URL: https://medium.com/99p-labs/from-insights-to-impact-99p-labs-collaborates-with-bdaa-to-foster-data-visualization-talent-26e22a76d1df


    In the spring of 2023, 99P Labs sponsored a data visualization challenge in collaboration
    with BDAA, the Big Data and Analytics Association at Ohio State University. The
    challenge lasted two weeks and began with a kickoff event where the 99P Labs team
    went to the weekly Tuesday BDAA meeting and laid out the motivation and starting
    guardrails for the challenge. The challenge allowed 99P Labs to connect with the
    next generation of data professionals and support their growth and development.
    The data visualization challenge was open to all BDAA members and lasted two weeks.
    The teams used a wide range of tools and software to create their visualizations
    and dashboards, including Streamlit, Plotly Dash, and Tableau. The winning entries
    were highlighted, and the challenge was a valuable experience for 99P Labs. The
    challenge was not just an opportunity for the students to learn, but it was also
    an opportunity for 99P Labs to connect with the next generation of data professionals
    and help build their developer community. The collaboration with BDAA was strengthened,
    and they look forward to continuing this collaboration in the future.'
  - 'Title: Sprint 7: Iterating upon iterations

    Published: October, 2023

    Author(s): 2023 99P Labs x CMU MHCI Capstone Team

    Claps: 24

    Comments: 0

    Word Count: 985

    URL: https://medium.com/99p-labs/sprint-7-iterating-upon-iterations-34cc621a5aeb


    The 99P Labs x CMU MHCI Capstone Team is part of the Master of Human-Computer
    Interaction (MHCI) program at Carnegie Mellon University. The team started off
    with a blank canvas and ran design sessions with 3 Gen Z participants to shape
    their mobile mentor to fit their learning needs. They found that the activities
    people wanted to perform in cars fell under a few main hierarchies and came up
    with a set of 3 scenarios to test out the different roles that Gen Zers expect
    from the mobile mentor. The team then moved from more generative to evaluative
    testing and decided to focus on the tutor scenario, making use of the unique moving
    environment of a vehicle. They also made use of their clients'' expertise in vehicle
    HCI design to conduct testing sessions. The team is looking forward to shaping
    the future of learning on-the-go in their last few iterations.'
  - 'Title: 99P Labs 2022 Data I/O Recap

    Published: November, 2022

    Author(s): Ryan Lingo

    Claps: 259

    Comments: 0

    Word Count: 1021

    URL: https://medium.com/99p-labs/99p-labs-2022-data-i-o-recap-7c710fbe28e6


    The blog post discusses the 99P Labs 2022 Data I/O Recap, which took place at
    The Ohio State University. The event included 50 students participating in 12
    teams, with 99P Labs sponsoring and offering a challenge for the participants.
    Despite varying skill levels, the atmosphere remained friendly and inclusive.
    The event allowed for more personal interaction and submissions from all teams,
    resulting in impressive insights and visuals. The winning teams were determined
    by a team of 99P Labs and OSU faculty. Overall, the author expresses their enjoyment
    and the inspiring energy of the event. For more information, readers are encouraged
    to visit the 99P Labs blog post.'
- source_sentence: What is the safety system for cars using robots mentioned in the
    blog?
  sentences:
  - 'Title: The Future of Smart Mobility — Prof. Chris Atkinson

    Published: April, 2021

    Author(s): 99P Labs

    Claps: 1

    Comments: 0

    Word Count: 52

    URL: https://medium.com/99p-labs/the-future-of-smart-mobility-prof-chris-atkinson-8dfbc1fc1280


    The blog article discusses the webinar on The Future of Smart Mobility by Prof.
    Atkinson at The Ohio State University. 99P Labs expresses excitement about the
    topic and their collaboration with partners such as OSU to work towards realizing
    this future.'
  - 'Title: Innovative Projects at MakeOHI/O

    Published: March, 2023

    Author(s): Ryan Lingo

    Claps: 58

    Comments: 0

    Word Count: 603

    URL: https://medium.com/99p-labs/innovative-projects-at-makeohi-o-6f8a4c5a3d02


    The blog article discusses the Innovative Projects at MakeOHI/O, a makeathon event
    sponsored by 99P Labs. The event aimed to encourage creativity and innovation
    among undergraduate and graduate students. The article highlights three successful
    projects from the event, including a platform for visually impaired individuals,
    a safety system for cars using robots, and an automated rearview mirror and sun
    visor adjustment system. The article also expresses gratitude to the event organizers
    and participants and invites readers to connect with 99P Labs for collaboration.'
  - 'Title: An Overview of Machine Learning — Part 2: All About Regression

    Published: January, 2023

    Author(s): Luka Brkljacic

    Claps: 2

    Comments: 0

    Word Count: 4550

    URL: https://medium.com/99p-labs/an-overview-of-machine-learning-part-2-all-about-regression-2f991281932e


    The blog article provides an in-depth overview of regression in machine learning.
    It covers linear regression, calculating R, limitations of R, multiple regression,
    adjusted R, and logistic regression. The article also includes practical Python
    examples for linear regression and multiple regression. The author also mentions
    that the next post will cover decision trees.'
- source_sentence: What is the Intel Realsense D435i Depth Camera used for?
  sentences:
  - 'Title: How LLMs can Drive the Intersection Between Social and Mobility

    Published: December, 2023

    Author(s): Roopal Joshi, Nishant Chintalapati, Sanghmitra Wankhade, Ashima Saxena,
    and Ken Pulverman

    Claps: 1

    Comments: 0

    Word Count: 2211

    URL: https://medium.com/99p-labs/how-llms-can-drive-the-intersection-between-social-and-mobility-1cca9f34e410


    The article discusses the authors'' journey in tackling a challenge for 99P Labs,
    exploring the relevance of LLMs for the company to engage their users in the future
    of mobility. The authors detail their process of ideation, convergent and divergent
    thinking, and the development of a product or service that leverages the capabilities
    of ChatGPT or Gen AI to initiate or influence physical actions. The article concludes
    with recommendations and insights gained from the project.'
  - 'Title: Harnessing Sensors and Software

    Published: August, 2023

    Author(s): Edward Lui

    Claps: 0

    Comments: 0

    Word Count: 1133

    URL: https://medium.com/99p-labs/harnessing-sensors-and-software


    The blog article discusses the author''s two-month internship at 99P, focusing
    on sensors and their integration with the Robot Operating System (ROS). The author
    worked on the SOMEthings project, exploring technologies such as the Intel Realsense
    D435i Depth Camera, HC-SR04 Ultrasonic Sensor, and DW1000 UWB Module. The challenges
    faced and accomplishments achieved during the internship are highlighted, providing
    valuable insights and hands-on experience. The article concludes with an invitation
    for collaboration and engagement with 99P Labs.'
  - 'Title: Sprint 6: Designing a Mobile Mentor

    Published: October, 2023

    Author(s): Alana Levene

    Claps: 1

    Comments: 0

    Word Count: 1015

    URL: https://medium.com/99p-labs/sprint-6-designing-a-mobile-mentor


    The 99P Labs x CMU MHCI Capstone Team has transitioned from research to design,
    focusing on creating a Mobile Mentor for Gen Z to facilitate on-the-go learning.
    The team has identified key insights from their research and has begun the prototyping
    process using a low-fidelity cardboard model. They are actively involving participants
    in the design process and are considering various influencing factors on their
    product. The team plans to transition to a design sprint timeline and is excited
    to continue developing this innovative product.'
- source_sentence: What use cases are provided for the Sustainable Mobility Analytics
    dashboard?
  sentences:
  - 'Title: MakeOHI/O 2024

    Published: March, 2024

    Author(s): Ryan Lingo

    Claps: 4

    Comments: 0

    Word Count: 2582

    URL: https://medium.com/99p-labs/makeohi-o-2024-cb594eceb99f


    The blog post discusses the author''s experience at the MakeOHI/O hackathon at
    Ohio State University. The author served as a mentor and judge and shares the
    challenge set for the students, the outstanding projects, and the winners. The
    blog highlights the innovative solutions presented by the winning teams and the
    overall success of the event. The author also encourages readers to stay engaged
    with the community and explore partnership opportunities.'
  - 'Title: CMU Heinz Capstone Project — Building Sustainable Mobility Analytics Tool

    Published: June, 2022

    Author(s): 99P Labs

    Claps: 58

    Comments: 0

    Word Count: 2780

    URL: https://medium.com/99p-labs/cmu-heinz-capstone-project-building-sustainable-mobility-analytics-tool-cbfe6e2591ee


    The blog article discusses the sustainability of transportation networks and the
    development of a Sustainable Mobility Analytics dashboard by a group of interdisciplinary
    research scientists and engineers finishing their graduate studies at Heinz College.
    The dashboard aims to help partners at 99P Labs understand the complexity of transportation
    networks and evaluate their sustainability. The article details the three phases
    of the project, the methodologies for calculating various metrics featured on
    the dashboard, and provides use cases for the dashboard. Additionally, it discusses
    the potential for future work and thanks those who supported the project.'
  - 'Title: Navigating Telematics Data

    Published: December, 2023

    Author(s): Amber Liu, Hanna Lee, Parunjodhi Munisamy, Yaretsy Castro, and Tulip
    Daaboul

    Claps: 60

    Comments: 0

    Word Count: 1637

    URL: https://medium.com/99p-labs/navigating-telematics-data-1b59e09489c7


    The blog article discusses the importance of telematics data and its relevance
    in the transportation landscape. It outlines the challenges faced in working with
    telematics data, the tools and resources used, and the process of navigating and
    visualizing the data. The article also delves into the specific analysis of telematics
    data and census data in Ohio, highlighting the impact of Covid on transportation
    and income levels. The authors express a desire to further explore pre-covid and
    post-covid trends and extend the investigation to other states in the United States.'
- source_sentence: How does gamification enhance the learning experience in data science
    according to the blog?
  sentences:
  - 'Title: Unlocking Potential: The Power of Gamification in Employee Data Science
    Learning

    Published: April, 2024

    Author(s): Fern Zhang

    Claps: 5

    Comments: 0

    Word Count: 1661

    URL: https://medium.com/99p-labs/unlocking-potential-the-power-of-gamification-in-employee-data-science-learning-5f88e97c74aa


    The blog article discusses the use of gamification in employee data science learning.
    It highlights the challenges in data science training and the team''s initiative
    to revolutionize it using gamification strategies. The team adopted a multifaceted
    approach to understand the diverse backgrounds and prior knowledge of their target
    learners to design effective instruction. The article also discusses the gamification
    strategies for manager and practitioner training, as well as the user testing
    feedback and future plans for employee training in data science. Overall, the
    article emphasizes the importance of data science training and the use of gamification
    to make it an engaging and impactful learning experience.'
  - 'Title: CMU Capstone Project — Visualization Framework Of Telematics Data

    Published: April, 2024

    Author(s): Yiheng Zhang, Yixue Yin, Rui Huang

    Claps: 1

    Comments: 0

    Word Count: 2520

    URL: https://medium.com/99p-labs/cmu-capstone-project-visualization-framework-of-telematics-data-abb74fcbb975


    The blog article discusses the development of an application to display telematic
    trajectory data in various formats on a web browser. The project involved brainstorming,
    user interviews, experimentation, and necessary pivots to define the trajectory
    of the development process. The team also focused on enhancing the foundational
    dashboard, building up a plugin system, fixing problems, and building new features.
    The final sprint involved finalizing and enhancing the user interface of the visualization
    framework. The article also outlines future works for the project.'
  - 'Title: Summer Sprint 3: Planes, Trains, and Autonomous Vehicles

    Published: September, 2022

    Author(s): MHCI x 99P Labs Capstone Team

    Claps: 4

    Comments: 0

    Word Count: 1728

    URL: https://medium.com/99p-labs/summer-sprint-3-planes-trains-and-autonomous-vehicles-5e8b40dbb67e


    The MHCI x 99P Labs Capstone Team worked remotely from NYC and San Diego, attending
    two major UX conferences and learning skills to apply to their project. They solidified
    their understanding of quantitative research methods and learned how to address
    points of friction in a user''s journey. The team discovered benefits and challenges
    of remote work, and tested low-fi prototypes of built-in screens in an autonomous
    people-mover. They conducted a brainstorm of all the capabilities they imagine
    their AV''s ecosystem would have and identified the highest priority capabilities.
    The team also developed a wireflow based on their map of capabilities and prototyped
    it in Figma to test with participants. They plan to A/B test different content
    for the in-app options and continue to explore specificity levels when it comes
    to giving passengers information. They are excited to bring all of their learnings
    to life in their final design, which will inform the future of shared AV transportation.'
model-index:
- name: SentenceTransformer based on marroyo777/bge-99GPT-v1
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: 99GPT Finetuning Embedding test 01
      type: 99GPT-Finetuning-Embedding-test-01
    metrics:
    - type: cosine_accuracy
      value: 0.9887005649717514
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.011299435028248588
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.9887005649717514
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.9887005649717514
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.9887005649717514
      name: Max Accuracy
    - type: cosine_accuracy
      value: 0.9915254237288136
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.00847457627118644
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.9915254237288136
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.9915254237288136
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.9915254237288136
      name: Max Accuracy
---

# SentenceTransformer based on marroyo777/bge-99GPT-v1

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [marroyo777/bge-99GPT-v1](https://huggingface.co/marroyo777/bge-99GPT-v1). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [marroyo777/bge-99GPT-v1](https://huggingface.co/marroyo777/bge-99GPT-v1) <!-- at revision 4ca01046331fa1aed7ce35326b38186f8baa5149 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
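The final `Normalize()` module gives every embedding unit L2 length. For unit vectors, the dot product and cosine similarity are the same number, which matters when choosing a similarity metric for a vector store. A minimal plain-Python check of that identity, using hypothetical 3-dimensional vectors in place of real 384-dimensional model outputs:

```python
import math


def l2_normalize(v):
    """Return v scaled to unit Euclidean length (what Normalize() does per row)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical raw (pre-normalization) embeddings.
raw_a = [1.0, 2.0, 2.0]
raw_b = [2.0, 0.0, 1.0]

a, b = l2_normalize(raw_a), l2_normalize(raw_b)

print(round(cosine(raw_a, raw_b), 6))  # 0.596285  (cosine ignores vector length)
print(round(dot(a, b), 6))             # 0.596285  (after normalization, dot == cosine)
print(round(dot(raw_a, raw_b), 6))     # 4.0       (the raw dot product differs)
```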
453
+
454
+ ## Usage
455
+
456
+ ### Direct Usage (Sentence Transformers)
457
+
458
+ First install the Sentence Transformers library:
459
+
460
+ ```bash
461
+ pip install -U sentence-transformers
462
+ ```
463
+
464
+ Then you can load this model and run inference.
465
+ ```python
466
+ from sentence_transformers import SentenceTransformer
467
+
468
+ # Download from the 🤗 Hub
469
+ model = SentenceTransformer("marroyo777/bge-99GPT-v1")
470
+ # Run inference
471
+ sentences = [
472
+ 'How does gamification enhance the learning experience in data science according to the blog?',
473
+ "Title: Unlocking Potential: The Power of Gamification in Employee Data Science Learning\nPublished: April, 2024\nAuthor(s): Fern Zhang\nClaps: 5\nComments: 0\nWord Count: 1661\nURL: https://medium.com/99p-labs/unlocking-potential-the-power-of-gamification-in-employee-data-science-learning-5f88e97c74aa\n\nThe blog article discusses the use of gamification in employee data science learning. It highlights the challenges in data science training and the team's initiative to revolutionize it using gamification strategies. The team adopted a multifaceted approach to understand the diverse backgrounds and prior knowledge of their target learners to design effective instruction. The article also discusses the gamification strategies for manager and practitioner training, as well as the user testing feedback and future plans for employee training in data science. Overall, the article emphasizes the importance of data science training and the use of gamification to make it an engaging and impactful learning experience.",
474
+ 'Title: CMU Capstone Project\u200a—\u200aVisualization Framework Of Telematics Data\nPublished: April, 2024\nAuthor(s): Yiheng Zhang, Yixue Yin, Rui Huang\nClaps: 1\nComments: 0\nWord Count: 2520\nURL: https://medium.com/99p-labs/cmu-capstone-project-visualization-framework-of-telematics-data-abb74fcbb975\n\nThe blog article discusses the development of an application to display telematic trajectory data in various formats on a web browser. The project involved brainstorming, user interviews, experimentation, and necessary pivots to define the trajectory of the development process. The team also focused on enhancing the foundational dashboard, building up a plugin system, fixing problems, and building new features. The final sprint involved finalizing and enhancing the user interface of the visualization framework. The article also outlines future works for the project.',
475
+ ]
476
+ embeddings = model.encode(sentences)
477
+ print(embeddings.shape)
478
+ # [3, 384]
479
+
480
+ # Get the similarity scores for the embeddings
481
+ similarities = model.similarity(embeddings, embeddings)
482
+ print(similarities.shape)
483
+ # [3, 3]
484
+ ```
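Because the configured similarity function is cosine similarity, the 3 x 3 result of `model.similarity(embeddings, embeddings)` holds pairwise cosine scores, and row 0 compares the question with each blog summary. The plain-Python sketch below reproduces that computation on hypothetical 3-dimensional stand-in vectors (not real model outputs) to show how the matrix can be used to rank candidate passages for the query.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_matrix(embeddings):
    """Pairwise cosine scores, the same shape as model.similarity(E, E)."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


# Hypothetical stand-ins for the three encoded sentences in the example:
# index 0 = the question, 1 = gamification summary, 2 = telematics summary.
embeddings = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.3],
]

scores = similarity_matrix(embeddings)
print(len(scores), len(scores[0]))  # 3 3

# Rank the candidate passages (indices 1 and 2) against the query (index 0).
ranking = sorted(range(1, len(embeddings)), key=lambda j: scores[0][j], reverse=True)
print(ranking)  # [1, 2]
```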

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Triplet
* Dataset: `99GPT-Finetuning-Embedding-test-01`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value      |
|:-------------------|:-----------|
| cosine_accuracy    | 0.9887     |
| dot_accuracy       | 0.0113     |
| manhattan_accuracy | 0.9887     |
| euclidean_accuracy | 0.9887     |
| **max_accuracy**   | **0.9887** |

#### Triplet
* Dataset: `99GPT-Finetuning-Embedding-test-01`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value      |
|:-------------------|:-----------|
| cosine_accuracy    | 0.9915     |
| dot_accuracy       | 0.0085     |
| manhattan_accuracy | 0.9915     |
| euclidean_accuracy | 0.9915     |
| **max_accuracy**   | **0.9915** |
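`TripletEvaluator` counts a triplet as correct when the anchor is more similar to the positive than to the negative, and reports the share of correct triplets as accuracy. The plain-Python sketch below applies that rule with cosine similarity to hypothetical 2-dimensional toy triplets; it is a simplification, not the evaluator's source code. It also checks an arithmetic observation: the reported values match a 354-triplet evaluation set (the size listed under Evaluation Dataset), since 350/354 ≈ 0.9887 and 351/354 ≈ 0.9915. The dot_accuracy values are exact complements (4/354 and 3/354). Because the model normalizes its outputs, dot product and cosine similarity should give identical accuracy, so those dot figures look anomalous; they may reflect how the evaluator version in use compared dot scores rather than model quality, and are best not relied on.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets where the anchor is
    more similar (cosine) to the positive than to the negative."""
    correct = sum(1 for anchor, pos, neg in triplets
                  if cosine(anchor, pos) > cosine(anchor, neg))
    return correct / len(triplets)


# Hypothetical toy triplets; the last one is deliberately ranked wrongly.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.5]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),
]
print(triplet_accuracy(toy_triplets))  # 0.75

# Arithmetic check against the reported metrics (354 evaluation triplets).
print(round(350 / 354, 4), round(351 / 354, 4))  # 0.9887 0.9915
```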
537
+
538
+ <!--
539
+ ## Bias, Risks and Limitations
540
+
541
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
542
+ -->
543
+
544
+ <!--
545
+ ### Recommendations
546
+
547
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
548
+ -->
549
+
550
+ ## Training Details
551
+
552
+ ### Training Dataset
553
+
554
+ #### Unnamed Dataset
555
+
556
+
557
+ * Size: 1,416 training samples
558
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
559
+ * Approximate statistics based on the first 1000 samples:
560
+ | | anchor | positive | negative |
561
+ |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
562
+ | type | string | string | string |
563
+ | details | <ul><li>min: 8 tokens</li><li>mean: 17.71 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 190.68 tokens</li><li>max: 331 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 190.0 tokens</li><li>max: 331 tokens</li></ul> |
564
+ * Samples:
565
+ | anchor | positive | negative |
566
+ |:---------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
567
+ | <code>What guidance does the article provide for creating a co-design protocol?</code> | <code>Title: Interactive Co-Design Sessions for Customer Research — Part 2: Co-Design Protocol<br>Published: November, 2020<br>Author(s): Langley Vogt<br>Claps: 0<br>Comments: 0<br>Word Count: 497<br>URL: https://medium.com/99p-labs/interactive-co-design-sessions-for-customer-research-part-2-co-design-protocol-2c60291e88c9<br><br>The article discusses the process of creating an interactive co-design protocol for customer research. It emphasizes the importance of creating a thorough protocol and interactive board simultaneously, and provides guidance on creating a preliminary protocol and laying out the rest of the protocol in a table format. The article also mentions that Part 3 will share co-design learnings and takeaways.</code> | <code>Title: What is Software-defined Mobility?<br>Published: March, 2023<br>Author(s): Rajeev Chhajer and Ryan Lingo<br>Claps: 56<br>Comments: 0<br>Word Count: 742<br>URL: https://medium.com/99p-labs/what-is-software-defined-mobility/<br><br>The article discusses the concept of Software-defined Mobility and its impact on the automotive industry. It emphasizes the importance of incorporating intelligence into the mobility ecosystem through software to create a more integrated, sustainable, and emotional mobility experience. The authors believe that participation and cooperation are key to success in this new mobility paradigm, and they aim to leverage cutting-edge technologies and innovative approaches to address the challenges facing the automotive industry.</code> |
568
+ | <code>What was the goal of the MHCI 99P Labs Capstone Team's project?</code> | <code>Title: Interactions, Car Data, and Play Dynamics…Oh My!—2021 MHCI Capstone Part 8<br>Published: January, 2022<br>Author(s): MHCI 99P Labs Capstone Team<br>Claps: 0<br>Comments: 0<br>Word Count: 1061<br>URL: https://medium.com/99p-labs/interactions-car-data-and-play-dynamics-oh-my-2021-mhci-capstone-part-8-b3ac8dd1ceef<br><br>The MHCI 99P Labs Capstone Team shares their experiences and learnings from Sprint 2 of their project. They explored various interactions in the car, including shared motion and collaboration, button-based games, and co-creation with data input from the car. The team aimed to foster connections between families through play and successfully learned how these new interactions could achieve this goal. The marble game was the most successful, while the other two prototypes had mixed success. The team plans to take their learnings forward in the next sprint.</code> | <code>Title: Introducing the 99P Labs Blog Chatbot<br>Published: February, 2024<br>Author(s): Martin Arroyo<br>Claps: 4<br>Comments: 1<br>Word Count: 3208<br>URL: https://medium.com/99p-labs/99gpt-building-a-chatbot-fdde8b689df4<br><br>The 99P Labs blog has introduced a chatbot called 99GPT, designed to answer questions about blog content. The chatbot aims to provide a more engaging and interactive way for readers to explore insights from the blog archive. The article discusses the technical considerations, challenges, and lessons learned in building 99GPT, including the ingestion phase, model selection, and developing a querying strategy. The blog also highlights the importance of frameworks like Langchain and LlamaIndex in bridging the gap between raw data and AI-driven interactive applications. The article concludes with the deployment of the chatbot on the Streamlit community cloud.</code> |
+ | <code>What are the ideal data quality outputs mentioned in the article?</code> | <code>Title: Weighing the Value of Data Quality Checks<br>Published: July, 2022<br>Author(s): Ryan Lingo<br>Claps: 36<br>Comments: 0<br>Word Count: 2572<br>URL: https://medium.com/99p-labs/weighing-the-value-of-data-quality-checks-4a5d0da1f3ff<br><br>The article discusses the exploration of implementing data quality checks into a data platform, the goals, limits, and expectations, and the small experiments conducted to validate thinking. It also covers the flexibility and customization of data quality, potential actions to take when finding inadequate data quality, ideal data quality output, metrics to report, and where in the pipeline data quality checks best fit. The article also explores general deployment options and closing thoughts on the exploration of data quality ideas and architecture.</code> | <code>Title: Sprint 2: Robot You Can Drive My Car<br>Published: May, 2022<br>Author(s): MHCI x 99P Labs Capstone Team<br>Claps: 0<br>Comments: 0<br>Word Count: 648<br>URL: https://medium.com/99p-labs/sprint-2-robot-you-can-drive-my-car-e4d988826555<br><br>The blog article discusses the progress of the MHCI x 99P Labs Capstone Team in their project, focusing on the preliminary research and brainstorming they have conducted. The team has updated their research plan and is preparing to conduct informal interviews and observations in various related fields. They also plan to explore pretotyping in their next sprint to understand what form of attendants is most helpful to human passengers.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+ "scale": 20.0,
+ "similarity_fct": "cos_sim"
+ }
+ ```
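Here is a rough illustration of what `MultipleNegativesRankingLoss` optimizes with `scale: 20.0` and `similarity_fct: cos_sim`. Each anchor is scored against every positive and every hard negative in the batch. The cosine scores are multiplied by the scale, and cross-entropy is taken with the anchor's own positive as the correct class. The plain-Python sketch below works on toy vectors. `cos_sim` and `mnrl_loss` are hypothetical helpers written for illustration, not the sentence-transformers source.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """In-batch softmax loss (illustrative): anchor i should score positives[i]
    above every other positive and every hard negative in the batch."""
    candidates = positives + negatives  # all 2 * batch_size candidate texts
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

# Toy batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[0.0, -1.0], [-1.0, 0.0]]
print(mnrl_loss(anchors, positives, negatives))        # small: positives match
print(mnrl_loss(anchors, positives[::-1], negatives))  # large: positives swapped
```

The scale acts as an inverse temperature. Cosine scores only range over [-1, 1], so multiplying them by 20 sharpens the softmax enough for the correct positive to dominate.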
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 354 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 354 samples:
+ | | anchor | positive | negative |
+ |:--------|:--------|:--------|:--------|
+ | type | string | string | string |
+ | details | <ul><li>min: 7 tokens</li><li>mean: 17.68 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 187.96 tokens</li><li>max: 331 tokens</li></ul> | <ul><li>min: 125 tokens</li><li>mean: 189.88 tokens</li><li>max: 331 tokens</li></ul> |
+ * Samples:
+ | anchor | positive | negative |
+ |:------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | <code>What challenges did the 99P capstone team face in their project?</code> | <code>Title: Decoding Travel Times: Exploring Telematics Data Dynamics<br>Published: May, 2024<br>Author(s): Qamar Mohamoud<br>Claps: 3<br>Comments: 1<br>Word Count: 1880<br>URL: https://medium.com/99p-labs/decoding-travel-times-exploring-telematics-data-dynamics<br><br>The blog article discusses the challenges faced by the 99P capstone team of the MTDA program at The Ohio State University in building a model to compare real-life trip times to ideal times projected by the Google Distance Matrix. The team explored telematics data dynamics and the impact of geography, time of day, and local weather on trip times. The article also highlights the team's approach to feature creation, weather analysis, zone identification, data filtering, and modeling. Despite their efforts, the predictive models tested did not exceed 60% accuracy, leading to several key conclusions. The team advises caution in replicating their analysis and suggests addressing data bias, exploring alternative data sources, and considering route information for more accurate analyses in the future.</code> | <code>Title: Sprint 5: Optimizing HRI Research with Smart Guide — A Co-Design Journey<br>Published: May, 2024<br>Author(s): Honda Research Institute MHCI @ CMU<br>Claps: 2<br>Comments: 0<br>Word Count: 970<br>URL: https://medium.com/99p-labs/sprint-5-optimizing-hri-research-with-smart-guide-a-co-design-journey-fa5d64a56a3d<br><br>The blog article discusses the Smart Guide as an AI research companion for HRI researchers, aimed at enhancing the efficiency of human-AI teaming (HAIT) research. The article details the goals and testing process for the Smart Guide, as well as the insights gained from co-creation sessions with CMU researchers. The article also outlines the prototype and the key takeaways from the research process.</code> |
+ | <code>What challenges did the author face during the internship?</code> | <code>Title: Harnessing Sensors and Software<br>Published: August, 2023<br>Author(s): Edward Lui<br>Claps: 0<br>Comments: 0<br>Word Count: 1133<br>URL: https://medium.com/99p-labs/harnessing-sensors-and-software<br><br>The blog article discusses the author's two-month internship at 99P, focusing on sensors and their integration with the Robot Operating System (ROS). The author worked on the SOMEthings project, exploring technologies such as the Intel Realsense D435i Depth Camera, HC-SR04 Ultrasonic Sensor, and DW1000 UWB Module. The challenges faced and accomplishments achieved during the internship are highlighted, providing valuable insights and hands-on experience. The article concludes with an invitation for collaboration and engagement with 99P Labs.</code> | <code>Title: Sprint 6: Designing a Mobile Mentor<br>Published: October, 2023<br>Author(s): Alana Levene<br>Claps: 1<br>Comments: 0<br>Word Count: 1015<br>URL: https://medium.com/99p-labs/sprint-6-designing-a-mobile-mentor<br><br>The 99P Labs x CMU MHCI Capstone Team has transitioned from research to design, focusing on creating a Mobile Mentor for Gen Z to facilitate on-the-go learning. The team has identified key insights from their research and has begun the prototyping process using a low-fidelity cardboard model. They are actively involving participants in the design process and are considering various influencing factors on their product. The team plans to transition to a design sprint timeline and is excited to continue developing this innovative product.</code> |
+ | <code>What are the goals of the SOMEThings project?</code> | <code>Title: Introducing the SOMEThings Project<br>Published: July, 2023<br>Author(s): Ryan Lingo<br>Claps: 15<br>Comments: 0<br>Word Count: 2794<br>URL: https://medium.com/99p-labs/introducing-the-somethings-project-f5eb8b0cf572<br><br>The blog introduces the SOMEThings project, which is an initiative to build a miniature smart city for testing and experimenting with real-world challenges in the mobility ecosystem and IoT. The project aims to revolutionize the mobility sector, enhance efficiency and accessibility of mobility through IoT integration, and foster a culture of continuous learning and improvement. The blog also discusses the development of the SOMEThings Lab, the car, and the track for the project. The project is expected to have a substantial impact on the future of mobility and society at large.</code> | <code>Title: An Overview of Machine Learning — Part 2: All About Regression<br>Published: January, 2023<br>Author(s): Luka Brkljacic<br>Claps: 2<br>Comments: 0<br>Word Count: 4550<br>URL: https://medium.com/99p-labs/an-overview-of-machine-learning-part-2-all-about-regression-2f991281932e<br><br>The blog article provides an in-depth overview of regression in machine learning. It covers linear regression, calculating R, limitations of R, multiple regression, adjusted R, and logistic regression. The article also includes practical Python examples for linear regression and multiple regression. The author also mentions that the next post will cover decision trees.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+ "scale": 20.0,
+ "similarity_fct": "cos_sim"
+ }
+ ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
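`batch_sampler: no_duplicates` pairs naturally with `MultipleNegativesRankingLoss`, because that loss uses every other text in a batch as a negative. If two rows in one batch shared the same positive, that text would be counted as a negative for its own anchor. The sketch below is a minimal greedy version of the idea. It assumes rows are `(anchor, positive, negative)` string tuples and defers a row to a later batch whenever any of its texts is already present. `no_duplicate_batches` is a hypothetical name, and this is not the library's `NoDuplicatesBatchSampler` implementation.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group row indices into batches so that no text value
    (anchor, positive or negative) appears twice within one batch.

    rows: list of tuples of strings, e.g. (anchor, positive, negative).
    Returns a list of batches, each a list of row indices.
    """
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry this row in a later batch
        batches.append(batch)  # never empty: the first remaining row always fits
        remaining = deferred
    return batches

rows = [("q1", "a", "x"), ("q2", "a", "y"), ("q3", "b", "z")]
print(no_duplicate_batches(rows, 16))  # row 1 shares positive "a" with row 0
```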
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
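The settings `lr_scheduler_type: linear`, `warmup_ratio: 0.1`, `warmup_steps: 0` and `learning_rate: 5e-05` make the learning rate ramp up linearly from zero and then decay linearly back to zero. The sketch below assumes the 89 optimizer steps shown in the Training Logs (one epoch, `gradient_accumulation_steps: 1`). It also assumes the warmup length is rounded up, i.e. `ceil(0.1 * 89) = 9` steps, which is our understanding of how recent Hugging Face `Trainer` versions convert a ratio to steps. `linear_warmup_lr` is a hypothetical helper.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at optimizer step `step` for linear warmup + linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding rule
    if step < warmup_steps:
        factor = step / max(1, warmup_steps)  # ramp 0 -> 1
    else:
        factor = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))  # 1 -> 0
    return base_lr * factor

TOTAL_STEPS = 89  # from the Training Logs (1 epoch)
for step in (0, 4, 9, 50, 89):
    print(step, f"{linear_warmup_lr(step, TOTAL_STEPS):.2e}")
```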
+
+ ### Training Logs
+ | Epoch | Step | 99GPT-Finetuning-Embedding-test-01_max_accuracy |
+ |:-----:|:----:|:-----------------------------------------------:|
+ | 1.0 | 89 | 0.9915 |
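The `max_accuracy` column comes from a triplet evaluation, where a triplet counts as correct when the anchor is more similar to its positive than to its negative. As we understand sentence-transformers 3.1, the evaluator computes this accuracy under several similarity functions, and `max_accuracy` reports the best one. If the evaluator ran over the 354 evaluation triplets, 0.9915 corresponds to 351 correct triplets (351 / 354 ≈ 0.99153). That figure is an inference, not a logged value. The sketch below is cosine-only and uses a hypothetical `triplet_accuracy` helper.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_accuracy(anchors, positives, negatives):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        cos_sim(a, p) > cos_sim(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return correct / len(anchors)

# Sanity check on the reported number: 351 of 354 correct rounds to 0.9915.
print(round(351 / 354, 4))
```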
+
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "_name_or_path": "marroyo777/bge-99GPT-v1",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 384,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 1536,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.2",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
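The values in this `config.json` are enough to estimate the model size: 384 hidden units, 12 layers, 1536-wide feed-forward blocks, a 30522-token vocabulary and 512 positions. The estimate assumes the standard `BertModel` layout, meaning embeddings, 12 encoder layers and the pooler head, all stored as `float32`. Under that assumption the count comes to 33.36M parameters, or about 133.44 MB. That lines up with the `size 133462128` in the `model.safetensors` LFS pointer, and the remaining ~22 KB is plausibly the safetensors header. `bert_param_count` is a hypothetical back-of-the-envelope helper.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate, max_pos, type_vocab, with_pooler=True):
    """Parameter count for a standard BERT encoder (weights + biases + LayerNorms).

    The number of attention heads does not appear: heads split the hidden
    dimension, so they do not change the size of the Q/K/V/output projections.
    """
    embeddings = (vocab_size + max_pos + type_vocab) * hidden + 2 * hidden  # + LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden  # Q, K, V, output + LayerNorm
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + 2 * hidden
    pooler = hidden * hidden + hidden if with_pooler else 0
    return embeddings + layers * (attention + ffn) + pooler

params = bert_param_count(vocab_size=30522, hidden=384, layers=12,
                          intermediate=1536, max_pos=512, type_vocab=2)
print(params)                          # parameter count
print(params * 4, "bytes at float32")  # compare with size 133462128 in the LFS pointer
```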
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.1.1",
+ "transformers": "4.44.2",
+ "pytorch": "2.4.1+cu121"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35160adc86cb4e7e6f8ec496af9093df1f6bece8ee9bc633b82e224d1b0e4c56
+ size 133462128
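`model.safetensors` is committed as a Git LFS pointer rather than as the weights themselves. The pointer consists of three `key value` lines giving the spec version, the SHA-256 object id and the size in bytes (133462128, about 133 MB). The parsing sketch below is plain Python, and `parse_lfs_pointer` is a hypothetical name.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")  # e.g. "sha256:<hex>"
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:35160adc86cb4e7e6f8ec496af9093df1f6bece8ee9bc633b82e224d1b0e4c56
size 133462128
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"] / 1e6, "MB")
```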
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
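Read together with `1_Pooling/config.json` (`pooling_mode_cls_token: true`), `modules.json` describes a three-stage pipeline. First, the BERT encoder produces one 384-dimensional vector per token. Next, the pooling module keeps only the `[CLS]` token's vector, and finally the normalize module scales it to unit length. Because of that last step, cosine similarity and dot product give the same scores for this model's embeddings. The plain-Python sketch below covers the two post-encoder steps on toy vectors. The function names and toy "encoder outputs" are hypothetical.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep the first token's vector ([CLS] sits at position 0)."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embed(token_embeddings):
    return l2_normalize(cls_pool(token_embeddings))

# Toy encoder output: 3 tokens x 4 dims ([CLS], word, [SEP]).
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 1.0, 1.0, 1.0]]
tokens_b = [[0.0, 4.0, 3.0, 0.0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0, 1.0, 1.0]]
ea, eb = embed(tokens_a), embed(tokens_b)
dot = sum(x * y for x, y in zip(ea, eb))  # equals cosine similarity for unit vectors
print(ea, eb, dot)
```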
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
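Taken together, `tokenizer_config.json`, `special_tokens_map.json` and `sentence_bert_config.json` describe how each input is prepared. The text is lowercased and wrapped as `[CLS] ... [SEP]` (ids 101 and 102). It is then truncated on the right to at most 512 tokens and padded on the right with `[PAD]` (id 0). The sketch below shows only that framing step for a single, already-tokenized sequence and skips WordPiece itself. The helper name `frame_and_pad` and the sample token ids are illustrative assumptions.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # ids from tokenizer_config.json
MAX_LEN = 512  # max_seq_length / model_max_length

def frame_and_pad(token_ids, pad_to, max_len=MAX_LEN):
    """Add [CLS]/[SEP], truncate on the right, then right-pad with [PAD].

    Returns (input_ids, attention_mask)."""
    body = token_ids[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    target = min(max(pad_to, len(ids)), max_len)
    mask = [1] * len(ids) + [0] * (target - len(ids))  # 0 marks padding
    ids = ids + [PAD_ID] * (target - len(ids))
    return ids, mask

# Illustrative WordPiece ids for a two-token input, padded to length 6.
ids, mask = frame_and_pad([7592, 2088], pad_to=6)
print(ids, mask)
```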
vocab.txt ADDED
The diff for this file is too large to render.