Commit 956d94a by dnth · verified · 1 Parent(s): 1fcc06a

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
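
The pooling configuration above enables only `pooling_mode_mean_tokens`, so the sentence embedding is an average of the token embeddings. As a rough illustration (not the sentence-transformers implementation), mean pooling over non-padding tokens might be sketched in plain Python as follows. The function name `mean_pool` and the toy 2-dimensional vectors are assumptions made for this example; the real model works with 768-dimensional vectors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in summed]


# Toy input (hypothetical): three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Only the two unmasked vectors contribute to the average, so the padding vector has no effect on the result.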
README.md ADDED
@@ -0,0 +1,610 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:6032
+ - loss:MultipleNegativesRankingLoss
+ base_model: nomic-ai/modernbert-embed-base
+ widget:
+ - source_sentence: The Ship Agency Manager liaises with port officials and terminal
+     operators to plan husbandry works and/or cargo Operations, and is responsible
+     for ships interests when they are in port. He/She has a sound knowledge of customs
+     and immigration procedures, and port and flag state regulations, and is able to
+     anticipate potential disruptions to work plans. He oversees a team and possesses
+     strong interpersonal skills to establish strong relationships with the industry.
+   sentences:
+   - The Port Operations Analyst monitors shipping schedules and cargo data to optimize
+     berth allocations and terminal efficiency. This role requires strong analytical
+     skills and familiarity with logistics software but does not involve direct liaison
+     with port officials or supervision of personnel. The analyst focuses primarily
+     on data-driven decision making within port operations, without responsibility
+     for customs or flag state compliance.
+   - The Ship Agency Manager coordinates closely with port authorities and terminal
+     operators to organize husbandry services and cargo handling activities, ensuring
+     the protection of the ship's interests while docked. This role demands thorough
+     understanding of customs, immigration protocols, as well as port and flag state
+     regulations, enabling proactive management of any operational interruptions. Leading
+     a dedicated team, the manager also demonstrates excellent interpersonal abilities
+     to foster robust industry partnerships.
+   - The Network Development Technician supports the implementation of projects related
+     to electricity transmission and distribution networks, including the integration
+     of renewable energy sources and energy storage with the grid. This role involves
+     coordinating civil construction activities for substations, cable laying, and
+     equipment installation. The technician performs meter readings, installs and tests
+     metering devices, and secures necessary work permits while adhering strictly to
+     Safe System of Work protocols. As part of the Emergency Response Team, the technician
+     responds promptly during urgent situations following established safety procedures.
+     Work is conducted primarily outdoors at construction sites and customer locations
+     to develop power transmission and distribution systems. The technician must demonstrate
+     strong teamwork, effective communication with various stakeholders, and meticulous
+     attention to procedural compliance.
+ - source_sentence: The Vice President leads and manages the endorsement of policies
+     that govern the Standard Operating Procedures (SOPs) to be executed in the event
+     of emergencies. He/She works with senior representatives from different departments
+     to enhance emergency response readiness of the organisation and formulates contingency
+     plans for different services affected during incidents or accidents. He is in
+     charge of evaluating all activities with regards to airport emergency services
+     in order to identify and establish goals for long-term planning. He also initiates
+     new projects for the airport and builds broad professional networks within and
+     outside the organisation. As the Vice President for Airport Emergency Services,
+     he possesses an in-depth knowledge of all airport and aviation facilities and
+     operations. He is cognisant of new technologies and regulations impacting the
+     aviation industry. In addition, he has exceptional leadership and negotiation
+     skills to lead the organisation and manage external stakeholders effectively.
+     He also possesses strong networking skills and a high level of resourcefulness
+     in order to establish partnerships with industrial professionals and alliances
+     internally, externally and virtually.
+   sentences:
+   - The Senior Assistant Engineer (Engineering Train) leads a team responsible for
+     both preventive and corrective maintenance of engineering trains. Proficient in
+     operating calibration and diagnostic tools, he ensures the upkeep and functionality
+     of train systems while supporting continuous process enhancements. He oversees
+     team performance against defined KPIs and works on a rotating shift basis across
+     multiple train depots and maintenance workshops. Demonstrating strong leadership
+     and attention to detail, he enforces stringent safety protocols to guarantee safe
+     and organized maintenance operations.
+   - The Vice President of Airport Emergency Services is responsible for overseeing
+     the development and approval of policies that dictate the Standard Operating Procedures
+     (SOPs) during emergencies. Collaborating closely with senior leaders across multiple
+     departments, this role enhances the organisation’s readiness to respond to emergencies
+     and devises contingency strategies for services impacted by incidents. The Vice
+     President evaluates airport emergency operations to set strategic objectives for
+     future growth and initiates innovative projects to improve airport safety. Possessing
+     comprehensive expertise in aviation facilities and operational protocols, this
+     individual stays abreast of evolving technologies and regulatory changes in the
+     aviation sector. Strong leadership, negotiation, and networking abilities enable
+     the Vice President to effectively guide the organisation and foster partnerships
+     with both internal teams and external industry stakeholders.
+   - The Vice President of Airport Security oversees the implementation of security
+     protocols and manages teams responsible for passenger screening, threat detection,
+     and access control at the airport. This role focuses on developing security measures
+     to prevent unauthorized access and mitigate security risks while ensuring compliance
+     with national and international aviation security regulations. The Vice President
+     leads security personnel, coordinates with government agencies, and manages crisis
+     situations related to security breaches. Expertise in surveillance technology,
+     threat assessment, and counter-terrorism strategies is essential. Strong leadership
+     and communication skills are required to maintain security standards and liaise
+     with external security partners.
+ - source_sentence: The Process Specialist/Shift Leader/Team Leader coordinates the
+     day-to-day operations of a production team to meet production and quality standards,
+     while ensuring compliance with workplace safety and health (WSH) procedures. He/She
+     also works with the team to assess the feasibility of improvements to enhance
+     productivity and efficiency at the workplace. He also diagnoses faults, maintains
+     machines and oversees the housekeeping of machine tools and devices. He may be
+     required to work on rotating shifts in a factory setting. He possesses good communication
+     and leadership skills to guide his team and ensure compliance to WSH requirements,
+     organisational quality control and other parameters.
+   sentences:
+   - The Process Specialist/Shift Leader/Team Leader oversees the scheduling and coordination
+     of logistics operations to meet delivery deadlines and customer service standards,
+     while ensuring compliance with transportation safety regulations. This role requires
+     working with cross-functional teams to develop strategies that optimize supply
+     chain efficiency. The incumbent is responsible for managing fleet maintenance,
+     monitoring shipment tracking systems, and supervising warehouse organization.
+     Shift work may be necessary to support 24/7 logistics activities. Excellent communication
+     and leadership skills are needed to lead the team and ensure adherence to safety
+     policies, service quality, and operational guidelines.
+   - The Process Specialist/Shift Leader/Team Leader manages daily operations of a
+     manufacturing team to achieve production targets and quality benchmarks, while
+     ensuring adherence to workplace safety and health (WSH) regulations. This role
+     involves collaborating with the team to evaluate potential process enhancements
+     that improve operational productivity and efficiency. The incumbent is also responsible
+     for troubleshooting equipment issues, performing routine maintenance, and supervising
+     the cleanliness and orderliness of machinery and tools. The position may require
+     shift work within a factory environment. Strong leadership and communication abilities
+     are essential to effectively direct the team and uphold compliance with WSH standards,
+     organizational quality controls, and related protocols.
+   - The Enterprise Risk Management Manager oversees the identification and mitigation
+     of risks across the entire organisation by partnering with various risk functions.
+     This role involves collaborating with internal teams to define risk thresholds
+     within business segments, designing risk reporting tools, and recommending control
+     measures aligned with enterprise-wide risk frameworks. The manager develops comprehensive
+     risk assessments to evaluate risk impact and severity while supporting recovery
+     planning following significant risk events. The ideal candidate is proactive,
+     innovative, and capable of working autonomously, with a strong understanding of
+     organisational operations, decision-making protocols, and business strategy. They
+     possess sharp analytical abilities, communicate effectively with senior leadership
+     on critical risk matters, and lead cross-functional teams adeptly. Familiarity
+     with multiple risk areas across different sectors and a broad understanding of
+     risk types are essential for success in this role.
+ - source_sentence: The Financial Forensics Manager guides his/her financial forensics
+     team in delivering forensic investigation, prevention and detection activities,
+     reviewing and communicating the results and recommendations to clients and stakeholders.
+     The Financial Forensics Manager reviews findings from fraud risk identification
+     exercises and fraud investigations and recommendations to improve prevention and
+     detection of fraud schemes. He critiques other expert reports and provides advice
+     on settlements for litigation purposes. He also engages in business development
+     opportunities, developing proposals for clients if in an external consultant role.
+     He conducts fraud awareness and fraud prevention training for both internal and
+     external parties. The Financial Forensics Manager may manage the internal team
+     or a team of forensic consultants who provide forensic services to external clients.
+     He should be results-oriented in his work and is able to deliver reports and findings
+     needed for different client groups. He is able to communicate with senior management
+     and stakeholders on sensitive issues. He applies the principles of the Code of
+     Professional Conduct and Ethics in all his tasks.
+   sentences:
+   - The Engineer is responsible for overseeing the bus fleet’s operational performance
+     to ensure safety and reliability, detecting any system degradations, diagnosing
+     underlying issues, and applying corrective measures to minimize service interruptions.
+     This role involves providing technical guidance to the maintenance team, leveraging
+     a deep understanding of bus systems and engineering concepts. The Engineer also
+     undertakes engineering research to optimize bus operations and maintenance strategies
+     by adopting industry best practices and integrating advanced technological solutions.
+     With strong analytical capabilities, technological proficiency, and innovation-driven
+     mindset, the Engineer effectively manages projects aimed at implementing fleet-wide
+     improvements and new technology deployments to boost overall fleet performance
+     and maintenance effectiveness.
+   - The Financial Planning Manager oversees the development and implementation of
+     financial strategies to optimize company resources and maximize profitability.
+     This role involves budgeting, forecasting, and financial analysis to support operational
+     decisions, working closely with department heads to align financial plans with
+     organizational goals. The manager leads a team responsible for monitoring expenditures,
+     preparing management reports, and ensuring compliance with financial regulations.
+     Engaging with external partners, the Financial Planning Manager also handles contract
+     negotiations and supports investor relations activities. Strong analytical and
+     communication skills are essential for this role, which requires adherence to
+     corporate governance and ethical standards.
+   - The Financial Forensics Manager leads a team dedicated to conducting forensic
+     investigations, focusing on detecting and preventing fraudulent activities. This
+     role involves analyzing fraud risk assessments, reviewing investigative outcomes,
+     and advising clients and stakeholders on recommended actions. The manager evaluates
+     expert analyses and offers guidance on litigation settlements, while also pursuing
+     business development by preparing client proposals when serving as an external
+     consultant. Additionally, the manager designs and delivers fraud prevention training
+     programs to internal teams and external audiences. Overseeing either an in-house
+     unit or forensic consultants working with external clients, this position demands
+     a results-driven professional capable of producing comprehensive reports tailored
+     for various stakeholders and effectively communicating sensitive issues to senior
+     leadership. The manager consistently upholds the standards set forth in the Code
+     of Professional Conduct and Ethics.
+ - source_sentence: The Sound Recordist executes sound recording operations. He/She
+     is responsible for recording sound on location or in a studio. He usually records
+     sounds in synchronisation with the camera to enable high quality sounds to be
+     captured at the time of shooting. He coordinates with other crew members to assess
+     the shoot location and studio configuration, and plans the placement of sound
+     equipment to ensure that it does not cast shadows on frames. He operates the sound
+     recording equipment based on the sound design briefs and ensures that recordings
+     are stored appropriately. He monitors the quality of the sound recording and sound
+     effects by using headphones and channels it to the appropriate teams for further
+     sound quality checks. After the shoot, he has to dismantle and clean the sound
+     equipment. He is required to follow workplace safety and health standards and
+     escalate any reports or breaches to the relevant authorities. The work involves
+     long hours of physically demanding tasks, especially during the operation of sound
+     recording equipment. He needs to be physically strong to operate the equipment
+     for long periods of time. He is required to have a strong knowledge of sound technology,
+     sound equipment, camera equipment and radio transmission technology. He ought
+     to be an effective team player and should be able to think of creative solutions
+     to problems posed by particular locations and situations. He should have a good
+     sense of timing and an excellent sense of hearing.
+   sentences:
+   - The Airport Emergency Assistant Manager plays a crucial role in evaluating and
+     addressing the airport’s safety and security requirements while managing emergency
+     response operations. This role involves directing personnel deployment during
+     incidents and coordinating with various airport stakeholders to effectively handle
+     emergencies, accidents, and other critical situations. The assistant manager also
+     organizes external training sessions for new equipment use, and develops comprehensive
+     workforce development plans including on-the-job training initiatives. To ensure
+     a high standard of safety and security, the individual promotes a rigorous safety
+     culture and recommends necessary corrective actions. Leading and mentoring the
+     emergency response team, the assistant manager participates in ongoing training
+     programs to stay current with emergency preparedness protocols. The position requires
+     shift work, possession of a Class 3 driving licence and Airfield Driving Permit
+     (ADP) to operate specialised fire apparatus and vehicles. Candidates must demonstrate
+     strong physical and mental fitness, keen hearing and vision, as well as the ability
+     to maintain calmness and clear judgment under pressure. Effective leadership and
+     team coaching skills are essential to identify and fulfill team training needs.
+   - The Sound Engineer is tasked with mixing and mastering audio tracks in a post-production
+     studio setting. Unlike on-location recording, this role focuses on enhancing recorded
+     sound by applying effects, balancing audio levels, and ensuring clarity for final
+     distribution. The Sound Engineer works independently to manipulate sound files
+     using digital audio workstations and collaborates with music producers to meet
+     artistic goals. This position requires advanced knowledge of sound editing software,
+     acoustics, and mastering techniques rather than live sound capture. Physical demands
+     are minimal compared to field recording, and the role emphasizes technical proficiency
+     in audio manipulation over equipment setup and synchronization with camera operations.
+     Safety protocols pertain mainly to studio ergonomics and electrical standards.
+   - The Sound Recordist is responsible for capturing high-quality audio during both
+     studio and on-location shoots. This role involves synchronizing audio recordings
+     with camera footage and collaborating closely with the production crew to evaluate
+     shooting environments and strategically position sound equipment to avoid visual
+     interference. The Sound Recordist operates specialized recording gear following
+     sound design requirements, ensures proper storage of audio files, and constantly
+     monitors audio quality through headphones, forwarding recordings for further sound
+     refinement. Post-shoot duties include equipment breakdown and maintenance. The
+     position demands adherence to health and safety regulations, physical stamina
+     for handling equipment over extended periods, and a thorough understanding of
+     audio technology, camera gear, and wireless transmission. Strong teamwork, creative
+     problem-solving, precise timing, and acute auditory skills are essential for success.
+ datasets:
+ - dnth/ssf-train-valid-v4
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on nomic-ai/modernbert-embed-base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the [ssf-train-valid-v4](https://huggingface.co/datasets/dnth/ssf-train-valid-v4) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [ssf-train-valid-v4](https://huggingface.co/datasets/dnth/ssf-train-valid-v4)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
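
The architecture above ends with a `Normalize()` module, and the model card lists cosine similarity as the similarity function. One consequence worth illustrating: once vectors are scaled to unit length, their dot product equals their cosine similarity. The following is a minimal sketch in plain Python, not the library's code; the helper names and the toy 2-dimensional vectors are assumptions made for this example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vector, vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors (hypothetical) standing in for pooled embeddings.
a = [3.0, 4.0]
b = [4.0, 3.0]
unit_a = l2_normalize(a)
unit_b = l2_normalize(b)

# Both expressions give the same value, up to floating-point rounding.
print(dot(unit_a, unit_b), cosine_similarity(a, b))
```

This is why normalized embeddings can be compared with a plain dot product when a vector index only supports inner-product search.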
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dnth/ssf-retriever-modernbert-embed-base-v4")
+ # Run inference
+ sentences = [
+     'The Sound Recordist executes sound recording operations. He/She is responsible for recording sound on location or in a studio. He usually records sounds in synchronisation with the camera to enable high quality sounds to be captured at the time of shooting. He coordinates with other crew members to assess the shoot location and studio configuration, and plans the placement of sound equipment to ensure that it does not cast shadows on frames. He operates the sound recording equipment based on the sound design briefs and ensures that recordings are stored appropriately. He monitors the quality of the sound recording and sound effects by using headphones and channels it to the appropriate teams for further sound quality checks. After the shoot, he has to dismantle and clean the sound equipment. He is required to follow workplace safety and health standards and escalate any reports or breaches to the relevant authorities. The work involves long hours of physically demanding tasks, especially during the operation of sound recording equipment. He needs to be physically strong to operate the equipment for long periods of time. He is required to have a strong knowledge of sound technology, sound equipment, camera equipment and radio transmission technology. He ought to be an effective team player and should be able to think of creative solutions to problems posed by particular locations and situations. He should have a good sense of timing and an excellent sense of hearing.',
+     'The Sound Recordist is responsible for capturing high-quality audio during both studio and on-location shoots. This role involves synchronizing audio recordings with camera footage and collaborating closely with the production crew to evaluate shooting environments and strategically position sound equipment to avoid visual interference. The Sound Recordist operates specialized recording gear following sound design requirements, ensures proper storage of audio files, and constantly monitors audio quality through headphones, forwarding recordings for further sound refinement. Post-shoot duties include equipment breakdown and maintenance. The position demands adherence to health and safety regulations, physical stamina for handling equipment over extended periods, and a thorough understanding of audio technology, camera gear, and wireless transmission. Strong teamwork, creative problem-solving, precise timing, and acute auditory skills are essential for success.',
+     'The Sound Engineer is tasked with mixing and mastering audio tracks in a post-production studio setting. Unlike on-location recording, this role focuses on enhancing recorded sound by applying effects, balancing audio levels, and ensuring clarity for final distribution. The Sound Engineer works independently to manipulate sound files using digital audio workstations and collaborates with music producers to meet artistic goals. This position requires advanced knowledge of sound editing software, acoustics, and mastering techniques rather than live sound capture. Physical demands are minimal compared to field recording, and the role emphasizes technical proficiency in audio manipulation over equipment setup and synchronization with camera operations. Safety protocols pertain mainly to studio ergonomics and electrical standards.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.8566, 0.4381],
+ #         [0.8566, 1.0000, 0.4453],
+ #         [0.4381, 0.4453, 1.0000]])
+ ```
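
To connect the similarity matrix printed in the usage example to retrieval, the sketch below ranks the other sentences against a chosen query row. It is plain Python operating on the scores copied from the README output, so it runs without downloading the model; the helper name `rank_candidates` is an assumption made for this example, not part of sentence-transformers.

```python
def rank_candidates(similarities, query_index):
    """Return (index, score) pairs for all other rows, best match first."""
    row = similarities[query_index]
    candidates = [(i, score) for i, score in enumerate(row) if i != query_index]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Scores copied from the README's printed similarity matrix.
similarities = [
    [1.0000, 0.8566, 0.4381],
    [0.8566, 1.0000, 0.4453],
    [0.4381, 0.4453, 1.0000],
]

# Sentence 0 is the original Sound Recordist description (the query).
ranking = rank_candidates(similarities, query_index=0)
print(ranking)
```

With these scores, the paraphrased Sound Recordist description (index 1) ranks above the Sound Engineer description (index 2), which matches the anchor/positive/negative structure of the widget examples.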
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### ssf-train-valid-v4
+
+ * Dataset: [ssf-train-valid-v4](https://huggingface.co/datasets/dnth/ssf-train-valid-v4) at [8fe074a](https://huggingface.co/datasets/dnth/ssf-train-valid-v4/tree/8fe074acfa95ccc23f130bfbf4bce81683edb1a3)
+ * Size: 6,032 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | anchor | positive | negative |
+   |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+   | type | string | string | string |
+   | details | <ul><li>min: 58 tokens</li><li>mean: 170.17 tokens</li><li>max: 403 tokens</li></ul> | <ul><li>min: 58 tokens</li><li>mean: 138.51 tokens</li><li>max: 268 tokens</li></ul> | <ul><li>min: 47 tokens</li><li>mean: 108.13 tokens</li><li>max: 237 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>An Occupational Therapist is responsible for providing client care, performing therapy execution and client education activities to an assigned group of clients. S/He performs continuing education and research to achieve quality client care. S/He is also conscientious in providing therapy recommendations based on evaluation of the clients condition. S/He may work in various settings such as public and private institutions, acute and community hospitals, rehabilitation centres, voluntary welfare organisations, schools, long-term care facilities and clients homes and work environments. S/He may also work as part of collaborative, interdisciplinary teams which may include teachers, doctors, audiologists, psychologists, social workers, physiotherapists and speech therapists. S/He should have initiative and be sensitive to the needs of her/his clients. S/He should possess communication and problem-solving skills.</code> | <code>The Occupational Therapist delivers direct therapeutic interventions and educates clients within an assigned caseload, ensuring high-quality care through ongoing professional development and research. This role entails assessing client conditions to formulate tailored therapy plans and collaborating with multidisciplinary teams including medical, educational, and social care professionals. The Occupational Therapist may practice across diverse environments such as hospitals, rehabilitation centres, schools, community agencies, and client residences. Strong communication, empathy, and problem-solving abilities are essential to effectively address client needs and promote optimal outcomes.</code> | <code>The Speech Therapist provides specialized assessment and intervention for clients with speech, language, and communication disorders across healthcare and educational settings. This role focuses on diagnosing communication challenges, developing individualized therapy programs, and collaborating with educators and healthcare providers to support client progress. Unlike the Occupational Therapist, the Speech Therapist’s expertise centers on speech pathology rather than functional and occupational rehabilitation. Effective communication skills and teamwork are critical for success in this position.</code> |
+ | <code>The Producer - Film leads the end-to-end management of film production from a creative and operational perspective. He/She oversees and manages the entire lifecycle of film production from the ideation of content to pre-production, production, post-production to finally reaching the audience by distribution. He performs creative as well as management responsibilities and leads a team responsible for the creative coordination and logistical management of production to ensure smooth production operations. He leads production operations and spends long hours on the production location. He is also required to liaise with multiple internal and external stakeholders to have his proposals approved. In some instances, he is also responsible for the hiring of the right cast for the production to enable the achievement of the creative vision of the production, The work involves leading projects or teams and provision of guidance to the production department in identifying projects with high cust...</code> | <code>The Producer - Film is responsible for overseeing the entire film production process from initial concept development through to final distribution. This role combines creative vision with operational leadership, managing all phases including pre-production, shooting, and post-production. The Producer directs a team to coordinate creative efforts and logistical arrangements, ensuring seamless production workflows. They work extensively on set and collaborate with a variety of internal teams and external partners to secure necessary approvals. Additionally, the Producer may handle casting decisions to align with the creative goals of the project. 
This position requires strong project management skills to deliver films on schedule and within budget, as well as an acute understanding of audience preferences to select projects with high engagement and commercial potential.</code> | <code>The Producer - Documentary leads the research and content development for non-fiction film projects, focusing primarily on factual storytelling and investigative reporting. Unlike the traditional film producer who manages large-scale productions, this role emphasizes gathering real-world information, conducting interviews, and ensuring factual accuracy. The Producer - Documentary works closely with editorial teams and subject matter experts, spending significant time in the field rather than on a production set. While still coordinating project timelines and budgets, this position requires a deep understanding of documentary ethics and compliance with broadcasting standards rather than commercial entertainment metrics.</code> |
+ | <code>The Crewing Manager leads the development of recruitment and deployment strategies for seafarers. He/She oversees the crew recruitment processes and ensures that candidate selection, training and deployment procedures are up-to-date with industry best practices, and in compliance with International Maritime Organisation (IMO) regulations, the Standards for Training, Certification and Watchkeeping for Seafarers (STCW) conventions and the Maritime Labour Convention. He leads engagements with key stakeholders over protection and indemnity (P&I) claims, legal claims and compensation pay-outs, in the event of accidents and/or incidents occurring. He leads negotiations with seafaring unions for collective bargaining agreements and reviews crewing expenditure reports to ensure budget compliance.</code> | <code>The Crewing Manager is responsible for directing the recruitment and placement strategies for maritime personnel. This role involves managing the end-to-end crew hiring process, ensuring that selection, training, and deployment align with the latest industry standards and comply with regulations such as those set by the International Maritime Organisation (IMO), the STCW conventions, and the Maritime Labour Convention. The Crewing Manager also coordinates with stakeholders regarding protection and indemnity (P&I) claims, legal matters, and compensation related to maritime incidents. Additionally, this position leads union negotiations for collective bargaining agreements and monitors crewing budgets to maintain financial adherence.</code> | <code>The Crewing Coordinator manages the scheduling and logistical support for shipboard operations within the maritime industry. They focus primarily on coordinating daily crew assignments and travel arrangements, without direct involvement in recruitment or regulatory compliance. 
The role requires strong organizational skills to handle personnel rotations and onboard welfare, but does not include negotiation of collective agreements or oversight of legal claims and budgetary control.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim",
+     "gather_across_devices": false
+ }
+ ```
+
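As a rough illustration of what these parameters control: MultipleNegativesRankingLoss scores each anchor against every candidate in the batch by cosine similarity (`similarity_fct`), multiplies the scores by `scale`, and applies cross-entropy with the anchor's own positive as the target. The sketch below is plain Python on toy 2-D vectors. The function name `mnrl` and the example vectors are illustrative assumptions, not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnrl(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy over anchors; candidate i is the target for anchor i."""
    # Every positive and every hard negative in the batch is a candidate.
    candidates = positives + negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two (anchor, positive, negative) triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[0.5, 0.5], [0.6, 0.4]]
loss = mnrl(anchors, positives, hard_negatives)
print(f"loss = {loss:.4f}")
```

A larger `scale` sharpens the softmax, so a small gap in cosine similarity between the positive and the hardest negative is penalised more strongly.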
+ ### Evaluation Dataset
+
+ #### ssf-train-valid-v4
+
+ * Dataset: [ssf-train-valid-v4](https://huggingface.co/datasets/dnth/ssf-train-valid-v4) at [8fe074a](https://huggingface.co/datasets/dnth/ssf-train-valid-v4/tree/8fe074acfa95ccc23f130bfbf4bce81683edb1a3)
+ * Size: 1,508 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+ |         | anchor | positive | negative |
+ |:--------|:-------|:---------|:---------|
+ | type    | string | string | string |
+ | details | <ul><li>min: 57 tokens</li><li>mean: 167.41 tokens</li><li>max: 349 tokens</li></ul> | <ul><li>min: 58 tokens</li><li>mean: 135.9 tokens</li><li>max: 269 tokens</li></ul> | <ul><li>min: 49 tokens</li><li>mean: 106.77 tokens</li><li>max: 250 tokens</li></ul> |
+ * Samples:
+ | anchor | positive | negative |
+ |:-------|:---------|:---------|
+ | <code>The Chief Executive Officer/Chief Operating Officer/Managing Director/General Manager/Vice-President provides the overall direction of the organisation. As a systems thinker, he/she strategises and directs operational activities at the highest level of management with the help of a management team. He translates broad goals into achievable steps, anticipates and stays ahead of trends and takes advantage of opportunities. He also represents the organisation before customers, investors and business partners. He also formulates ideas and drives change in an organisation, while maintaining a culture of innovativeness to sustain value creation in meeting the organisations competitive position and long-term objectives. With a nurturing mindset, he also mentors and develops talent as future leaders.</code> | <code>The Chief Executive Officer/Chief Operating Officer/Managing Director/General Manager/Vice-President leads the organisation by setting strategic direction and overseeing high-level operational functions alongside a management team. Acting as a visionary, this leader converts broad organisational goals into practical initiatives, anticipates market trends, and leverages opportunities to maintain competitive advantage. They represent the company in interactions with clients, investors, and partners, while fostering a culture of innovation and continuous improvement. Additionally, they are committed to talent development, mentoring future leaders to support the organisation’s long-term success.</code> | <code>The Chief Marketing Officer leads the organisation’s marketing efforts by developing branding strategies and managing campaigns to enhance customer engagement. Collaborating with the marketing team, they analyze market data, plan promotional activities, and oversee digital marketing channels. 
This role requires expertise in consumer behavior, advertising, and communications, focusing on driving sales and brand awareness rather than overall organisational strategy or operational management.</code> |
+ | <code>The Business Development Manager assumes the responsibility of leading the organisation's business development strategies by prospecting new buyers and sellers, expanding current business portfolio, and identifying new business ventures both locally and regionally. He/She is expected to maintain extensive knowledge of current market conditions to identify and develop the organisation's propositions and differentiators. The Business Development Manager also collaborates with regional teams to stay informed with the latest geographical trends. In addition, he maintains relationships with existing and new buyers and sellers, and manages a diverse group of stakeholders. He is a highly-driven individual whom possesses great attention to detail and is able to address complex problems in a dynamic business environment.</code> | <code>The Business Development Manager is tasked with spearheading the company’s growth initiatives by identifying potential clients and partners, broadening the existing business portfolio, and exploring new market opportunities both domestically and across the region. This role requires up-to-date insights into prevailing market trends to shape the company’s value propositions and competitive advantages. The Business Development Manager works closely with regional counterparts to monitor emerging geographical market dynamics and sustains strong relationships with a wide range of stakeholders, including current and prospective buyers and sellers. The ideal candidate is a results-oriented professional with meticulous attention to detail, capable of solving complex challenges in a fast-paced commercial setting.</code> | <code>The Business Development Analyst supports the organisation by researching market data and compiling reports on buyer and seller trends to aid decision-making. 
This role focuses on data gathering rather than direct client engagement and does not involve leading business initiatives or managing stakeholder relationships. The Analyst collaborates with internal teams to update market intelligence and assists in regional market analysis, requiring strong analytical skills and attention to detail but limited strategic responsibility. This position operates under close supervision and is primarily focused on supporting functions rather than driving business growth.</code> |
+ | <code>The Senior Care Staff supervises the provision of care to clients. He/She supervises the performance of tasks in care plans by care team members and provides input in the development and review of care plans with social service and/or healthcare professionals. He also supervises the daily operations and maintenance of the care environment and advises on measures to ensure clients observe house rules. He designs activities for clients to promote independence, health, wellness, and quality of life and monitors operations to ensure adherence to relevant statutory requirements and organisational policies. A resourceful, proactive and responsible professional who possesses good leadership and team management skills, the Senior Care Staff works in various voluntary welfare organisations, communities and institutional settings.</code> | <code>The Senior Care Staff oversees the delivery of client care by guiding care team members in executing care plans. They collaborate with social service and healthcare professionals to contribute to the formulation and assessment of these plans. Additionally, they manage daily operations and upkeep of the care environment, ensuring clients comply with house regulations. The role involves creating client activities aimed at enhancing independence, health, and overall quality of life, while ensuring all procedures comply with statutory and organizational standards. As a dependable and proactive leader with strong team management capabilities, the Senior Care Staff typically operates within voluntary welfare organizations, community settings, and institutional care facilities.</code> | <code>The Senior Administrative Officer manages office operations within healthcare organizations, coordinating administrative tasks and supporting staff to ensure efficient workflow. This role focuses on handling documentation, scheduling, and communication rather than direct client care. 
The Senior Administrative Officer requires strong organizational and clerical skills, with an emphasis on policy compliance and office management, serving primarily in healthcare administration rather than frontline care services.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim",
+     "gather_across_devices": false
+ }
+ ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.2
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `batch_sampler`: no_duplicates
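
The batch size, accumulation, and epoch settings above determine how many optimizer steps the run takes. The arithmetic below is a back-of-the-envelope sketch under two assumptions that the card does not state: training ran on a single device, and a trailing partial accumulation group still counts as one optimizer step. The dataset size of 6,032 comes from the card's `dataset_size:6032` tag.

```python
import math

dataset_size = 6032                 # from the model card's dataset_size tag
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 5
warmup_ratio = 0.2
num_devices = 1                     # assumption: single-device training

# Samples consumed per optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Micro-batches per epoch, then optimizer updates per epoch.
batches_per_epoch = math.ceil(dataset_size / (per_device_train_batch_size * num_devices))
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)

total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 512 12 60 12
```

Under these assumptions the result agrees with the training log, which reports step 12 at epoch 1.0 and step 60 at epoch 5.0. With `no_duplicates` batching the exact number of micro-batches can differ slightly.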
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.2
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:-------:|:------:|:-------------:|:---------------:|
+ | 1.0 | 12 | 0.1255 | 0.0076 |
+ | 2.0 | 24 | 0.0049 | 0.0028 |
+ | 3.0 | 36 | 0.0026 | 0.0020 |
+ | 4.0 | 48 | 0.002 | 0.0018 |
+ | **5.0** | **60** | **0.002** | **0.0018** |
+
+ * The bold row denotes the saved checkpoint.
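
To relate these logged steps to the `cosine` scheduler and `warmup_ratio` of 0.2, the sketch below evaluates a linear-warmup, half-cosine-decay schedule at each logged step. It assumes 60 total optimizer steps (the last step in the log) and therefore 12 warmup steps. The helper `lr_at` is an illustrative name, and the formula is the commonly used warmup-plus-cosine shape rather than code taken from this training run.

```python
import math

def lr_at(step, base_lr=2e-5, warmup_steps=12, total_steps=60):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Learning rate at the end of each epoch (steps taken from the log table).
schedule = {step: lr_at(step) for step in (12, 24, 36, 48, 60)}
for step, lr in schedule.items():
    print(f"step {step:>2}: lr = {lr:.2e}")
```

Under these assumptions the learning rate peaks at the end of epoch 1 and is close to zero in the final epoch, which is consistent with the loss barely changing between steps 48 and 60.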
+
+ ### Framework Versions
+ - Python: 3.12.8
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.55.0
+ - PyTorch: 2.8.0+cu128
+ - Accelerate: 1.10.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "architectures": [
+     "ModernBertModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000.0,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_norm_eps": 1e-05,
+   "local_attention": 128,
+   "local_rope_theta": 10000.0,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "repad_logits_with_grad": false,
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.0",
+   "vocab_size": 50368
+ }
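
A few derived quantities may help when reading this config. The sketch below computes the per-head dimension and a plausible split of the 22 layers into global and local (sliding-window) attention. The modulo rule, under which layer `i` is global when `i % global_attn_every_n_layers == 0`, is an assumption about how the Transformers ModernBERT implementation interprets the field. It is not something this diff states.

```python
# Values copied from config.json above.
hidden_size = 768
num_attention_heads = 12
num_hidden_layers = 22
global_attn_every_n_layers = 3
local_attention = 128  # sliding-window width used by the local layers

# Each attention head works on an equal slice of the hidden state.
head_dim = hidden_size // num_attention_heads

# Assumed rule: every third layer, starting at layer 0, attends globally.
global_layers = [i for i in range(num_hidden_layers) if i % global_attn_every_n_layers == 0]
local_layers = [i for i in range(num_hidden_layers) if i % global_attn_every_n_layers != 0]

print("head_dim:", head_dim)
print("global layers:", global_layers)
print("local layer count:", len(local_layers))
```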
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.55.0",
+     "pytorch": "2.8.0+cu128"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine",
+   "model_type": "SentenceTransformer"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c06770ab34ed7f7d4e5e85349652e23fb964f9706f76b5d05eb3dd5c6bff993
+ size 596070136
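
The three lines above are a Git LFS pointer, not the weights themselves. As a quick sanity check, the sketch below parses the pointer and turns the `size` field into a rough parameter count, assuming 4 bytes per parameter (`"torch_dtype": "float32"` in `config.json`). The helper name `parse_lfs_pointer` is hypothetical. The result is a slight overestimate because the file also contains a safetensors header.

```python
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:5c06770ab34ed7f7d4e5e85349652e23fb964f9706f76b5d05eb3dd5c6bff993
size 596070136
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = parse_lfs_pointer(pointer_text)
size_bytes = int(pointer["size"])

bytes_per_param = 4  # float32 weights
approx_params = size_bytes // bytes_per_param

print(f"{size_bytes / 1e6:.1f} MB, roughly {approx_params / 1e6:.0f}M parameters")
```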
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
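
`modules.json` chains three stages: the Transformer produces one vector per token, Pooling collapses them into one sentence vector (mean pooling, per `1_Pooling/config.json`), and Normalize rescales that vector to unit length. The sketch below mimics the last two stages in plain Python on toy 3-D token vectors. The function names and values are illustrative only, and the real model works on 768-dimensional vectors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Two real tokens followed by one padding token.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled, sentence_embedding)
```

Because the final vectors have unit length, the dot product of two embeddings equals their cosine similarity, which matches `"similarity_fn_name": "cosine"` in `config_sentence_transformers.json`.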
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "model_input_names": [
937
+ "input_ids",
938
+ "attention_mask"
939
+ ],
940
+ "model_max_length": 8192,
941
+ "pad_token": "[PAD]",
942
+ "sep_token": "[SEP]",
943
+ "tokenizer_class": "PreTrainedTokenizerFast",
944
+ "unk_token": "[UNK]"
945
+ }
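
The `added_tokens_decoder` map above pairs each token ID with its string and a `special` flag, and the top-level keys (`cls_token`, `pad_token`, and so on) name which of those strings fill each role. A minimal sketch of reading that structure with the standard library follows. The excerpt reproduces only a handful of entries from the file, and the helper names `special_token_ids` and `named_token_ids` are hypothetical, not part of any library.

```python
import json

# Excerpt of the tokenizer_config.json shown above. Assumption: only a few of
# the entries are reproduced, and the per-token flags other than "special"
# are omitted to keep the sketch short.
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "1": {"content": "<|padding|>", "special": true},
    "50277": {"content": "|||EMAIL_ADDRESS|||", "special": false},
    "50279": {"content": "<|endoftext|>", "special": true},
    "50280": {"content": "[UNK]", "special": true},
    "50281": {"content": "[CLS]", "special": true},
    "50282": {"content": "[SEP]", "special": true},
    "50283": {"content": "[PAD]", "special": true},
    "50284": {"content": "[MASK]", "special": true},
    "50285": {"content": "[unused0]", "special": false}
  },
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]",
  "model_max_length": 8192
}
"""


def special_token_ids(config: dict) -> dict:
    """Map each token flagged "special": true to its integer ID."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
        if entry["special"]
    }


def named_token_ids(config: dict) -> dict:
    """Resolve the named roles (cls_token, pad_token, ...) to token IDs."""
    ids = special_token_ids(config)
    roles = ("cls_token", "sep_token", "pad_token", "mask_token", "unk_token")
    return {role: ids[config[role]] for role in roles}


config = json.loads(CONFIG_EXCERPT)
print(special_token_ids(config))
print(named_token_ids(config))
```

Running the same helpers on the full file instead of the excerpt should only require replacing `json.loads(CONFIG_EXCERPT)` with a `json.load` over the opened `tokenizer_config.json`.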