sm-riti16 committed
Commit fc7e101 · verified · 1 Parent(s): c6d4768

Upload folder using huggingface_hub
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,404 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:14131
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: 'Honors Thesis I. Business students with outstanding academic records
    may undertake an Honors Thesis. The topic is of the student''s choice but must
    have some original aspect in the question being explored, the data set, or in
    the methods that are used. It must also be of sufficient academic rigor to meet
    the approval of a faculty advisor with expertise in the project''s area. Students
    enroll each semester in a 9-unit independent study course with their faculty advisor
    for the project (70-500 in the fall and 70-501 in the spring). Students and their
    faculty advisor develop a course description for the project and submit it for
    approval as two 9-unit courses to the BA department. Enrollment by permission
    of the BA Program. Industry: business & management. Level: advanced.'
  sentences:
  - project management
  - statistics
  - natural language processing
- source_sentence: 'Psychology of Sleep. TBA Industry: psychology. Level: intermediate.'
  sentences:
  - scientific computing
  - decision making
  - user research
- source_sentence: 'Transition Design. Designing for Systems-Level Change. This course
    will provide an overview of the emerging field of Transition Design, which proposes
    societal transitions toward more sustainable futures. The idea of intentional
    (designed) societal transitions has become a global meme and involves an understanding
    of the complex dynamics of socio-technical-ecological systems which form the context
    for many of todays wicked problems (climate change, loss of biodiversity, pollution,
    growing gap between rich/poor, etc.).Through a mix of lecture, readings, classroom
    activities and projects, students will be introduced to the emerging Transition
    Design process which focuses on framing problems in large, spatio-temporal contexts,
    resolving conflict among stakeholder groups and facilitating the co-creation,
    and transition towards, desirable, long-term futures. This course will prepare
    students for work in transdisciplinary teams to address large, societal problems
    that require a deep understanding of the anatomy and dynamics of complex systems.
    Industry: design & hci. Level: advanced.'
  sentences:
  - hardware prototyping
  - stakeholder management
  - mathematical modeling
- source_sentence: 'Advanced Biochemistry. This is a special topics course in which
    selected topics in biochemistry will be analyzed in depth with emphasis on class
    discussion of papers from the recent research literature. Topics change yearly.
    Recent topics have included single molecule analysis of catalysis and conformational
    changes; intrinsically disordered proteins; cooperative interactions of aspartate
    transcarbamoylase; and the mechanism of ribosomal protein synthesis. Industry:
    biological sciences. Level: advanced.'
  sentences:
  - control systems
  - vector calculus
  - user research
- source_sentence: 'Metrics for Technology Products & Services. The Metrics for Technology
    Products & Services course provides an in-depth understanding and practice of
    applying metrics to plan and track the development of technology products and
    services and improve them over time by managing their market performance and value
    delivery. The course utilizes a business lens to understand and leverage metrics
    to generate questions and provide answers to meet business and customer goals,
    including delivered value and performance outcomes. Students will be exposed to
    a set of metrics architectures and their specific applications at different levels
    of work aggregation, namely team, program, and portfolio. Value stream mapping
    and analysis will be taught to identify opportunities for delivering value via
    adoption, cost reductions, and organizational capabilities. Through team-oriented
    case study assignments, students can select and design metrics systems to address
    business needs and value generation for product and service development and operations.
    Industry: business & management. Level: advanced.'
  sentences:
  - industrial engineering
  - presentation skills
  - product design
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
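
For reference, the sketch below (an editorial addition, not part of the uploaded files) shows how this module stack maps to plain `transformers` calls: contextual token embeddings from the BERT encoder, mean pooling over the attention mask, then L2 normalization. The model id is a placeholder, as in the usage example further down.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "sentence_transformers_model_id"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = [
    "Service Design. We will define and study services and learn the basics of designing them.",
    "project management",
]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state       # (batch, seq_len, 384)

mask = encoded["attention_mask"].unsqueeze(-1).float()          # (batch, seq_len, 1)
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling over real tokens
embeddings = F.normalize(embeddings, p=2, dim=1)                # the Normalize() module
print(embeddings.shape)                                         # torch.Size([2, 384])
```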

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Metrics for Technology Products & Services. The Metrics for Technology Products & Services course provides an in-depth understanding and practice of applying metrics to plan and track the development of technology products and services and improve them over time by managing their market performance and value delivery. The course utilizes a business lens to understand and leverage metrics to generate questions and provide answers to meet business and customer goals, including delivered value and performance outcomes. Students will be exposed to a set of metrics architectures and their specific applications at different levels of work aggregation, namely team, program, and portfolio. Value stream mapping and analysis will be taught to identify opportunities for delivering value via adoption, cost reductions, and organizational capabilities. Through team-oriented case study assignments, students can select and design metrics systems to address business needs and value generation for product and service development and operations. Industry: business & management. Level: advanced.',
    'product design',
    'presentation skills',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.3146, 0.2180],
#         [0.3146, 1.0000, 0.5224],
#         [0.2180, 0.5224, 1.0000]])
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 14,131 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                            | sentence_1                                                                        |
  |:--------|:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                                | string                                                                            |
  | details | <ul><li>min: 14 tokens</li><li>mean: 150.13 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 4.14 tokens</li><li>max: 9 tokens</li></ul>  |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>Design Practicum. This course provides 3 units of pass/fail credit for students participating in a design internship. The student must be registered for this course during the internship, in order to earn the credit. In the summer semester, the course must be paid for as an additional course, as summer courses are not part of the normal fall/spring academic year. At the end of the term, the student's supervisor must email the course coordinator with a brief statement describing the student's activities, and an evaluation of the student's performance. Students are required to submit a statement, reflecting on insights gained from the internship experience. Upon receipt of both statements, the course coordinator will assign a grade of either P or N, depending on the outcome. Industry: design & hci. Level: intermediate.</code> | <code>data analysis</code> |
  | <code>Service Design. In this course, we will collectively define and study services and product service systems, and learn the basics of designing them. We will do this through lectures, studio projects, and verbal and written exposition. Classwork will be done individually and in teams. Industry: design & hci. Level: advanced.</code> | <code>project management</code> |
  | <code>Study Abroad. Students are encouraged to pursue various international collaborative programs offered through the department of Electrical and Computer Engineering. Industry: electrical & computer engineering. Level: intro.</code> | <code>industrial engineering</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```
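
As a rough illustration only (the exact training script is not included in this upload), a comparable run with these loss settings could be set up as sketched below; the example pairs, column names, and output path are assumptions, not taken from the repository:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# (course description, skill label) pairs; other in-batch pairs act as negatives
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "Service Design. In this course, we will collectively define and study services ...",
        "Advanced Biochemistry. This is a special topics course in biochemistry ...",
    ],
    "sentence_1": ["project management", "control systems"],
})

loss = MultipleNegativesRankingLoss(model, scale=20.0)  # cosine similarity is the default similarity_fct

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("output/finetuned-minilm-course-skills")  # hypothetical output path
```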

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 2.2624 | 500  | 3.114         |


### Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.1.2
- Transformers: 4.57.2
- PyTorch: 2.9.1+cpu
- Accelerate: 1.12.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.57.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
{
  "__version__": {
    "sentence_transformers": "5.1.2",
    "transformers": "4.57.2",
    "pytorch": "2.9.1+cpu"
  },
  "model_type": "SentenceTransformer",
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b1b320682d1ebc33c34228ffcfa97f7a81dc1a230ef9fdd60e52776ff9d55dc
size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 256,
  "do_lower_case": false
}
skill_emb_trained.npy ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31178c688f31ee1b689424eb85d700f4763e5e0296d79642a150da7be5615d8c
size 247424
skills_index.csv ADDED
@@ -0,0 +1,162 @@
row_index,skill_id,skill_name
0,0,python programming
1,1,c programming
2,2,c++ programming
3,3,java programming
4,4,matlab programming
5,5,r programming
6,6,sql
7,7,html css javascript
8,8,web development
9,9,software engineering
10,10,object oriented programming
11,11,functional programming
12,12,data structures
13,13,algorithm design
14,14,operating systems
15,15,computer architecture
16,16,distributed systems
17,17,cloud computing
18,18,computer networking
19,19,cybersecurity fundamentals
20,20,ethical hacking
21,21,databases
22,22,data modeling
23,23,data analysis
24,24,data visualization
25,25,data mining
26,26,machine learning
27,27,deep learning
28,28,neural networks
29,29,reinforcement learning
30,30,natural language processing
31,31,computer vision
32,32,signal processing
33,33,fourier analysis
34,34,time series analysis
35,35,statistics
36,36,probability theory
37,37,statistical modeling
38,38,regression analysis
39,39,classifiers and clustering
40,40,optimization methods
41,41,convex optimization
42,42,numerical methods
43,43,numerical linear algebra
44,44,scientific computing
45,45,computational thinking
46,46,mathematical modeling
47,47,discrete mathematics
48,48,logic and set theory
49,49,calculus
50,50,vector calculus
51,51,differential equations
52,52,advanced calculus
53,53,graph theory
54,54,network science
55,55,physics fundamentals
56,56,classical mechanics
57,57,electromagnetism
58,58,thermodynamics
59,59,fluid mechanics
60,60,heat transfer
61,61,materials science
62,62,chemistry fundamentals
63,63,mechanical design
64,64,cad modeling
65,65,solidworks
66,66,fusion 360
67,67,fea analysis
68,68,cfd analysis
69,69,mechanical vibrations
70,70,dynamics
71,71,kinematics
72,72,multibody dynamics
73,73,robotics fundamentals
74,74,robot kinematics
75,75,robot dynamics
76,76,robot motion planning
77,77,path planning
78,78,slam (localization and mapping)
79,79,robot perception
80,80,ros (robot operating system)
81,81,embedded systems
82,82,microcontrollers
83,83,embedded linux
84,84,real-time systems
85,85,fpgas and digital logic
86,86,verilog or systemverilog
87,87,circuit design
88,88,analog electronics
89,89,digital electronics
90,90,pcb design
91,91,sensor fusion
92,92,imu processing
93,93,control systems
94,94,pid control
95,95,state-space modeling
96,96,optimal control (lqr)
97,97,nonlinear control
98,98,system identification
99,99,autonomous systems
100,100,robot simulation (gazebo mujoco pybullet)
101,101,hardware prototyping
102,102,rapid prototyping
103,103,3d printing
104,104,manufacturing processes
105,105,machining and cnc
106,106,industrial engineering
107,107,quality engineering
108,108,reliability engineering
109,109,failure analysis
110,110,fmea (failure modes and effects analysis)
111,111,supply chain fundamentals
112,112,systems engineering
113,113,requirements engineering
114,114,systems integration
115,115,verification and validation
116,116,design of experiments
117,117,human factors engineering
118,118,human-computer interaction
119,119,ui ux design
120,120,user research
121,121,usability testing
122,122,technical documentation
123,123,technical writing
124,124,research methods
125,125,scientific experimentation
126,126,data ethics
127,127,ai fairness
128,128,ethics in engineering
129,129,engineering communication
130,130,presentation skills
131,131,communication skills
132,132,team collaboration
133,133,leadership
134,134,project management
135,135,program management
136,136,agile development
137,137,scrum methodology
138,138,stakeholder management
139,139,risk management
140,140,conflict resolution
141,141,negotiation
142,142,strategic thinking
143,143,problem solving
144,144,critical thinking
145,145,decision making
146,146,creativity and innovation
147,147,adaptability
148,148,time management
149,149,mentorship
150,150,cross-cultural communication
151,151,entrepreneurship
152,152,product management
153,153,product design
154,154,design thinking
155,155,information visualization
156,156,business fundamentals
157,157,financial literacy
158,158,economic analysis
159,159,policy analysis
160,160,environmental sustainability
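
The repository also ships `skill_emb_trained.npy` alongside this index. Its size (247,424 bytes) is consistent with a float32 array of shape (161, 384), i.e. one 384-dimensional embedding per row of `skills_index.csv`, so a plausible way to use the two files together is sketched below. The assumption that the rows are aligned with the CSV and already L2-normalized is ours, not documented in the upload; the model id is a placeholder as in the README.

```python
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

skills = pd.read_csv("skills_index.csv")          # columns: row_index, skill_id, skill_name
skill_emb = np.load("skill_emb_trained.npy")      # assumed shape (161, 384), float32

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
course = "Service Design. In this course, we will collectively define and study services ..."
query = model.encode(course, normalize_embeddings=True)

# Cosine similarity reduces to a dot product if rows of skill_emb are unit-length (assumed)
scores = skill_emb @ query
top5 = np.argsort(-scores)[:5]
print(skills.iloc[top5]["skill_name"].tolist())
```
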
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "max_length": 128,
  "model_max_length": 256,
  "never_split": null,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff