Mdean77 committed (verified)
Commit 6586e26 · Parent: a8a0424

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
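This pooling configuration selects CLS-token pooling: the sentence embedding is the final hidden state of the `[CLS]` token, which the model's `Normalize()` module then L2-normalizes. A rough illustrative sketch of the same computation in plain `transformers` (not part of this commit):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-embed-l")
bert = AutoModel.from_pretrained("Snowflake/snowflake-arctic-embed-l")

batch = tok(["An example sentence."], return_tensors="pt")
with torch.no_grad():
    hidden = bert(**batch).last_hidden_state  # [1, seq_len, 1024]

embedding = hidden[:, 0]  # pooling_mode_cls_token: take token 0
embedding = torch.nn.functional.normalize(embedding, dim=-1)  # mirrors Normalize()
print(embedding.shape)  # torch.Size([1, 1024])
```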
README.md ADDED
@@ -0,0 +1,648 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:156
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: What are some potential negative uses of Large Language Models
+     as described in the context?
+   sentences:
+   - 'I think this means that, as individual users, we don’t need to feel any guilt
+     at all for the energy consumed by the vast majority of our prompts. The impact
+     is likely neglible compared to driving a car down the street or maybe even watching
+     a video on YouTube.
+
+     Likewise, training. DeepSeek v3 training for less than $6m is a fantastic sign
+     that training costs can and should continue to drop.
+
+     For less efficient models I find it useful to compare their energy usage to commercial
+     flights. The largest Llama 3 model cost about the same as a single digit number
+     of fully loaded passenger flights from New York to London. That’s certainly not
+     nothing, but once trained that model can be used by millions of people at no extra
+     training cost.'
+   - 'Here’s the sequel to this post: Things we learned about LLMs in 2024.
+
+     Large Language Models
+
+     In the past 24-36 months, our species has discovered that you can take a GIANT
+     corpus of text, run it through a pile of GPUs, and use it to create a fascinating
+     new kind of software.
+
+     LLMs can do a lot of things. They can answer questions, summarize documents, translate
+     from one language to another, extract information and even write surprisingly
+     competent code.
+
+     They can also help you cheat at your homework, generate unlimited streams of fake
+     content and be used for all manner of nefarious purposes.'
+   - 'There’s now a fascinating ecosystem of people training their own models on top
+     of these foundations, publishing those models, building fine-tuning datasets and
+     sharing those too.
+
+     The Hugging Face Open LLM Leaderboard is one place that tracks these. I can’t
+     even attempt to count them, and any count would be out-of-date within a few hours.
+
+     The best overall openly licensed LLM at any time is rarely a foundation model:
+     instead, it’s whichever fine-tuned community model has most recently discovered
+     the best combination of fine-tuning data.
+
+     This is a huge advantage for open over closed models: the closed, hosted models
+     don’t have thousands of researchers and hobbyists around the world collaborating
+     and competing to improve them.'
+ - source_sentence: Why might some question the necessity of the extensive infrastructure
+     investments for future AI models?
+   sentences:
+   - 'These abilities are just a few weeks old at this point, and I don’t think their
+     impact has been fully felt yet. If you haven’t tried them out yet you really should.
+
+     Both Gemini and OpenAI offer API access to these features as well. OpenAI started
+     with a WebSocket API that was quite challenging to use, but in December they announced
+     a new WebRTC API which is much easier to get started with. Building a web app
+     that a user can talk to via voice is easy now!
+
+     Prompt driven app generation is a commodity already
+
+     This was possible with GPT-4 in 2023, but the value it provides became evident
+     in 2024.'
+   - 'The environmental impact got much, much worse
+
+     The much bigger problem here is the enormous competitive buildout of the infrastructure
+     that is imagined to be necessary for these models in the future.
+
+     Companies like Google, Meta, Microsoft and Amazon are all spending billions of
+     dollars rolling out new datacenters, with a very material impact on the electricity
+     grid and the environment. There’s even talk of spinning up new nuclear power stations,
+     but those can take decades.
+
+     Is this infrastructure necessary? DeepSeek v3’s $6m training cost and the continued
+     crash in LLM prices might hint that it’s not. But would you want to be the big
+     tech executive that argued NOT to build out this infrastructure only to be proven
+     wrong in a few years’ time?'
+   - 'OpenAI are not the only game in town here. Google released their first entrant
+     in the category, gemini-2.0-flash-thinking-exp, on December 19th.
+
+     Alibaba’s Qwen team released their QwQ model on November 28th—under an Apache
+     2.0 license, and that one I could run on my own machine. They followed that up
+     with a vision reasoning model called QvQ on December 24th, which I also ran locally.
+
+     DeepSeek made their DeepSeek-R1-Lite-Preview model available to try out through
+     their chat interface on November 20th.
+
+     To understand more about inference scaling I recommend Is AI progress slowing
+     down? by Arvind Narayanan and Sayash Kapoor.'
+ - source_sentence: How have US export regulations on GPUs to China influenced training
+     optimizations?
+   sentences:
+   - 'Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac talks about
+     Qwen2.5-Coder-32B in November—an Apache 2.0 licensed model!
+
+
+     I can now run a GPT-4 class model on my laptop talks about running Meta’s Llama
+     3.3 70B (released in December)'
+   - 'Those US export regulations on GPUs to China seem to have inspired some very
+     effective training optimizations!
+
+     The environmental impact got better
+
+     A welcome result of the increased efficiency of the models—both the hosted ones
+     and the ones I can run locally—is that the energy usage and environmental impact
+     of running a prompt has dropped enormously over the past couple of years.
+
+     OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days.
+     I have it on good authority that neither Google Gemini nor Amazon Nova (two of
+     the least expensive model providers) are running prompts at a loss.'
+   - 'The GPT-4 barrier was comprehensively broken
+
+     In my December 2023 review I wrote about how We don’t yet know how to build GPT-4—OpenAI’s
+     best model was almost a year old at that point, yet no other AI lab had produced
+     anything better. What did OpenAI know that the rest of us didn’t?
+
+     I’m relieved that this has changed completely in the past twelve months. 18 organizations
+     now have models on the Chatbot Arena Leaderboard that rank higher than the original
+     GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.'
+ - source_sentence: When was GPT-4 officially released by OpenAI?
+   sentences:
+   - The most recent twist, again from December (December was a lot) is live video.
+     ChatGPT voice mode now provides the option to share your camera feed with the
+     model and talk about what you can see in real time. Google Gemini have a preview
+     of the same feature, which they managed to ship the day before ChatGPT did.
+   - 'On the other hand, as software engineers we are better placed to take advantage
+     of this than anyone else. We’ve all been given weird coding interns—we can use
+     our deep knowledge to prompt them to solve coding problems more effectively than
+     anyone else can.
+
+     The ethics of this space remain diabolically complex
+
+     In September last year Andy Baio and I produced the first major story on the unlicensed
+     training data behind Stable Diffusion.
+
+     Since then, almost every major LLM (and most of the image generation models) have
+     also been trained on unlicensed data.'
+   - 'We don’t yet know how to build GPT-4
+
+     Frustratingly, despite the enormous leaps ahead we’ve had this year, we are yet
+     to see an alternative model that’s better than GPT-4.
+
+     OpenAI released GPT-4 in March, though it later turned out we had a sneak peak
+     of it in February when Microsoft used it as part of the new Bing.
+
+     This may well change in the next few weeks: Google’s Gemini Ultra has big claims,
+     but isn’t yet available for us to try out.
+
+     The team behind Mistral are working to beat GPT-4 as well, and their track record
+     is already extremely strong considering their first public model only came out
+     in September, and they’ve released two significant improvements since then.'
+ - source_sentence: What is the challenge in building AI personal assistants based
+     on the gullibility of language models?
+   sentences:
+   - 'Language Models are gullible. They “believe” what we tell them—what’s in their
+     training data, then what’s in the fine-tuning data, then what’s in the prompt.
+
+     In order to be useful tools for us, we need them to believe what we feed them!
+
+     But it turns out a lot of the things we want to build need them not to be gullible.
+
+     Everyone wants an AI personal assistant. If you hired a real-world personal assistant
+     who believed everything that anyone told them, you would quickly find that their
+     ability to positively impact your life was severely limited.'
+   - 'There’s now a fascinating ecosystem of people training their own models on top
+     of these foundations, publishing those models, building fine-tuning datasets and
+     sharing those too.
+
+     The Hugging Face Open LLM Leaderboard is one place that tracks these. I can’t
+     even attempt to count them, and any count would be out-of-date within a few hours.
+
+     The best overall openly licensed LLM at any time is rarely a foundation model:
+     instead, it’s whichever fine-tuned community model has most recently discovered
+     the best combination of fine-tuning data.
+
+     This is a huge advantage for open over closed models: the closed, hosted models
+     don’t have thousands of researchers and hobbyists around the world collaborating
+     and competing to improve them.'
+   - 'Longer inputs dramatically increase the scope of problems that can be solved
+     with an LLM: you can now throw in an entire book and ask questions about its contents,
+     but more importantly you can feed in a lot of example code to help the model correctly
+     solve a coding problem. LLM use-cases that involve long inputs are far more interesting
+     to me than short prompts that rely purely on the information already baked into
+     the model weights. Many of my tools were built using this pattern.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.875
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.875
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.875
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9538662191964322
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9375
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9375
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Mdean77/snow_ft_2025")
+ # Run inference
+ sentences = [
+     'What is the challenge in building AI personal assistants based on the gullibility of language models?',
+     'Language Models are gullible. They “believe” what we tell them—what’s in their training data, then what’s in the fine-tuning data, then what’s in the prompt.\nIn order to be useful tools for us, we need them to believe what we feed them!\nBut it turns out a lot of the things we want to build need them not to be gullible.\nEveryone wants an AI personal assistant. If you hired a real-world personal assistant who believed everything that anyone told them, you would quickly find that their ability to positively impact your life was severely limited.',
+     'Longer inputs dramatically increase the scope of problems that can be solved with an LLM: you can now throw in an entire book and ask questions about its contents, but more importantly you can feed in a lot of example code to help the model correctly solve a coding problem. LLM use-cases that involve long inputs are far more interesting to me than short prompts that rely purely on the information already baked into the model weights. Many of my tools were built using this pattern.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
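+
+ Since this repository also defines a `query` prompt (see `config_sentence_transformers.json`), retrieval-style usage encodes queries with that prompt and passages without it. A minimal sketch with hypothetical passages:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("Mdean77/snow_ft_2025")
+
+ # Hypothetical corpus; any list of passages works the same way.
+ passages = [
+     "OpenAI released GPT-4 in March 2023.",
+     "Apple's MLX library runs models efficiently on Apple Silicon.",
+ ]
+
+ # Queries use the bundled retrieval prompt; passages are encoded as-is.
+ query_emb = model.encode(["When did GPT-4 come out?"], prompt_name="query")
+ passage_embs = model.encode(passages)
+
+ # Cosine similarity, the model's configured similarity function.
+ scores = model.similarity(query_emb, passage_embs)
+ print(scores)  # shape [1, 2]; higher means more relevant
+ ```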
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.875      |
+ | cosine_accuracy@3   | 1.0        |
+ | cosine_accuracy@5   | 1.0        |
+ | cosine_accuracy@10  | 1.0        |
+ | cosine_precision@1  | 0.875      |
+ | cosine_precision@3  | 0.3333     |
+ | cosine_precision@5  | 0.2        |
+ | cosine_precision@10 | 0.1        |
+ | cosine_recall@1     | 0.875      |
+ | cosine_recall@3     | 1.0        |
+ | cosine_recall@5     | 1.0        |
+ | cosine_recall@10    | 1.0        |
+ | **cosine_ndcg@10**  | **0.9539** |
+ | cosine_mrr@10       | 0.9375     |
+ | cosine_map@100      | 0.9375     |
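+
+ A sketch of how such metrics can be recomputed with the evaluator named above. The queries, corpus, and relevance judgments below are hypothetical placeholders, since the held-out evaluation split is not published with this card:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("Mdean77/snow_ft_2025")
+
+ # Placeholder data: id -> text for queries/corpus, id -> set of relevant ids.
+ queries = {"q1": "When was GPT-4 officially released by OpenAI?"}
+ corpus = {
+     "d1": "OpenAI released GPT-4 in March 2023.",
+     "d2": "Apple's MLX library targets Apple Silicon.",
+ }
+ relevant_docs = {"q1": {"d1"}}
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
+ results = evaluator(model)
+ print(results)  # cosine_accuracy@k, cosine_ndcg@10, cosine_mrr@10, ...
+ ```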
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 156 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 156 samples:
+   |         | sentence_0                                                                          | sentence_1                                                                            |
+   |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                                |
+   | details | <ul><li>min: 12 tokens</li><li>mean: 20.94 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.22 tokens</li><li>max: 214 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>What advantage does a 64GB Mac have for running models in terms of CPU and GPU memory sharing?</code> | <code>On paper, a 64GB Mac should be a great machine for running models due to the way the CPU and GPU can share the same memory. In practice, many models are released as model weights and libraries that reward NVIDIA’s CUDA over other platforms.<br>The llama.cpp ecosystem helped a lot here, but the real breakthrough has been Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.<br>Apple’s mlx-lm Python library supports running a wide range of MLX-compatible models on my Mac, with excellent performance. mlx-community on Hugging Face offers more than 1,000 models that have been converted to the necessary format.</code> |
+   | <code>How has Apple’s MLX library impacted the performance of running machine learning models on Mac?</code> | <code>On paper, a 64GB Mac should be a great machine for running models due to the way the CPU and GPU can share the same memory. In practice, many models are released as model weights and libraries that reward NVIDIA’s CUDA over other platforms.<br>The llama.cpp ecosystem helped a lot here, but the real breakthrough has been Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.<br>Apple’s mlx-lm Python library supports running a wide range of MLX-compatible models on my Mac, with excellent performance. mlx-community on Hugging Face offers more than 1,000 models that have been converted to the necessary format.</code> |
+   | <code>How does the ability of models like ChatGPT Code Interpreter to execute and debug code impact the problem of hallucination in code generation?</code> | <code>Except... you can run generated code to see if it’s correct. And with patterns like ChatGPT Code Interpreter the LLM can execute the code itself, process the error message, then rewrite it and keep trying until it works!<br>So hallucination is a much lesser problem for code generation than for anything else. If only we had the equivalent of Code Interpreter for fact-checking natural language!<br>How should we feel about this as software engineers?<br>On the one hand, this feels like a threat: who needs a programmer if ChatGPT can write code for you?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
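+
+ In practice, Matryoshka training means embeddings can be truncated to any of the listed prefix dimensions with only a modest quality drop. A minimal sketch, assuming the `truncate_dim` argument available in recent sentence-transformers releases:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Keep only the first 256 embedding dimensions, one of the matryoshka_dims above.
+ model = SentenceTransformer("Mdean77/snow_ft_2025", truncate_dim=256)
+ emb = model.encode(["Shorter vectors are cheaper to store and search."])
+ print(emb.shape)  # (1, 256) instead of (1, 1024)
+ ```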
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
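+
+ For orientation, a hedged sketch of a training script consistent with the losses and non-default hyperparameters above; the data loading is a placeholder, since the 156-pair dataset is unnamed and not published here:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")
+
+ # Placeholder rows; the real dataset pairs a question with its source passage.
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["When was GPT-4 officially released by OpenAI?"],
+     "sentence_1": ["OpenAI released GPT-4 in March 2023."],
+ })
+
+ inner = MultipleNegativesRankingLoss(model)
+ loss = MatryoshkaLoss(model, inner, matryoshka_dims=[768, 512, 256, 128, 64])
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="snow_ft_2025",
+     num_train_epochs=10,
+     per_device_train_batch_size=10,
+     per_device_eval_batch_size=10,
+ )  # evaluator wiring omitted for brevity
+
+ trainer = SentenceTransformerTrainer(
+     model=model, args=args, train_dataset=train_dataset, loss=loss
+ )
+ trainer.train()
+ ```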
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 16   | 0.9692         |
+ | 2.0   | 32   | 0.9539         |
+ | 3.0   | 48   | 0.9692         |
+ | 3.125 | 50   | 0.9692         |
+ | 4.0   | 64   | 0.9692         |
+ | 5.0   | 80   | 0.9692         |
+ | 6.0   | 96   | 0.9692         |
+ | 6.25  | 100  | 0.9692         |
+ | 7.0   | 112  | 0.9539         |
+ | 8.0   | 128  | 0.9539         |
+ | 9.0   | 144  | 0.9539         |
+ | 9.375 | 150  | 0.9539         |
+ | 10.0  | 160  | 0.9539         |
+
+
+ ### Framework Versions
+ - Python: 3.13.0
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.6.0
+ - Accelerate: 1.3.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.6.0"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
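The `query` prompt defined here is the string that `encode(..., prompt_name="query")` prepends before tokenization, so the two calls in this sketch should produce matching embeddings (an assumption about how sentence-transformers applies prompts, stated for illustration):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mdean77/snow_ft_2025")
prompt = "Represent this sentence for searching relevant passages: "

a = model.encode("what is matryoshka loss?", prompt_name="query")
b = model.encode(prompt + "what is matryoshka loss?")
# a and b should agree up to floating-point noise
```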
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:875e361b92d729f9094674dae3822360663f4f977bf697ad51d7dfb28d0f4b53
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff